Help Getting Started on NDS ROM Hacking: Creating a Table File From font.nftr

Discussion in 'NDS - ROM Hacking and Translations' started by Blackiris., Jun 29, 2015.

  1. Blackiris.
    OP

    Blackiris. Member

    Newcomer
    44
    90
    Nov 26, 2014
    Gambia, The
    Hello,

    I'm new to ROM hacking and I'm using FAST6191's extensive guide/documentation at the moment to get started. As a Computer Science student I'm familiar with some of the basics such as hexadecimal, but I have no experience with ROM hacking itself.

    Right now I want to play around a bit with the Japanese NDS game Chocobo to Mahou no Ehon: Majo to Shoujo to 5-nin no Yuusha (roughly translated: Chocobo and the Magic Picture Book: The Witch, the Girl and the Five Heroes), which is basically the sequel to the localized Final Fantasy Fables: Chocobo Tales. As far as I know, no one has attempted to translate this game yet. That's a shame, because it's really cute and fun and a pretty good game overall. It's probably heavy on graphics editing, though.

    I'd like to translate this game someday, but right now I need much more experience, so I guess I won't attempt this seriously anytime soon.

    Aaanyway, what I've already done:
    - unpacked the .nds file using NDSTool
    - located the font.nftr file
    - opened the font.nftr file using NFTRedit

    What I want to do is create a table file. In NFTRedit I can see which hexadecimal value is used for every character. I can also view the entire font as a character map (here's what I've gotten). Obviously I don't want to create the table file manually, especially because NFTRedit already seems to know all the necessary information (but cannot create a table file), and I assume the same font is used in other Square Enix games as well.

    Are there any tools that can generate a table file from the font.nftr file?

    The next problem that needs to be tackled (after I've found out the character encoding) is to find the files where the text is actually stored. This is the folder structure of the data folder:

    [Image: folder structure of the data folder]


    I've found a couple of files with "text" in their names or folder names, like data\book\text\booktext_ja.dat, but those are usually only 1 KB in size. I've also located a whole bunch of .narc files, like data\lgame\lg001\lg001data_ja_.narc, which could contain just about anything.
    I've no clue, though, where the actual text might be stored or if the files containing the text are compressed. Any general advice for that?

    Once I've got a table and a file containing text, I assume I need to use a hexadecimal tool like CrystalTile2 that (in the best case) supports tables, and search for text using relative search or other means. Is this correct?


    (I hope this is the right place for a thread like this, if not feel free to move it. I might also use this thread for further questions.)
     
  2. jjjewel

    jjjewel GBAtemp Maniac

    Member
    1,009
    293
    Dec 17, 2009
    United States
    The prequel of this game used UTF-8 encoding, so it's likely that this game uses the same encoding. (However, I haven't taken a look at this game, so I don't know for sure.) The scripts should be in the "field" folder if it follows the old game's file system.

    BTW, if it used UTF-8 encoding, you don't need to create your own table.
     
    Last edited by jjjewel, Jun 29, 2015
  3. Blackiris.
    OP

    Thanks.

    I've just found this old thread here: https://gbatemp.net/threads/ds-3151...on-majo-to-shoujo-to-5nin-no-yuusha-j.121886/

    There psycoblaster says all text is encoded in UTF-16 LE, and is easily editable.
    I've manually created a table containing all numbers, letters, the most basic punctuation marks, hiragana and katakana now. In CrystalTile2 I'm able to identify some text parts in the binary file now. Thanks to psycoblaster I also know that much of the text is stored in small files in the data\mgentry\text folder.
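
    A minimal Python sketch of generating such a table instead of typing it by hand, assuming the text really is UTF-16 LE as psycoblaster reports. It writes the common hex=character table format with the bytes in file order (so あ, U+3042, becomes 4230); the character ranges are only examples, and a list of kanji can be passed in the same way.

```python
def make_table(chars, encoding="utf-16-le"):
    """Return table text with one 'HEX=char' line per character."""
    lines = []
    for ch in chars:
        # Key = the character's bytes in file order, e.g. あ -> 4230 in UTF-16 LE
        lines.append(f"{ch.encode(encoding).hex().upper()}={ch}")
    return "\n".join(lines) + "\n"


def char_range(first, last):
    """All characters from first to last (inclusive) by code point."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


digits = char_range("0", "9")
latin = char_range("A", "Z") + char_range("a", "z")
hiragana = char_range("ぁ", "ゖ")  # U+3041..U+3096
katakana = char_range("ァ", "ヺ")  # U+30A1..U+30FA

table = make_table(digits + latin + hiragana + katakana)
print(table.splitlines()[:3])  # ['3000=0', '3100=1', '3200=2']
```

    The same function works for other encodings (e.g. encoding="shift_jis"), but characters that encoding cannot represent will raise UnicodeEncodeError, so the ranges may need trimming.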

    As you have said there's also a text folder in the field folder. I think it does contain some text, but I'm not so sure about this because I only find single words like "Anime" or "Cell" which are likely not part of the dialog.

    I still need to find a way to create a proper table with all the kanji etc., though. Does anyone, by any chance, have a table file from another Square Enix NDS game? I would assume that they will probably use much of the same structure.
     
  4. jjjewel

    If it used UTF-16 LE you can use the built-in "Unicode (Codepage 1200)" encoding in CrystalTile2. No need to create a table.

    P.S. The latest CrystalTile2 can handle compression with flag 11. (It wasn't able to do that back when psychoblaster posted in that thread, but most compression programs for DS can handle it now.)
     
  5. Blackiris.
    OP

    Thanks for your help, this seems to work indeed. Some strange symbols still appear throughout the text, but I guess those might be control codes or placeholders.

    Now the hexadecimal view looks like this:
    [Image: hexadecimal view of the text]

    Now comes the difficult part, I assume.
     
  6. Blackiris.
    OP

    A few more questions have popped up.

    I've realized now that psycoblaster was not entirely correct and the folder he mentioned only contains the explanation text for mini-games.

    1) In the folder field/text there are a couple of .pobj.z files. I can't seem to get much text out of them; is it possible that they are compressed? I've wondered about the .z file extension, which seems to be common in NDS games, but so far I've found nothing on the internet.

    2) If I want to create a .nds file from the extracted files, does it suffice to use the same parameters? I've extracted the .nds file using ndstool:
    - Extraction: ndstool -x *.nds -9 arm9.bin -7 arm7.bin -y9 y9.bin -y7 y7.bin -d data -y overlay -t banner.bin -h header.bin
    - Creation: ndstool -c *.nds -9 arm9.bin -7 arm7.bin -y9 y9.bin -y7 y7.bin -d data -y overlay -t banner.bin -h header.bin

    That's how I've done it, but the created file magically turns out to be about 10 MB smaller than the original file (131 MB vs. 121 MB).
     
  7. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,705
    9,574
    Nov 21, 2005
    United Kingdom
    Yeah, .z, .lz, .comp and other such extensions usually mean compression; there were very few double-barrelled extensions on the DS that were not compression. Better yet, open the file in a hex editor and see what it starts with: if it is 10 hex, 11 hex or 40 hex (probably not 40, as that came somewhat later), followed by a length value and then probably the magic stamp the file would normally start with, you are pretty sure to be looking at compression.
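
    That check is easy to script when there are many files. A Python sketch that only knows the three type bytes mentioned above and reads the 24-bit little-endian length that follows; the handling of a zero length (a 32-bit length follows) is an assumption based on how some LZ11 tools deal with very large files. A match is only a hint, since an uncompressed file can start with 10h by chance.

```python
# Type bytes taken from the post above; other Nintendo formats exist.
COMPRESSION_TYPES = {0x10: "LZ10", 0x11: "LZ11", 0x40: "LZ40"}


def sniff_compression(data: bytes):
    """Return (format name, decompressed size), or None if the header doesn't match."""
    if len(data) < 4 or data[0] not in COMPRESSION_TYPES:
        return None
    size = int.from_bytes(data[1:4], "little")
    if size == 0 and len(data) >= 8:
        # Assumption: an extended 32-bit size follows for very large files
        size = int.from_bytes(data[4:8], "little")
    return COMPRESSION_TYPES[data[0]], size


print(sniff_compression(bytes.fromhex("11 00 40 00")))  # ('LZ11', 16384)
```

    In practice you would feed it the first 8 bytes of every file under data/ and confirm a hit by decompressing it and looking for a known magic stamp in the output.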

    The ndstool creation does not pad out the ROM again, and if there was any space between the files (there can be -- I think the US version of Tetris DS has a couple of megs after the sound file) then that will also not come back beyond what is needed for address alignment/boundary/sector type issues. Try trimming the ROM, or look at the internal length value in something like NDSTS ( http://www.no-intro.org/tools.htm ), and it will probably be very similar to the ndstool rebuild size.
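
    Both numbers can be read straight from the ROM header. A Python sketch, assuming the standard NDS header layout: the device capacity byte at 0x14 (chip size is 128 KiB shifted left by that value) and the 32-bit total used ROM size at 0x80.

```python
import struct


def rom_sizes(header: bytes):
    """Return (chip capacity, total used ROM size) in bytes from an NDS header."""
    capacity = (128 * 1024) << header[0x14]           # 0x14: device capacity
    (used,) = struct.unpack_from("<I", header, 0x80)  # 0x80: total used ROM size
    return capacity, used


def rom_sizes_from_file(path: str):
    """Read the first 0x200 bytes (the header) of a ROM file."""
    with open(path, "rb") as f:
        return rom_sizes(f.read(0x200))
```

    Calling rom_sizes_from_file("original.nds") (a placeholder file name) on the original dump: if the used size is close to the size of the ndstool rebuild, the missing megabytes were just padding.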

    It sounds like you have the font stuff sorted. For any future forum searchers NFTRedit does not magically know the table/be able to create the table. It will give you a nice pictorial version of the character and its encoding value but you will have to fill in the box provided, or build the basic table up in something else and fill in any gaps. Its text testing thing might work out of the box but that is a best guess/common encoding -- there is no order in kanji and no hard and fast order in kana either (see something like Gojuon order) but many a game dev will just use the same order, and possibly even the same encoding, as shiftJIS, some kind of unicode, euc-jp, some other known encoding or something logical from the game (first character in the file and then second, most common, least common....). If you find yourself with a custom encoding there is a way to get the text test box to interpret your input as hex, I forget what it is offhand but I should have some pictures somewhere.
    If you are up for some fun then crystaltile2 has a very basic optical character recognition/OCR function but it is not a terribly useful one if you are more used to something like subrip.
     
  8. jjjewel

    I don't think this game would use any custom tables (based on its prequel's file structure.) It might use different encodings, though. Try switching to UTF-8, Unicode, and Shift-JIS, and you might find readable text.
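
    The "try several encodings" idea can also be scripted when there are many files to check. A Python sketch that decodes a byte string with each candidate codec and scores the share of characters that land in the kana or common kanji ranges. The scoring rule is an ad-hoc heuristic, not something from this thread, so treat the winner as a hint. Python codec names are used; utf-16-le is what CrystalTile2 calls "Unicode (Codepage 1200)".

```python
def japanese_ratio(text: str) -> float:
    """Share of characters that are hiragana, katakana or common kanji."""
    if not text:
        return 0.0
    hits = sum(
        1
        for ch in text
        if 0x3040 <= ord(ch) <= 0x30FF or 0x4E00 <= ord(ch) <= 0x9FFF
    )
    return hits / len(text)


CANDIDATES = ("utf-8", "utf-16-le", "shift_jis", "euc_jp")


def guess_encoding(data: bytes):
    """Return (best codec, {codec: score}) using the ad-hoc score above."""
    scores = {
        enc: japanese_ratio(data.decode(enc, errors="replace")) for enc in CANDIDATES
    }
    best = max(scores, key=scores.get)
    return best, scores
```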

    For compression: in CrystalTile2's filesystem view (Tools-->NDS File System), files using a common LZ compression show an LZ icon in front of them, and you can right-click such a file and extract it (while you're in the filesystem view). Tinke should be able to extract/compress the files too.
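
    For the curious, this is roughly what those tools do for flag 11 data. A Python sketch of an LZ11 decompressor, written from the commonly documented layout of the format (one flag byte per 8 tokens, three back-reference sizes chosen by the top nibble) rather than from CrystalTile2 or Tinke themselves. It does little error checking, so truncated input simply raises IndexError.

```python
def decompress_lz11(data: bytes) -> bytes:
    if data[0] != 0x11:
        raise ValueError("not LZ11 data (first byte is not 0x11)")
    size = int.from_bytes(data[1:4], "little")
    pos = 4
    if size == 0:  # assumed: a 32-bit size follows for very large files
        size = int.from_bytes(data[4:8], "little")
        pos = 8
    out = bytearray()
    while len(out) < size:
        flags = data[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) >= size:
                break
            if not flags & (1 << bit):  # 0 bit: copy one literal byte
                out.append(data[pos])
                pos += 1
                continue
            b1 = data[pos]
            indicator = b1 >> 4
            if indicator == 0:    # 3-byte token, lengths 0x11..0x110
                b2, b3 = data[pos + 1], data[pos + 2]
                length = (((b1 & 0x0F) << 4) | (b2 >> 4)) + 0x11
                disp = (((b2 & 0x0F) << 8) | b3) + 1
                pos += 3
            elif indicator == 1:  # 4-byte token, lengths 0x111..0x10110
                b2, b3, b4 = data[pos + 1], data[pos + 2], data[pos + 3]
                length = (((b1 & 0x0F) << 12) | (b2 << 4) | (b3 >> 4)) + 0x111
                disp = (((b3 & 0x0F) << 8) | b4) + 1
                pos += 4
            else:                 # 2-byte token, lengths 3..16
                b2 = data[pos + 1]
                length = indicator + 1
                disp = (((b1 & 0x0F) << 8) | b2) + 1
                pos += 2
            for _ in range(length):  # byte by byte so overlapping copies work
                out.append(out[-disp])
    return bytes(out[:size])


print(decompress_lz11(bytes.fromhex("110800002041425001")))  # b'ABABABAB'
```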
     
  9. Blackiris.
    OP

    Thanks for the detailed explanation, you two. You're a great help! :) The .z files are indeed compressed files with flag 11 (hex). What puzzles me, though, is that I still can't get any text out of them, even after decompressing. That leads me to the conclusion that they are not text files, or that they use a different encoding. The latter seems unlikely, but the files are in the field/text folder, so I still wonder if there's any text in them.

    The compression method is D2KP, or is this not a compression method? Anyway, I haven't found much information on D2KP, only these two forum posts:

    http://www.romhacking.net/forum/index.php?topic=8407.10;wap2

    and

    http://forum.xentax.com/viewtopic.php?f=16&t=5004

    So these sound a bit contradictory, but I think it makes sense that these files are graphics files (D2 = 2-dimensional?), also because the file names correspond to the names of characters in the game. I still wonder why those are in the field/text folder...

    I've tried searching the entire project for .ptx or .txt files, as those will always contain text. Some .txt files can even be read with a normal text editor without UTF-16 encoding. But still: those files contain either system text ("Please enter a name for the card deck.") or mini-game descriptions ("<Rules> blabla Silver: 22 Points, Gold: 30 Points, Platinum: 35 Points"). I haven't found any files with actual dialogue.

    Then we have a folder called talk/chara with lots of files named kaiwa000.pobj.z etc. – kaiwa means conversation, so I found this suspicious. Those are not audio files (file size is 9-20 KB), but they have "D2KP" in their headers again, and even after I've unpacked them using Tinke, I can't seem to get any useful information out of them.

    I wonder: Are these graphic files again, or are they not extracted / uncompressed properly? (The uncompressed files are much bigger, though.) In any case the main text is probably compressed somewhere because I can't even locate it in the entire .nds file (but I can find the description text there). I still suspect that those D2KP files might contain some text, but right now I don't know how to find out whether they do or don't.

    And what are these .pobj files anyway? The more I look around the more I think that they must contain some sort of text because most of the "suspicious" files are .pobj files.
     
    Last edited by Blackiris., Jun 30, 2015
  10. jjjewel

    In many cases, "text" in romhacking means texture rather than text.

    Edited: Nope, I'm wrong about this game. The files in text folder in this game are actually text, encoded with Unicode encoding. If I have to guess, ja is for Japanese text for adult (with more kanji) and jc is Japanese text for children (with more kana.) So anything in subfolders ja and jc are very likely text, too.

    D2KP could be PK2D--> Pack 2D (just my guess, though). It's a container for graphics. (You'll see headers RLCN, RGCN, RNAN, etc. in there. Those are common NDS graphics.) You can try to PM pleonex who made Tinke if he has time to program the unpacker. It doesn't look complicated.
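
    Spotting those headers inside an unpacked D2KP file can be automated. A Python sketch that scans for the magic stamps mentioned above (plus RECN and RCSN, the cell and screen formats, added from general knowledge) and, assuming the usual Nitro file header (magic, FF FE byte-order mark, version, then a 32-bit file size), reports where each embedded file starts and how big it claims to be. This is not a D2KP unpacker; the container's own index is still unknown.

```python
import struct

# RGCN/RLCN/RNAN from the post above; RECN and RCSN added as assumptions
NITRO_MAGICS = (b"RGCN", b"RLCN", b"RNAN", b"RECN", b"RCSN")


def find_nitro_files(blob: bytes):
    """Return sorted (magic, offset, declared size) for embedded Nitro files."""
    found = []
    for magic in NITRO_MAGICS:
        pos = blob.find(magic)
        while pos != -1:
            # Assumed header: magic, FF FE byte-order mark, version, u32 file size
            if pos + 12 <= len(blob) and blob[pos + 4:pos + 6] == b"\xff\xfe":
                (size,) = struct.unpack_from("<I", blob, pos + 8)
                found.append((magic.decode("ascii"), pos, size))
            pos = blob.find(magic, pos + 1)
    return sorted(found, key=lambda item: item[1])
```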

    Text in the story is in field/seaside/opening.act and is encoded with UTF-8 encoding. (I'm curious so I finally got the game.) Most field folder's texts are the ones with .act, but not all the .act files have text.

    The files with a KAPH header are possibly 3D graphics. Sub-files in them seem to follow the format of the Pokémon games described here: https://gbatemp.net/threads/pokemon-clear-crystal.313791/page-15#post-4043657
     
    Last edited by jjjewel, Jun 30, 2015 - Reason: Edited to add more findings. ^_^
  11. Blackiris.
    OP

    Ah, so the game does feature different kinds of encoding! Thanks for looking into it yourself. :) And yes, "ja" is "Japanese for Adults" and "jc" is "Japanese for Children"; the game features two text modes, one with fewer kanji, as you suspected.

    I'll contact pleonex about the D2KP thingy when I need to access these graphics, but I want to see how far I get with the text first.

    So the text seems to be scattered in lots of files, many of them containing not only text. That's a bit tedious, but not a big problem, I guess.
    The next problem I'd like to tackle are things like the formatting of the text. The hex editor has problems reading the text and occasionally interprets things the wrong way.

    Example: In the file data\field\book1\book.act おちたオオカミ (ochita ookami) is displayed in the editor as おちたオ▯▯カミ (strangely enough if I copy the text from CT2 it is displayed correctly). That's because there are some "AA" bytes in between, I guess: E382AAE382AAE382 should read オオカ, but the editor interprets it not as E382 AA E382 AA E382, but as E382 AAE3 82AA E382. Obviously the AA byte is used for something in the game.

    What I find really strange now is that E382 is displayed as オ (o) in one place but as カ (ka) in another. How does the editor know this when these are the exact same two bytes?

    And can I tell my editor to somehow skip certain bytes when displaying text like the "AA" byte?
     
  12. FAST6191

    Hmm I do not think I have seen non educational Japanese games with two scripts aimed at different language levels before. Mind you I have not looked and most of the time on the DS they just use the oodles of storage, the touchscreen and the reasonable amount of memory to stick furigana everywhere instead.
     
  13. Blackiris.
    OP

    That would have been better, don't know why Square Enix decided to put in two different modes. Maybe because their engine didn't support furigana and they didn't want to reprogram it.

    Anyway, I've just edited my first text and it actually worked in the game. Not as difficult as I thought it would be, but the difficult part is still ahead of me, haha. ^^
     
  14. jjjewel

    CT2 isn't the best hexeditor to display UTF-8 encoding, but it's the only one I use so I don't know which one to recommend. :P You can try looking for some other hexeditor that might work better.

    Also, each Japanese character in UTF-8 encoding is 3 bytes. So the オ is actually E382AA and カ is E382AB, and so on.
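
    This is easy to check in Python. Every character from U+3080 to U+30BF (the end of the hiragana block and the first part of katakana) encodes as E3 82 xx in UTF-8, which is why E382 keeps turning up and only the third byte tells オ and カ apart.

```python
for ch in "オオカ":
    # bytes.hex(" ") (Python 3.8+) separates the bytes with spaces
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ").upper())
# オ U+30AA E3 82 AA
# オ U+30AA E3 82 AA
# カ U+30AB E3 82 AB
```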

    As for skipping some bytes, I don't know if any hex editor can do that. You might need to program a text extractor or find someone to do it for you.
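
    Such an extractor doesn't have to be complicated to be useful. A Python sketch, assuming the script is plain UTF-8 with control codes mixed in: it decodes every valid UTF-8 character and prints any other byte as {XX}. Treating every byte below 20h as a control code is an assumption of this sketch, not something known about this game.

```python
def dump_utf8(data: bytes) -> str:
    """Decode UTF-8 text, showing control/invalid bytes as {XX}."""
    out = []
    pos = 0
    while pos < len(data):
        lead = data[pos]
        if lead < 0x20:
            length = 0          # assumed control code
        elif lead < 0x80:
            length = 1          # ASCII
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3          # e.g. kana and kanji
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            length = 0          # not a valid lead byte
        if length:
            try:
                out.append(data[pos:pos + length].decode("utf-8"))
                pos += length
                continue
            except UnicodeDecodeError:
                pass            # broken sequence: fall through to {XX}
        out.append(f"{{{lead:02X}}}")
        pos += 1
    return "".join(out)


print(dump_utf8(b"\x01" + "オオカミ".encode("utf-8") + b"\xff"))  # {01}オオカミ{FF}
```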

    @FAST6191 I think many games that can be played via WIFI tend to have more than one encoding: one used for generic text and another for WIFI content.
     
  15. Blackiris.
    OP

    Oh, didn't know that UTF-8 uses three bytes per character. That explains it, then. ^^

    Just for fun I've unpacked another game: Noora to Toki no Koubou (Nora and the Time Studio), an Atelier-like game from Atlus with music by Michiko Naruke (Wild ARMs series). This game looks really comfy to work with: all text is stored in a single folder (except maybe for system text), about 1650 files in total, ordered by dialogue. What else is great about it is that all text is UTF-16 encoded and stored at the end of each file (the first 70% of each file is filled with code, but I don't know its function yet). The control codes are easy to find and understand (%9 changes the text color to blue, \n does a line break, 0000 ends the current textbox, etc.) and the text is not interrupted by further code.
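
    If the 0000 terminator holds throughout, the text section of such a file can be split into its strings with a few lines of Python. A sketch; start is a hypothetical parameter for the offset where the text begins (after the code part), which would still have to be found per file. It walks the data two bytes at a time so that a 00 byte belonging to a neighbouring character (e.g. "A" followed by an ideographic space is 41 00 00 30) isn't mistaken for a terminator.

```python
def split_utf16_strings(data: bytes, start: int = 0) -> list[str]:
    """Split UTF-16 LE text at 0000 code units, starting at offset `start`."""
    strings = []
    current = bytearray()
    pos = start
    while pos + 1 < len(data):
        unit = data[pos:pos + 2]
        pos += 2
        if unit == b"\x00\x00":  # end of the current textbox
            strings.append(current.decode("utf-16-le"))
            current.clear()
        else:
            current += unit
    if current:  # trailing text without a terminator
        strings.append(current.decode("utf-16-le", errors="replace"))
    return strings
```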

    It seems more fun to play around with this right now, and it is also a game that I'd like to work on someday, but it probably contains much more text than the Chocobo game.

    The thing I need most right now is a comprehensive text editor that can open files as UTF-8 or UTF-16 encoded files and save them the same way. In hex editors I can only edit existing bytes; adding new bytes is either impossible or inconvenient. I think the size of the dialogue is not so relevant in most DS games (it has to fit into the windows, though), so translating and saving should be possible without much additional work, conversion and so on.

    I've tried a few editors, but so far none were able to a) read the file correctly or b) save it correctly in the same format. Does anyone know a good tool for this?
     
    Last edited by Blackiris., Jul 1, 2015
  16. FAST6191

    There are some table aware hex editors used by hackers. Equally you could probably crowbar something together with atlas and cartographer (two programs that do a lot of high end text hacking). For the most part though people tend to make game/format specific text editors every time for even if you get the encoding stuff sorted you are still likely to have to mess with pointers and manually redoing pointers is crushingly boring.
     
  17. jjjewel

    Well, make sure you understand what text pointers are and check whether the game(s) used text pointers or not. From what I understand, most editors that would allow you to edit text directly don't take care of updating pointers, and the scripts will be screwed up when you put them back in the game.
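
    To see why hand-editing breaks things, here is a Python sketch of rebuilding a pointer table after the strings change. The layout (a u32 count, then one u32 offset per string counted from the start of the string data, then 0000-terminated UTF-16 LE strings) is purely hypothetical; real games differ, but the principle is the same: every string after an edited one moves, and its pointer must be recalculated.

```python
import struct


def build_text_block(strings: list[str]) -> bytes:
    """Hypothetical layout: u32 count, u32 offsets, then the string data."""
    data = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(data))  # pointer = where this string starts
        data += s.encode("utf-16-le") + b"\x00\x00"
    header = struct.pack(f"<I{len(offsets)}I", len(offsets), *offsets)
    return header + bytes(data)
```

    Making the first string longer, e.g. build_text_block(["ABCD", "C"]) instead of build_text_block(["AB", "C"]), moves the second pointer from 6 to 10; an editor that doesn't do this step leaves the game reading from the old offset.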
     
  18. Normmatt

    Normmatt Former AKAIO Programmer

    Member
    2,142
    544
    Dec 14, 2004
    New Zealand
    .ptx format is pretty simple:

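    struct Entry {
        u32 textLength;                 // number of bytes in utf16le_text
        u8  utf16le_text[textLength];   // UTF-16 LE string data
    };

    struct PTX {
        u32   numEntries;               // number of Entry records that follow
        Entry entries[numEntries];      // stored back to back
    };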
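
    A Python sketch of reading and writing that format, assuming the u32 fields are little-endian (as is usual on the DS) and that textLength counts bytes, as the u8 array suggests. Since each entry carries its own length and the entries follow each other directly, within the structure shown nothing besides numEntries and the length fields would need updating after a translation (the text boxes in the game may of course still limit what fits).

```python
import struct


def parse_ptx(data: bytes) -> list[str]:
    """Read all entries; assumes little-endian u32 fields and byte lengths."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    entries = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        entries.append(data[pos:pos + length].decode("utf-16-le"))
        pos += length
    return entries


def build_ptx(entries: list[str]) -> bytes:
    """Write entries back, recomputing numEntries and every textLength."""
    out = bytearray(struct.pack("<I", len(entries)))
    for text in entries:
        raw = text.encode("utf-16-le")
        out += struct.pack("<I", len(raw)) + raw
    return bytes(out)
```

    Round-tripping an unmodified file through build_ptx(parse_ptx(data)) and comparing the bytes is a cheap way to check these assumptions before translating anything.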
     
  19. Blackiris.
    OP

    Thanks, you three. I think I'll try to make my own editor once I've got a basic understanding about how things work. I've tried to use Cartographer to extract the text, but probably due to my inexperience it didn't work out.

    Regarding pointers: I know how they work generally, but I'm not sure how exactly they work in a ROM. Does it basically work like this: The program code (or a pointer table) refers to a certain address where the text is stored to get the data when required? And if this address is changed (due to adding or removing bytes), errors occur?
    Do pointers only refer to the start of the text part? And I assume they point to an address relative to the file in DS games where the text is stored in different files?

    The .ptx format looks pretty simple indeed. So it's basically a list of entries, and if I want to add an entry, I need to change the "numEntries" in PTX. So as long as there are no new entries, and the textLength stays the same, it should work out. Is the textLength fixed, though, or is textLength the maximum length?
     
  20. FAST6191

    You say computer science is your thing, so I imagine you would have covered pointers as part of learning C; however, I still find a lot of people come out of it a bit hazy. Fortunately ROM hacking pointers are directly related to that and usually quite simple. There are maybe four main classes:

    1) Standard. There will be a table/list of values in there which points to the address within the file or, in the case of the GBA and most older consoles as well as DS binaries and overlays, the memory as it is addressed by the system. I have also seen pointer files separate from the archive file before; usually they are named very obviously though (some_archive.archi would probably have a far smaller file called some_archive.header or something along those lines). If you particularly wanted an example, I think it was Inazuma 11.
    2) Offset. Like standard but rather than starting to count from the start of the file they then count from some offset within it, typically this will be an obvious point like the actual start of the text section/section in question.
    3) Relative. Count from the location you are at now (so if you are at 80h in the file and the pointer value says 10h you go to 90h), I suppose you could have relative combined with offset but that would be silly and I have not seen it. I might expect it on the PC at some point though if things get fired into memory.
    4) Size values. Usually seen more in simple archive formats but here it might just be a list of file sizes, add it to the one before (possibly also account for boundaries but hopefully not) and you have the next location, add the sum of that to the next one and you have the third part...
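
    The four classes boil down to four ways of turning a stored value into a file offset. A Python sketch; the function names are made up for illustration, and for the standard case the load address has to come from the ROM header or overlay table of the binary in question.

```python
def resolve_standard(value: int, load_address: int = 0) -> int:
    """1) Standard: a file offset, or a RAM address minus the load address."""
    return value - load_address


def resolve_offset(value: int, section_start: int) -> int:
    """2) Offset: counted from a point inside the file, e.g. the text section."""
    return section_start + value


def resolve_relative(value: int, pointer_position: int) -> int:
    """3) Relative: counted from where the pointer itself is stored."""
    return pointer_position + value


def starts_from_sizes(first: int, sizes: list[int], align: int = 1) -> list[int]:
    """4) Size values: each item starts where the previous one ended,
    rounded up to `align` if the format pads to boundaries."""
    starts, pos = [], first
    for size in sizes:
        starts.append(pos)
        pos = (pos + size + align - 1) // align * align
    return starts


print(hex(resolve_relative(0x10, 0x80)))  # 0x90, the example from the post
```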

    In text things I do occasionally see other data/formatting/character attribution (to change the picture/name in the box in the top left for instance) within the pointers or even in the pointer itself -- the NARC archive format quite notably used the first bit (of 32 and DS files can usually manage with 31 bits of address-- 2 gigabytes tends to be enough) of the pointer to indicate that the file was in a subdirectory.
    I did also see a game (edit it was Touch Detective) that used shifted values once. Here you would have to do a logical shift by 2 (or just multiply by 4) to get the value.
    Things can also be merged, for instance there might be a standard pointer field but also have size in there as well, names and sizes/pointers are also commonly seen together.
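
    The flag and shift variants just mean a bit of masking before the value is usable. A Python sketch with a hypothetical helper: flag_bit says which bit (if any) holds a flag, as in the top-bit example above, and shift undoes a value that was stored divided by 4, as in the Touch Detective case.

```python
def decode_pointer_word(word: int, shift: int = 0, flag_bit=None):
    """Split a stored 32-bit value into (flag, usable value).

    flag_bit: bit holding a flag (e.g. 31 for a top-bit flag), or None.
    shift:    left shift to undo a scaled value (2 means "multiply by 4").
    """
    flag = False
    if flag_bit is not None:
        flag = bool((word >> flag_bit) & 1)
        word &= ~(1 << flag_bit)  # clear the flag so it isn't read as address
    return flag, word << shift


print(decode_pointer_word(0x80000010, flag_bit=31))  # (True, 16)
print(decode_pointer_word(0x00000010, shift=2))      # (False, 64)
```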

    If I have to describe it using more conventional terms then the idea of a contents page in a book is pretty similar.

    Atlas and Cartographer are pretty hard to get going at first. The pointer stuff just mentioned can lead to all sorts of weird and wonderful formats and they try to be able to handle it; this extensibility comes at a cost of ease of use though. There are two alternatives in Kruptar7 and Oriton ( http://www.magicteam.net/index.php?page=programs site is Russian but the programs have English, also see http://romhack.github.io/doc/kruptarPlugins/ and have some source if you want https://bitbucket.org/magicteam_net/kruptar ) that are slightly easier to use but perhaps not as good at the very high end.
    Technically crystaltile2 also features text dumping abilities (the original author made a program called crystalscript before CT2 took off) with some reasonable search options, but I am not a great fan of it.

    Also while I am linking things I might as well also link http://transcorp.romhacking.net/scratchpad/Table File Format.txt
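
    For a custom encoding, the table file itself is easy to use from a script. A Python sketch that reads the simple hex=text lines of such a table (skipping anything it doesn't understand, such as special entries for end tokens) and decodes bytes greedily, trying the longest matching entry first and printing unknown bytes as [XX].

```python
def load_table(text: str) -> dict[bytes, str]:
    """Parse 'HEX=text' lines; other lines are skipped by this sketch."""
    table = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        try:
            table[bytes.fromhex(key.strip())] = value
        except ValueError:
            continue  # e.g. special entries this sketch doesn't handle
    return table


def decode_with_table(data: bytes, table: dict[bytes, str]) -> str:
    """Longest-match decoding; unknown bytes come out as [XX]."""
    longest = max((len(k) for k in table), default=1)
    out, pos = [], 0
    while pos < len(data):
        for n in range(min(longest, len(data) - pos), 0, -1):
            chunk = data[pos:pos + n]
            if chunk in table:
                out.append(table[chunk])
                pos += n
                break
        else:
            out.append(f"[{data[pos]:02X}]")
            pos += 1
    return "".join(out)
```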