How to create tables for Japanese DS ROMs

Discussion in 'NDS - ROM Hacking and Translations' started by Bagira20, Feb 23, 2017.

  1. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    Hello everyone.

    I'm pretty new to all of this and I am having real trouble grasping the concept of Japanese tables. I get the concept of tables in general, but I don't understand how you manage that for Japanese.

    So, if you want to create a table for an English ROM, you search or relative search for the letters a bit and step by step fill up the table, right? But Japanese has over 10.000 or something like that Kanji with no real order. How are you supposed to create a table for that? Do you seriously have to find out every single Kanji? Or is there a trick to it?

    I hope someone here can help me, because I'd really like to translate a game but have been stuck on this step for some time now.

    Thanks in advance.
     


  2. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    "But Japanese has over 10.000 or something like that Kanji with no real order."

    As much as I know that . is used as a thousands separator in many European languages, it still amuses me every time.

    But yes it might well boil down to hand encoding every character.

    Some things to try before that

    The devs would have faced the same problem of needing however many thousand characters and would not have wanted to hand type everything either.
    To that end:
    http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml and http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml and other common standards might have their order or even outright encoding borrowed (plenty of Japanese games have been seen to have the shiftJIS setup from 8000 onwards in there but omit the first part).
    Ordering by most common/least common in the text has been seen.
    The order in which characters appear in the text has been seen. The reverse could also be possible.
    Same order as in the font has been seen many times.
    Do also be aware that the last three may have been done for an earlier version of the script while it was still in development and thus be different from the final. Does not happen often but can happen.
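If a game does borrow Shift JIS outright, a candidate table can be generated rather than typed. The following is a minimal sketch, assuming Python's built-in shift_jis codec matches the charts linked above; the function names are made up for illustration, and a real game may well use only part of the range or shift the values.

```python
def shift_jis_table(first_lead=0x81, last_lead=0x9F):
    """Map two-byte values to characters using Python's shift_jis codec."""
    table = {}
    for lead in range(first_lead, last_lead + 1):
        for trail in range(0x40, 0xFD):
            if trail == 0x7F:
                continue  # 0x7F is never a valid second byte in Shift JIS
            try:
                char = bytes([lead, trail]).decode("shift_jis")
            except UnicodeDecodeError:
                continue  # slot not assigned in the standard
            table[(lead << 8) | trail] = char
    return table


def to_tbl_lines(table):
    """Format the mapping as VALUE=character lines, one per entry."""
    return [f"{code:04X}={char}" for code, char in sorted(table.items())]


table = shift_jis_table()
lines = to_tbl_lines(table)
```

The default range only covers lead bytes 0x81 to 0x9F; the second block of lead bytes (0xE0 upwards) can be added by calling the function again with different arguments and merging the results.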

    In the case of fonts, the NFTR font format used in many DS games does not state the encoding as such, but it does hold every character the game can display along with the value that calls it. NFTReditor will be able to help you here.
    Japanese is taught in various levels, so get a popular list of how it is taught in schools (the joyo kanji being a good start) and the order might match that. A Japanese speaker, native or otherwise, might recognise this sort of thing.
    The developer may well have a format it used previously. I normally see the English one, but Capcom have a nice table that matches the lower case Roman characters with the upper case ASCII but is otherwise very custom.
    I have never seen anybody successfully use it in anger but there are some tools that attempt to OCR (texterkennung if my dictionary has not failed me) things and are aimed at hacking. Crystaltile2 is the most well developed but it is not good, not to mention it seems to be based around Chinese and that does not play so well with Japanese.

    Do also remember you have the game in front of you, and if it is for a nice system that you can easily run modded games on then you can play with it. For instance, if you found where the text is in the game (corruption, names, assembly tracing... does not matter really) you can then put any value you like in the text section, or any run of characters, or something else that might reveal something far quicker than a weekend of static analysis.

    Beyond that, much like I have not typed the character z before now, not every script will use every character -- you might only need enough of the table to decode a section, or the whole script, or enough of it for your translation to be getting on with.
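To see how many values a script actually uses, the values in a text block can simply be counted. This is a rough sketch that assumes every character is exactly two bytes and stored high byte first; both are assumptions, and control codes or mixed-width text would throw the counts off.

```python
from collections import Counter


def used_values(text_bytes):
    """Count each two-byte value in a block of text (a trailing odd byte is ignored)."""
    pairs = [text_bytes[i:i + 2] for i in range(0, len(text_bytes) - 1, 2)]
    return Counter(int.from_bytes(pair, "big") for pair in pairs)


# Hypothetical text block: three characters, one of them repeated.
counts = used_values(bytes.fromhex("82A082A282A0"))
```

The same counts also give the most common/least common ordering mentioned earlier, via `counts.most_common()`.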
     
  3. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    ok, I think I get most of what you said, but I have some questions:

    1. How do I find the NFTR file?
    2. I don't really understand how to read the shiftJIS table you linked me. Can you explain how I have to read it?
    3. If you think that CrystalTile isn't the best option, do you have an alternative?

    Sorry for being so incompetent and thanks for your patience
     
  4. natanelho

    natanelho GBAtemp Maniac

    Member
    1,322
    345
    Apr 25, 2015
    Antarctica
    Between the Sacred Silence and Sleep
    I have a question- if I'm only gonna use some of the kanji, and I don't really mind all the work involved, I could also just go through the text and give each new character its own number, right? Like, move from letter to letter and write a number under it, and if it's not already in the table, give it a new number and add it to the table- this process can be easily automated, can't it? I don't know Japanese well, I just started learning, but I'm pretty sure I could just write a program that will do just that...
     
  5. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    Quite often we have to explain the concept of tables, you come to us knowing that already so that is all good.

    1) NFTR is just a font format. There are many others but it is the one Nintendo provided for the DS and it is pretty good at what it does so a lot of games use it. Usually it will have the extension nftr or nft, and might be in a folder called font or fnt. There are some exceptions and some games will do other things but go with the previous stuff for now.
    2) It is sort of like https://docs.tibco.com/pub/managed-...A8221-930C-4C22-BB35-1A2D6C961BAF-display.jpg or http://fuel-efficient-vehicles.org/pwsdb/images/pages/ascii-character-codes-IBM-PC-DOS-s.jpg
    The value on the side is the start value of the line. You then count up (in hex) for every character along it. It is quite confusing, and I am not sure why certain things are grouped the way they are (quite arbitrarily at times), but it is a nice example of the shiftJIS order.
    3) No. Nobody has really worked OCR up into a remotely usable tool in hacking. You might be able to generate a more traditional image and feed to an OCR program, however be aware that they are not the largest fonts in DS games so some things have trouble.
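The counting described in point 2 works out as plain hex addition. A small worked sketch, assuming the label at the side of a row is the value of the first character in that row; code_at is a made-up helper, and the result is checked against Python's shift_jis codec.

```python
def code_at(row_start, column):
    """Value of the character `column` places along a row that starts at `row_start`."""
    return row_start + column


code = code_at(0x82A0, 2)                          # third character in the 82A0 row
char = code.to_bytes(2, "big").decode("shift_jis")
print(f"{code:04X}={char}")
```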

    You mean like sometimes when you have some unknown unicode and it will display its number in a square box? Something like
    https://i.stack.imgur.com/ewPoN.png

    Except that, rather than the unicode number, it would display the value that encoded it.

    Yeah you could do that. A table file is basically a long list of

    00=A
    01=B
    02=C
    ...
    0F0A=Charactername


    You could in turn compose a table where it is
    00=00
    01=01
    02=02
    ...
    0A=0A

    You then open that in a spreadsheet or something and fill in the characters you do know in a new third column. Then overwrite the second column with the third, but only where the third has an entry. Thus you have your table, and any blanks will display as the value that encoded them.
    I don't know how much time that would save as the translator would probably point out any gaps in your script and you could figure out the corresponding character easily enough.
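The spreadsheet step can also be done in a few lines of script. A minimal sketch of the idea, using single-byte values for brevity; the helper names and the sample entries are made up for illustration.

```python
def identity_table(codes):
    """Every value displays as its own hex digits (00=00, 01=01, ...)."""
    return {code: f"{code:02X}" for code in codes}


def merge_known(base, known):
    """Overwrite identity entries with characters that have been worked out."""
    merged = dict(base)
    merged.update(known)
    return merged


def to_tbl(table):
    """Format as VALUE=text lines, the usual table file layout."""
    return "\n".join(f"{code:02X}={text}" for code, text in sorted(table.items()))


base = identity_table(range(0x00, 0x04))
known = {0x00: "A", 0x02: "C"}      # hypothetical entries found so far
tbl_text = to_tbl(merge_known(base, known))
```

Anything not yet identified then shows up in a dump as its raw value, which makes the gaps easy to spot.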
     
  6. natanelho

    natanelho GBAtemp Maniac

    Member
    1,322
    345
    Apr 25, 2015
    Antarctica
    Between the Sacred Silence and Sleep
    Didn't mean that, but I guess I asked a question too far from the topic of this thread. So nvm I guess. You seem to understand a lot about tables and fonts, so would you mind answering some PMs?
     
  7. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    Why do PMs? It is reading conversations on forums that taught me a lot about this sort of thing.

    Back to the question though. Do you mean replace existing kanji you might not want to use? Or do you mean add new characters to an encoding in gaps that might be there?
    For the former, that is how a lot of hacks are done. The latter can be tricky as you get memory issues involved if the game wants to dump an entire section of them into RAM and was originally programmed to do one thing or another. The NFTR format can handle this better than some older systems, not to mention replacement works fine in most cases; adding tends to come up when people want accented characters in games that only have A-Z,0-9 in them.

    I think I got confused when you mentioned tables -- tables are hacker tools, nothing to do with the game at all so adding things to them does not trouble the game at all.
     
  8. natanelho

    natanelho GBAtemp Maniac

    Member
    1,322
    345
    Apr 25, 2015
    Antarctica
    Between the Sacred Silence and Sleep
    I thought you were talking about tables like the ASCII table and the like, but now that you mention those are hacker tools I guess that's not what I meant...
    I meant to ask whether it would be more memory efficient, when programming a game, to write all of the game's text and then make a custom table which gives every character its own number depending on its first appearance in the text- like #12 will be the 12th unique character that appears in the text. This way you won't need a huge table with all the characters you didn't really use, and it might be important for Japanese because of lots of letters and not many standards afaik (and I don't know much). This might also reduce the size of the text by reducing each character to a smaller size (but that isn't much anyway nowadays...). Like for this message the table will be like-
    01-I
    13-a

    Pretty random but it might confuse reverse engineers for a while, and as I said b4, reduce the amount of characters in the table.

    Edit- sorry for the pretty stupid question, I was learning programming on bad hardware in old langs, so I was always looking for the most obscure ways to optimize code for disk space, performance etc.. might sound like a really stupid question so sorry
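The numbering scheme described in this post is short to write down. A minimal sketch, assuming numbering starts at 1 and that every character (spaces included) gets an entry; the function names are made up for illustration.

```python
def first_appearance_table(text, start=1):
    """Give each character a number in the order it first appears in the text."""
    table = {}
    for char in text:
        if char not in table:
            table[char] = start + len(table)
    return table


def encode(text, table):
    """Replace each character with its number from the table."""
    return [table[char] for char in text]


sample = "I thought"
table = first_appearance_table(sample)
encoded = encode(sample, table)
```

Only characters that actually occur get an entry, so the table is never larger than the set of characters used in the script.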
     
    Last edited by natanelho, Feb 23, 2017
  9. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    The table/encoding list is probably not a large part of the game, even if you include the pictorial (or, not that it really exists in games, vector) definitions of the characters/glyphs/runes/symbols/whatever. Whether it improves memory io or anything like that by a useful amount on any kind of remotely modern system is doubtful at best. For an older one it might. There are actually older games that swap out encodings as necessary to display other characters, typically in Japanese games this would be for the Dakuten over the general kana but it is hardly limited to that.

    What you describe though seems like you have taken the first steps in reinventing Huffman compression.
    You can read a slightly more expanded workthrough in https://tuxtina.de/files/seminar/LempelZivReport.pdf
    The short version is each character in a thing to be compressed is ordered by most to least common. The more common items are given shorter lookup values where the less common are given longer ones as the need arises.
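For comparison, here is a compact sketch of textbook Huffman coding using Python's heapq. It is the standard construction (repeatedly merge the two least common entries), not something taken from any particular game, and the sample string is made up.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string}; more common symbols get shorter strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap items are (weight, tie breaker, {symbol: code so far}).
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        weight1, _, left = heapq.heappop(heap)    # two least common groups
        weight2, _, right = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in left.items()}
        merged.update({symbol: "1" + code for symbol, code in right.items()})
        heapq.heappush(heap, (weight1 + weight2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_codes("aaaaaaaabbbc")
```

Unlike a fixed-width first-appearance table, the codes here differ in length, which is where the space saving comes from.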
     
  10. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    I still can't figure out how I can access the different files inside of the ROM. All I have is the .nds file. How can I get the individual files (and in turn the NFTR file)?

    Also, I found out that most of the characters have 2 bits for one character encoding. How do I create a table with 2 bits per character?
     
    Last edited by Bagira20, Feb 24, 2017
  11. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    You mean two bytes? 2 bits is 00, 01, 10 and 11 or four total characters. 2 bytes is 16 bits aka a 16 bit encoding, necessary for most Japanese games as 8 bit is only 256 characters unless you go quite complex.

    You create a table the exact same way as you do any other table. If you are using a program like TaBuLar there is a 16 bit entry function in one of the dropdown menus. If you are manually editing the table file (they are simple text based formats and can be happily edited in text editors) then you just type the two byte value on the left of the equals sign, as with the 0F0A entry in my earlier example.
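To show that two-byte entries need nothing special, here is a minimal sketch of reading a table file and decoding bytes with it. It assumes the plain VALUE=text layout, tries two-byte entries before one-byte ones, and prints unknown bytes in angle brackets; parse_tbl and decode are illustrative names, not part of any existing tool.

```python
def parse_tbl(text):
    """Read VALUE=text lines into {bytes: text}; lines without '=' are skipped."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        table[bytes.fromhex(key.strip())] = value
    return table


def decode(data, table):
    """Decode bytes, preferring a two-byte match over a one-byte match."""
    out = []
    pos = 0
    while pos < len(data):
        for size in (2, 1):
            chunk = data[pos:pos + size]
            if len(chunk) == size and chunk in table:
                out.append(table[chunk])
                pos += size
                break
        else:
            out.append(f"<{data[pos]:02X}>")   # value not in the table yet
            pos += 1
    return "".join(out)


tbl = parse_tbl("00=A\n01=B\n82A0=あ\n")
decoded = decode(bytes([0x82, 0xA0, 0x00, 0x01, 0xFF]), tbl)
```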

    As for all the files there are many ways to open DS ROMs. If you are used to crystaltile2 then when you click the ds icon there should be an option in the leftmost dropdown menu of the new window (I think, it has been a while) to "split" the ROM and if not then it should be called split somewhere else.
    There are dozens of other tools as well -- I normally use ndstool, tinke has the options and again there are loads of options. I cover a bunch in my docs http://gbatemp.net/threads/gbatemp-rom-hacking-documentation-project-new-2016-edition-out.73394/
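Once the ROM has been split into a folder, likely font files can be listed with a short script. This sketch only applies the naming hints from earlier in the thread (extension nftr or nft, or a folder called font or fnt); the folder passed in is wherever your extraction tool put the files, and games that use other names will not be caught.

```python
from pathlib import Path

FONT_EXTENSIONS = {".nftr", ".nft"}
FONT_FOLDERS = {"font", "fnt"}


def font_candidates(root):
    """List files under `root` that look like fonts by extension or folder name."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Only look at folder names below the extraction root itself.
        folders = {part.lower() for part in path.relative_to(root).parent.parts}
        if path.suffix.lower() in FONT_EXTENSIONS or folders & FONT_FOLDERS:
            hits.append(path)
    return sorted(hits)
```

For example, `font_candidates("extracted_rom")` would return the matching paths under a hypothetical extracted_rom folder.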
     
  12. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    Alright, for some games, I got it working now, but some don't have their font in an NFTR file, but rather in a different format (mostly .bin from what I could tell so far). How can I get the values from files like these?
     
  13. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    For the most part you don't -- nftr is maybe not unique but definitely rare in games in what it does with having encoding right there in the file. Without it you are back to the more traditional methods (relative search, brute force, lots of testing, all the stuff previously covered and also covered in those docs I linked).
    Theoretically you could go the assembly hacking route and find its encoding table and match it to how the tiles containing the glyphs appear in the ROM/memory/whatever it is referencing. I don't know of any hacker that has done that though. Something like it is done if you are doing a 16 bit to 8 bit font conversion*, but as far as looking at encoding engines goes it is mainly done if there is a really complex one like the table swapping stuff for Dakuten I mentioned in an earlier post.

    * Some older systems really lack memory and storage so if you are burning 8 bits per character by using a 16 bit encoding on a European language game you might want to convert, it has been done on the DS as well with the Jump Ultimate Stars project being an example. Either way you would know how the game handles encoding and how it then converts that to a lookup for a glyph, however you would instead be changing everything to work how you want and thus control the encoding yourself.

    I should also say .bin is a very generic extension and used for thousands of completely different formats on the DS alone, let alone the decades that computing has been around and the thousands of computing platforms available.
     
  14. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    Ok, so after reading through all the options, it seems that corruption would be the method best suited for this. My question though is: How do I find a good section to corrupt in CrystalTile if I can't see the text since I don't have a table yet? (and it seems like the usual tables e.g. shift JIS, etc... don't apply)
     
  15. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    Corruption in the basic form is the method to find things -- randomly change parts of the game as a whole and see what happens. With the DS you presumably have file names, directory names, extensions, file sizes and more. This may not give you the text directly but it will hopefully allow you to eliminate the things which are not it -- the large several meg file called sound_data.sdat is most likely sound so pick something else. If you open up an unknown file in a tile editor and you see usable graphics then it too is probably not your text.
    Depending upon what you change, the game may crash outright too, even if you have found the thing you want, if the value it tries to decode is something it can not handle.

    Corruption is a crude method but it will eventually get you what you want.

    What I was likely speaking of above though was changing values to ones you know -- if you want to know what the character 8672 decodes as then when you have found the text file put a whole run of said 8672 somewhere in place of a sentence or something. Fire the game up (don't use a savestate if you can help it as the savestate might have been made after the original stuff was loaded into RAM) and then when you see a run of a single character where it should have been something else you know what that value is.
    That gets boring though so you might try multiple characters at once. Maybe with a pattern so you can be reasonably sure that is something you did and not a part of the game you had not seen.
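Planting a run of known values is easy to script on a copy of the text file. A minimal sketch, assuming two-byte values written high byte first (swap "big" for "little" if the game stores them the other way round) and keeping the length unchanged so nothing after the run moves; plant_run is a made-up helper.

```python
def plant_run(data, offset, length, values):
    """Overwrite `length` bytes at `offset` with the given two-byte values, repeated."""
    if not values:
        raise ValueError("need at least one value to plant")
    if length % 2:
        raise ValueError("length must be even for two-byte values")
    if offset < 0 or offset + length > len(data):
        raise ValueError("run must stay inside the original data")
    pattern = b"".join(value.to_bytes(2, "big") for value in values)
    repeats = -(-length // len(pattern))          # ceiling division
    fill = (pattern * repeats)[:length]
    return data[:offset] + fill + data[offset + length:]


original = bytes(range(16))                       # stand-in for a text file
single = plant_run(original, 4, 8, [0x8672])
patterned = plant_run(original, 4, 8, [0x8672, 0x8673])
```

Using two or more alternating values, as in the second call, gives the recognisable pattern mentioned above.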

    I should also say text can be found in the binaries and overlays (the file called arm9.bin and the things in the overlay folder). Hopefully it is not this but you never know.
     
  16. gnmmarechal

    gnmmarechal Kirigiri > Naoto

    Member
    4,672
    2,844
    Jul 13, 2014
    Portugal
    https://gs2012.xyz
    on the other hand, something like 10,000 amuses me.
     
  17. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    Alright, this has been quite a lot of information so far, so let me recap real quick:

    1. In order to translate any ROM, you need a table in order to translate the hexadecimal into readable text.
    2. Since Japanese characters don't have a fixed order, relative searching is basically useless (unless one of the known tables applies)
    3. The other option is corruption, in which you rewrite part of the text file in order to see what changes.
    4. In order to find the text file, you have to split the ROM and guess by the names which file might be the text file

    Was that right so far? If yes, here are my questions:
    1. Did I understand correctly that I might have to play through the whole game to find out what has changed?
    2. After I changed some lines in the text file, how do I get the file back into the ROM/reassemble the ROM in order to test it?
    3. Just my lack of knowledge, but can a file named "font" contain the text or is font something different?


    By the way, thank you a bunch for guiding me along here. Really appreciate it.
     
  18. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    1. If it is a game which uses a text engine then yes (certain games, usually lower text games like puzzle games, use pictures instead). Some games can have multiple tables for different aspects of text, or even different parts of the game, as well.
    2. You can do some things with value entries rather than letter entries (say you have the font you dumped from memory and want to try something ordered accordingly, or you reckon the kana will be in a standard order*), but yes, relative search is of very limited use in figuring out Japanese games.
    3. Corruption is a general data finding technique. The general idea is you change part of the ROM and see what changes/breaks after you run it again. What you describe is arguably a type of it but most would probably call it something else. But yes if you replace segments of text with something you control you can find out a lot about how the text engine works, and often far more quickly than hoping someone used a silly character somewhere in the game so you can figure out what value it takes.
    There are also many more options for figuring out encodings, I cover some in the guide I linked. Japanese has a few of its own as well like where one of the sets of characters will show up in a sentence/word/phrase, how often things might be mixed and more besides.
    4. You can do ROM wide searches if you like, and then look up what file it belongs to afterwards if you like -- for finding graphics I will occasionally load it up in a tile editor and press page down a lot to see what I can find, crystaltile2 usually has a little status bar down the bottom that tells you what file the cursor position belongs to in either the graphics or the hex window. However yes I do find the file names, extensions, sizes and more to be very useful in narrowing my searches. It is not the only way but well worth doing as a first pass. Equally some DS games don't use file names at all and have random names, and in some instances do other things. More commonly you can also find text contained within the ARM9 binaries and overlays, this is text mixed in with... anything really but primarily the code that is run on the processor, and yes it is more annoying than general text to edit.
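Looking up which file a ROM offset belongs to (point 4 of the list above) can be sketched with the file allocation table. This assumes the usual DS header layout, with the FAT offset at 0x48 and the FAT size at 0x4C, and 8 byte entries holding each file's start and end address; it returns a file ID rather than a name, since names live in a separate table not handled here.

```python
import struct


def file_id_at(rom, offset):
    """Return the ID of the FAT entry whose range contains `offset`, or None."""
    fat_offset, fat_size = struct.unpack_from("<II", rom, 0x48)
    for file_id in range(fat_size // 8):
        start, end = struct.unpack_from("<II", rom, fat_offset + file_id * 8)
        if start <= offset < end:
            return file_id
    return None
```

Offsets inside the header, the ARM9/ARM7 binaries or padding fall outside every entry and give None.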

    1. If you are corrupting then yes it is a possibility. You can focus your efforts though and as text will likely be there from the start to the end, and probably not change all that much, you can figure out most things from the earlier points in the game. Nothing stopping you from powering through with cheats either.
    2. Most things that pull it apart will give you the option to rebuild it as well. If you have not changed the length, or in crystaltile2's case that is not such a problem, then most individual file export programs like ndsts will also do import.
    3. Names are utterly arbitrary and it might be that not all files within a ROM will be used by a game (if you have ever copied a whole directory rather than pick individual files as it is quicker then know game devs do the exact same things -- we have even got source code before from games). I don't think I have seen a DS game use misleading names to bother hackers, though I have seen games eschew names of files entirely and use numbers instead. I have seen things presumably dropped from the final game be still in a ROM though.

    I should also say make sure you understand pointers enough to know what they are. I usually describe them as being like the contents page of a book and they are used in almost every game to describe the lengths of text sections. If you are replacing characters with the same length of characters for finding a table it is not so bad. Sometimes a game will crash or behave oddly if you go past the end of a text section with your "random" data and pointers can help here. You will have to learn them eventually but you can indeed determine the tables for most games without knowing anything about them.
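To make the contents page comparison concrete, here is a sketch using an entirely hypothetical text file layout: a 32 bit little endian count, then one 32 bit offset per text section, then the sections themselves. Real games vary a lot, so treat the layout and both function names as assumptions.

```python
import struct


def read_sections(block):
    """Split a block into text sections using its pointer table."""
    (count,) = struct.unpack_from("<I", block, 0)
    pointers = list(struct.unpack_from(f"<{count}I", block, 4))
    ends = pointers[1:] + [len(block)]            # each section ends where the next starts
    return [block[start:end] for start, end in zip(pointers, ends)]


def build_block(sections):
    """Rebuild the block, recalculating every pointer from the section lengths."""
    position = 4 + 4 * len(sections)              # first section starts after the table
    pointers = []
    for section in sections:
        pointers.append(position)
        position += len(section)
    header = struct.pack(f"<I{len(sections)}I", len(sections), *pointers)
    return header + b"".join(sections)


block = build_block([b"\x82\xa0\x82\xa2", b"\x82\xa4"])
sections = read_sections(block)
```

Replacing a section with one of the same length leaves every pointer valid; a longer replacement shifts everything after it, which is why the pointers have to be recalculated as build_block does.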

    *the kana don't have a standard order like the Roman alphabet but unlike kanji there are a selection of very common orders that some use instead.
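A relative search on values rather than letters can be sketched as follows. It assumes two-byte values stored high byte first and that the game numbers its kana consecutively in some order you guess; the ORDER string here is a made-up, partial order for illustration only, and a real attempt would try several common orders.

```python
ORDER = "あいうえおかきくけこ"   # hypothetical guess at the game's kana order


def relative_search(data, ranks):
    """Find offsets where consecutive two-byte values differ the same way `ranks` do."""
    deltas = [b - a for a, b in zip(ranks, ranks[1:])]
    hits = []
    for offset in range(0, len(data) - 2 * len(ranks) + 1):
        values = [int.from_bytes(data[offset + 2 * i:offset + 2 * i + 2], "big")
                  for i in range(len(ranks))]
        if all(values[i + 1] - values[i] == deltas[i] for i in range(len(deltas))):
            # Second item is the value the first character of ORDER would have.
            hits.append((offset, values[0] - ranks[0]))
    return hits


word = "あおい"
ranks = [ORDER.index(char) for char in word]
# Stand-in data where あ happens to be encoded as 0x0100.
data = b"\xff\xff" + bytes.fromhex("010001040101")
hits = relative_search(data, ranks)
```

Short words give many false hits in a real ROM, so longer words or phrases known to be on screen work better.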
     
  19. Bagira20
    OP

    Bagira20 Member

    Newcomer
    19
    0
    Feb 23, 2017
    Gambia, The
    Just a quick question: Sometimes when I change code in CrystalTile and want to save my changes, I can't even though when I try to quit, it warns me that the file is modified and I should save it. Any idea what the problem is?
     
  20. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,540
    9,373
    Nov 21, 2005
    Crystaltile2 is not the best coded program, despite all the nice functionality, so I can imagine that happening.