Translating roms: how to do it.

Discussion in 'NDS - ROM Hacking and Translations' started by FAST6191, May 23, 2008.

    Added to the rom hacking guide stickied at the top of the forum and in my signature but left here (probably not to be updated).

    There are several guides (I myself have written a few) but it seems it is needed again and this will probably serve as a good part of my main guide. Feel free to copy and paste/alter/repost this as you see fit (PM me and I will try my best to inform you of updates I make to it).

    There are 4 main areas in DS hacking:

    core hacking: how the game works. For example, an attack takes the atk stat, adds a random number and subtracts the result from health, or writing 014f to a certain memory block makes a certain sprite appear.

    graphics: for years games have relied on graphics to display different worlds.

    text: I will limit this strictly to the game-generated representation of a language, rather than text that a developer drew into a picture stored in the rom.

    multimedia: This is audio and video.

    There is no single hardest area: stats can be simple XML files, while text can be compressed and embedded in the actual binary code that runs on the processor(s).

    Two questions now arise: what does a given string of 1s and 0s mean, and, knowing what I want to change, how do I find it?

    core: there are three components
    the binary
    extraneous files
    file system.

    Graphics: again there are three components:
    2d tiles
    2d bitmaps
    3d graphics.

    text: 3 sections.
    the encoding (table)
    pointers
    the font

    multimedia: 3 sections again.
    audio
    video
    ingame cutscenes.


    <b>Core hacking</b>

    The binary: if you have made it this far you will have heard of computer languages. Simply put, computers blindly follow strings of 1s and 0s, but getting the hardware turned on, configured and displaying so much as an image of a box is a hard task, and machines differ from one another, so people abstract this away and write compilers and runtime environments for the code.
    The most fundamental computer language is called assembler/assembly and is simply a more human-readable form of binary (granted, some assembly environments add niceties regarding memory locations).

    extraneous files: you can include stats for your monsters/players/weapons/whatever in the binary, or you can keep them in files/sections distinct from it. In practice a large chunk of rom hacking comes down to working these out, although a lot of those sections are common enough across games and even systems.

    file system: you can lump all your code, images, multimedia and text into one file, and systems up to and including the GBA did this very successfully.
    However, as time goes on you will likely want to make an index of files and be able to call upon said index to get at them. Most disc-based systems do this and so does the DS. Files can have their own internal file system as well. Common ones include (N)arc, SDAT, NCER, NANR, NCLR, NTFP, NTFT, NTFS and NSBMD.
    <a href="" target="_blank"></a>
    <a href="" target="_blank"></a>
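As a rough illustration of "an index of files", the sketch below reads the DS file allocation table (FAT). It assumes the layout commonly documented for DS roms: the header holds the FAT offset at 0x48 and the FAT size at 0x4C, and each FAT entry is a pair of 32-bit little-endian start/end offsets. The helper names and the tiny fake rom are made up so the example runs without a real dump.

```python
import struct

def read_fat(rom: bytes):
    """Return a list of (start, end) offsets, one per file in the FAT."""
    # Assumed header fields: FAT offset at 0x48, FAT size in bytes at 0x4C.
    fat_offset, fat_size = struct.unpack_from("<II", rom, 0x48)
    entries = []
    for i in range(fat_size // 8):  # each entry is 8 bytes
        start, end = struct.unpack_from("<II", rom, fat_offset + i * 8)
        entries.append((start, end))
    return entries

def extract_file(rom: bytes, entries, file_id: int) -> bytes:
    """Slice one file out of the rom using its FAT entry."""
    start, end = entries[file_id]
    return rom[start:end]

def make_fake_rom() -> bytes:
    """Build a tiny stand-in rom (hypothetical data, not a real game)."""
    payloads = [b"HELLO", b"NPC_name data"]
    fat_offset, data_offset = 0x200, 0x300
    rom = bytearray(0x400)
    struct.pack_into("<II", rom, 0x48, fat_offset, len(payloads) * 8)
    pos = data_offset
    for i, payload in enumerate(payloads):
        struct.pack_into("<II", rom, fat_offset + i * 8, pos, pos + len(payload))
        rom[pos:pos + len(payload)] = payload
        pos += len(payload)
    return bytes(rom)

rom = make_fake_rom()
fat = read_fat(rom)
print(fat)
print(extract_file(rom, fat, 1))
```

File names live in a separate name table, which this sketch ignores; existing unpacking tools handle both for you.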


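    <b>Graphics</b>

    2d tiles
    Think paint by numbers but with pixels: each pixel stores a number and a palette says which colour that number stands for.
    Page search "2d graphics":
    <a href="" target="_blank"></a>
    Where a tile appears on screen is decided both by the game's own logic (core hacking territory) and, as far as the hardware is concerned, by the OAM (object attribute memory):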
    <a href="" target="_blank"></a>
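To make the paint-by-numbers idea concrete, here is a minimal sketch that decodes one 8x8 tile stored at 4 bits per pixel, which is the usual 16-colour tile layout on the GBA and DS (32 bytes per tile, low nibble first). The function name and the test tile are invented for the example.

```python
def decode_tile_4bpp(tile: bytes):
    """Turn 32 bytes into an 8x8 grid of palette indices (0-15)."""
    if len(tile) != 32:
        raise ValueError("a 4bpp 8x8 tile is 32 bytes")
    rows = []
    for y in range(8):
        row = []
        for b in tile[y * 4:(y + 1) * 4]:  # 4 bytes hold one row of 8 pixels
            row.append(b & 0x0F)  # low nibble is the left pixel
            row.append(b >> 4)    # high nibble is the right pixel
        rows.append(row)
    return rows

# A made-up tile: a diagonal line of colour 1 on a background of colour 0.
tile = bytearray(32)
for i in range(8):
    tile[i * 4 + i // 2] |= 0x01 if i % 2 == 0 else 0x10

pixels = decode_tile_4bpp(bytes(tile))
for row in pixels:
    print("".join(".#"[p] for p in row))
```

The numbers only become colours once a palette is applied, which is why the same tile can look completely different in a tile viewer until the right palette is loaded.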

    2d bitmaps
    For various reasons 2d tiles are not suited to every application, so in a bitmap each pixel's colour is given directly by the number stored for it.
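A small sketch of what "colour by numbers" means for a bitmap, assuming the 15-bit colour layout the DS uses (5 bits each of red, green and blue, with red in the lowest bits). The function names are made up for illustration.

```python
import struct

def bgr555_to_rgb888(value: int):
    """Split a 16-bit colour into 8-bit red, green and blue."""
    r = value & 0x1F
    g = (value >> 5) & 0x1F
    b = (value >> 10) & 0x1F
    # Stretch each 0-31 channel to 0-255.
    return tuple((c << 3) | (c >> 2) for c in (r, g, b))

def decode_bitmap_row(data: bytes):
    """Decode a run of little-endian 16-bit pixels."""
    count = len(data) // 2
    return [bgr555_to_rgb888(v) for v in struct.unpack(f"<{count}H", data[:count * 2])]

# Three pixels: pure red, pure green, pure blue.
row = decode_bitmap_row(bytes([0x1F, 0x00, 0xE0, 0x03, 0x00, 0x7C]))
print(row)
```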

    3d graphics.
    The DS has 3d hardware and it would take someone very foolhardy not to use it and try to do it all in software. Not all files use the NSBMD format but they all work with the 3d hardware:
    <a href="" target="_blank"></a>
    In short it creates a mathematical model of a scene (based on 3d coordinates, often with different origins) and then generates the corresponding 2d image, with the usual scaling, rotation and translation transformations applied along the way.
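The sketch below shows the general maths only: scale, rotate and translate a point, then project it to 2d. It is not how the DS hardware is programmed (that works with matrices and fixed point numbers), and every name in it is invented for the example.

```python
import math

def scale(p, k):
    """Scale a point about the origin."""
    return tuple(k * c for c in p)

def rotate_y(p, angle):
    """Rotate a point around the vertical (y) axis; angle in radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def translate(p, d):
    """Move a point by the offset d."""
    return tuple(a + b for a, b in zip(p, d))

def project(p, focal=1.0):
    """Simple perspective projection: things further away (larger z) shrink."""
    x, y, z = p
    return (focal * x / z, focal * y / z)

corner = (1.0, 1.0, 0.0)                      # a point in the model's own coordinates
world = translate(rotate_y(scale(corner, 2.0), math.pi / 2), (0.0, 0.0, 4.0))
screen = project(world)
print(screen)
```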


    <b>Text</b>

    Many years ago you may have made a "code" along the lines of 01=a, 02=b, 03=c and so on. Nothing has changed now that computers are involved, other than that hexadecimal is used and normally more than just 2 digits are needed: 26 letters in the Roman alphabet as used by English, x 2 for the 2 cases, 10 digits, some punctuation and a few other symbols, and it starts adding up. Account for French, Spanish, Slavic, Asian, Greek and ancient Greek scripts and you end up with thousands of characters.
    "Table" is the term given to the list of numbers and their definitions.
    See the example hack below for a start, or read any hacking guide or site, as these are the fundamentals of hacking.
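A minimal sketch of a table in action, assuming a one-byte-per-character encoding and a table written as `NN=character` lines in the style of the 01=a example. The table contents here are invented; a real game's table has to be worked out from the rom.

```python
def parse_table(text: str):
    """Read 'HEX=character' lines into a {number: character} dictionary."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        code, char = line.split("=", 1)
        table[int(code, 16)] = char
    return table

def decode(data: bytes, table) -> str:
    """Translate bytes to text, showing unknown bytes as <XX>."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)

def encode(text: str, table) -> bytes:
    """Translate text back to bytes using the same table in reverse."""
    reverse = {char: code for code, char in table.items()}
    return bytes(reverse[ch] for ch in text)

TABLE_TEXT = "01=a\n02=b\n03=c\n04=d\n05=e\n"
table = parse_table(TABLE_TEXT)

print(decode(bytes([0x03, 0x01, 0x02]), table))        # cab
print(decode(bytes([0x02, 0x01, 0x04, 0x7F]), table))  # bad<7F>
print(encode("bead", table).hex())                     # 02050104
```

Showing unknown bytes as `<XX>` rather than dropping them is handy while the table is still incomplete, as the gaps tell you which values still need identifying.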

    You know a sentence ends because of the full stop; a computer, however, does not, so it can either parse the text for an end marker or, more commonly, use an index/contents file or section known as a pointer table.
    These are explained in some detail here (page search "pointers"):
    <a href="" target="_blank"></a>
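To show what a pointer table looks like in practice, here is a sketch using a made-up file layout: a 32-bit count, then one 32-bit little-endian offset per string (measured from the start of the file), then the strings themselves, each ended by a zero byte. Real games vary in pointer size, what the offsets are relative to and how strings end, so treat the layout as an assumption.

```python
import struct

def read_strings(blob: bytes, encoding: str = "ascii"):
    """Follow each pointer and read up to the terminating zero byte."""
    (count,) = struct.unpack_from("<I", blob, 0)
    pointers = struct.unpack_from(f"<{count}I", blob, 4)
    strings = []
    for ptr in pointers:
        end = blob.index(b"\x00", ptr)  # position of the terminator
        strings.append(blob[ptr:end].decode(encoding))
    return strings

# Hand-built example: 2 strings, pointers 12 and 18, then the text.
blob = struct.pack("<3I", 2, 12, 18) + b"Hello\x00Farm\x00"
strings = read_strings(blob)
print(strings)  # ['Hello', 'Farm']
```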

    Binary code is just electrical signals, and try as hard as you might you will probably not be able to read them in any meaningful way. So a number has to call up a representation to be displayed on the screen: a glyph, character, rune or whatever your language of choice calls the pictorial representation of the written or spoken word.
    The font is often just a 2d tile or two per character but it can be more complex than that (see deufeufeu's JUS project: <a href="" target="_blank"></a>).


    <b>Multimedia</b>

    Sound travels in waves, for those that recall science lessons. The most basic way of storing sound is to sample the waveform at regular intervals and play the recorded amplitudes back at the same intervals. Do this often enough and you replicate the sound as far as people can detect (the sample rate needed is usually taken to be around 2 times the highest frequency someone can hear, or around 44 kHz for "transparent" quality). DS audio tends to range from around 11 kHz (about telephone quality) to around 48 kHz ("DVD" quality) and comes in three main forms:
    individual sounds (samples): a cymbal hit, a piano sequence.....
    sequences: an arrangement of instruments
    full blown wave format: whole songs or voice samples.

    44000 samples a second, each of 16 bits, takes up a lot of space in very short order as you can imagine (around 88 KB for every second of mono sound), but rather nicely most sounds do not change that much from moment to moment, so you can assume one millisecond is much the same as the next (it gets far more complex than that of course), which means you can drop the size of the file. The result is no longer a plain waveform, however, so you have to turn it back into one for playback, which takes CPU time and other resources.
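The arithmetic, plus a toy version of the "one moment is much like the next" idea, can be sketched as below. Storing differences between neighbouring samples (delta encoding) is used here only as the simplest possible stand-in; real compressed audio formats are more involved, and the function names are invented.

```python
def pcm_bytes(sample_rate_hz: int, bits_per_sample: int, seconds: int, channels: int = 1) -> int:
    """Size of uncompressed audio in bytes."""
    return sample_rate_hz * (bits_per_sample // 8) * seconds * channels

def delta_encode(samples):
    """Store each sample as the change from the previous one."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(deltas):
    """Rebuild the waveform by adding the changes back up."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out

print(pcm_bytes(44100, 16, 60))  # 5292000 bytes for one minute of mono sound
print(pcm_bytes(11025, 8, 60))   # 661500

samples = [1000, 1004, 1009, 1011, 1010, 1006]
deltas = delta_encode(samples)
print(deltas)                    # [1000, 4, 5, 2, -1, -4]
```

After the first value the differences are small numbers that fit in far fewer bits than the samples themselves, which is where the space saving comes from, and the decode step is the extra work paid at playback time.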
    Most DS roms use the SDAT format
    <a href="" target="_blank"></a>
    <a href="" target="_blank"></a>

    Much like sound with samples, if you play enough pictures quickly enough (17 frames per second is about the limit, with 20 being comfortable and 30 being pretty good) you create the illusion of movement.
    Not much happens from frame to frame and not much changes between one pixel and its neighbour, so you can call them similar and save space (again it gets far, far more complex, with everything from psychology, electrical engineering and physics to fields of maths many have never heard of being involved). Again this incurs the need for more CPU and resources to decode.
    While intellectual pursuits are likely what brings rom hackers to the table, reverse engineering all but the most basic of video formats is a monumental task.
    The two most common hacks are:
    file system: make a different video play by tricking the game into loading it, by altering the file system or the pointers that the game uses. Usually done with an eye towards saving/making space, but making videos play at a given time is OK too.

    and the lucky occasions where you have a known format like RAD Game Tools' Bink:
    <a href="" target="_blank"></a> (very common on home consoles and the PC, and it has a presence in the DS world)

    Ingame cutscene:
    the controls tend to fall dead and the sprites on the screen or the 3d models get moved around, normally according to a list stored somewhere. This saves on space (the sequence of numbers for a 5 minute cutscene like this is not likely to be any bigger than a 30 second video clip).


    <b>Example hacking route adapted from a PM discussion about translating a rom:</b>
    The game was a Harvest Moon title on the DS. Use of the first person is not usual in technical documents, I agree, but it should hopefully not distract from the message.

    First stage is to find the text encoding and dump the script ready to be translated. The most common encoding for Japanese games on the DS is Shift JIS, a rather horrible mess of a standard as far as things go, but it works. The Rune Factory series uses it so there is a good chance Harvest Moon will also use it. Hopefully it will not be compressed.
    <a href="" target="_blank"></a>
    Simple Japanese text editor:
    <a href="" target="_blank"></a>
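If the text really is uncompressed Shift JIS, a crude scan for runs of two-byte characters will often turn it up. The sketch below relies on Python's built-in `shift_jis` codec and the standard lead/trail byte ranges; the function names, the minimum run length and the sample data are made up, and a scan like this will throw up false positives on real files.

```python
def is_lead(b: int) -> bool:
    """First byte of a two-byte Shift JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF

def is_trail(b: int) -> bool:
    """Second byte of a two-byte Shift JIS character."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

def find_sjis_runs(data: bytes, min_chars: int = 3):
    """Return (offset, text) for each run of at least min_chars two-byte characters."""
    results = []
    i = 0
    while i < len(data) - 1:
        start = i
        while i + 1 < len(data) and is_lead(data[i]) and is_trail(data[i + 1]):
            i += 2
        if (i - start) // 2 >= min_chars:
            try:
                results.append((start, data[start:i].decode("shift_jis")))
            except UnicodeDecodeError:
                pass  # looked plausible byte-wise but is not valid text
        if i == start:
            i += 1  # no run here, move on one byte
    return results

# Stand-in for a game file: some junk, a Japanese greeting, more junk.
sample = b"\x00\x01\x02" + "こんにちは".encode("shift_jis") + b"\x00\xff"
hits = find_sjis_runs(sample)
print(hits)
```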

    Before you get to all that, though, you will have to pull the rom apart. DS roms are much like CDs in that, unlike on other systems, files are distinct from each other. This makes the files easier to get at and makes rom hacking easier once you have figured out what they do, but harder than on other systems until you do find out what is what.
    A topic covering the basics of it:
    <a href="" target="_blank"></a>

    The script will likely be in several files and if you are lucky they will have (English based) names to indicate what they do: something like NPC_name for the names of the non player characters is not uncommon (and if the Harvest Moon and Rune Factory games are anything to go by then you should be OK here as well).

    You now have a choice: if you want to recruit someone I would make a demo showing you can do something. It is rather basic, but here is my pulling apart of Rune Factory 2:
    <a href="" target="_blank"></a>
    I also did some stuff for the recent 3d Gundam game and in an old (pre trucha) thread for the Wii that could be a decent demo if you want more.

    Putting it back together: text files have an index (search for "pointers" on rom hacking guides). Once you have found this you will probably want to alter the pointers to make the text read more easily (or even to work at all: if a string runs outside the boundaries the game expects, it may crash). This stage takes a while.

    In short there will probably be the following stages:

    pulling the rom apart (takes a minute at most once you have everything set up)

    finding the text (could be very quick or it could take days or more)

    decoding the text (if you have an emulator it should be fairly quick)

    reworking the text: if you hand your translators a nasty looking script they will probably not like you much. If it is something common like Shift JIS I suggest leaving it in the encoding you got it in unless you really, really have to change it.

    translating: this one is very much down to how you and/or your team work. My personal preference for translations from Japanese is not to assume your readers are Japanophiles, but to assume they are at least knowledgeable (unlike most official translations).

    reinserting the text: encoding it is not bad, but redoing the pointers may be. I suggest a spreadsheet unless you want to code your own app.
    Shift JIS is nice as it has ASCII built in (although the extra Spanish characters may prove a problem). The font, on the other hand, can cause problems and I suggest leaving it until you have got the translation "playable", unless you have a good coder on the team, as it can get messy.

    reworking the text again: you may have to trim lines/sentences to make them appear on screen better, or fix bugs.
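The reinsertion stage can be sketched as follows: encode each translated string, work out where each one now starts and write a fresh pointer table. The file layout (32-bit count, 32-bit offsets from the start of the file, zero-terminated strings) is a made-up one for illustration, as are the function names. The second helper flags characters that Python's `shift_jis` codec cannot encode, which is one way the Spanish character problem shows up.

```python
import struct

def build_blob(strings, encoding: str = "shift_jis") -> bytes:
    """Encode the strings and rebuild the pointer table to match their new lengths."""
    encoded = [s.encode(encoding) + b"\x00" for s in strings]
    header_size = 4 + 4 * len(encoded)  # count field plus one pointer per string
    pointers = []
    pos = header_size
    for e in encoded:
        pointers.append(pos)
        pos += len(e)
    header = struct.pack(f"<I{len(pointers)}I", len(encoded), *pointers)
    return header + b"".join(encoded)

def unencodable(text: str, encoding: str = "shift_jis"):
    """List the characters in text that the encoding has no code for."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

blob = build_blob(["Hello", "Good morning"])
print(struct.unpack_from("<3I", blob))  # (2, 12, 18)
print(unencodable("mañana"))            # ['ñ']
```

Anything `unencodable` reports either has to be reworded or given a slot in the game's font and table, which ties in with leaving the font work until the translation is playable.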

    If you get a team the usual team rules should apply:
    everyone should be able to do everyone else's job, maybe not as well as the usual person but it should be able to be done.
    everybody has tasks/jobs.
    in this case everybody should know the basics of Japanese text: what the kana (and their subsets), kanji and romaji are and what they are for (you could probably leave out stuff like kana being used in place of kanji when medical terms are needed).
