ROM Hack [Questions] Help with dumping text. Not sure if compressed?

Dobu

New Member · OP · Newbie · Joined: Sep 17, 2020 · Messages: 4 · Trophies: 0 · Age: 31 · XP: 50 · Country: United States

Hello,
I've been lurking a little bit and sorting through old threads trying to find the answers, but not having much luck.
I'm trying to dump text from a game (ザックとオンブラ・まぼろしの遊園地, roughly "Zack and Ombra: The Phantom Amusement Park") in order to translate it. I was able to extract all the files with NDStool.
It doesn't seem like very many files are compressed... They all extract to .bin and are neatly organized.
[Image: upload_2020-9-16_22-50-0.png]

The problem is--and this is probably just because I'm not very good when it comes to reading code--I'm not sure how to go about extracting the raw text, and I don't see any clearly marked table for text (not that I'd really have the best idea what it would look like).
The scripts definitely aren't stored in plain text, unlike some visual novels I've seen. Even using Shift-JIS just displays a garbled mess. Maybe I need to add an offset? But I'm not sure what that number should be in that case.

I've mostly been using CrystalTile2 to go through the files, though WinHex was no better (in fact I found it way harder to use).

I would really appreciate if someone would be able to look into this game in particular and perhaps walk me through the process of extracting the text, because I feel I'm in way over my head on the programming side of things.
 

Dobu

Oh, there is a header in here. It's in the top level folder.

[Image: upload_2020-9-18_9-38-10.png]
I suppose I should've posted that as well. The folder I showed in the original post is in "data\". But despite this, I can't find any info in the header that looks like text information.

They might still be custom, but the structure seems similar to most other DS visual novel style games I've seen.

I believe the overlay folder holds the animated text that appears when, for example, you clear a minigame... It's really hard to tell when I can't make any sense of the files, though. They're all named overlay_####.bin and are around 10 KB each.

I'd upload the files themselves but I don't really know how mods feel about that so I'll play it safe.
 

FAST6191

Techromancer · Editorial Team · Joined: Nov 21, 2005 · Messages: 36,798 · Trophies: 3 · XP: 28,348 · Country: United Kingdom

A table file is a thing a hacker makes to describe the encoding and maybe some other aspects of the text used by a game. I can't rule out there being one in a game one day (we have had everything including full source code left in there before) but it would be a first.

The root directory stuff is potentially some of the things you want but not in the way you are thinking.
ARM9.bin.
This is the main binary that runs on the main DS processor. It can have text (and music, and levels and graphics and stats and...) within it. Not the best form on the part of the coders to do that but it is what it is and that is why people say hacking is not always trivial.
ARM7.bin.
In commercial games this is usually a basic workhorse binary that does housekeeping. It is important but varies so little that you can swap them between games of the same vintage and nothing will happen, or in some cases you might even remove anti-piracy protection.
For hackers it is mostly a place to inject code to fiddle with the memory.

The rest are just aspects of the ROM layout such that ndstool can rebuild it again. Ignore them for now.

Overlays.
The DS is a bit basic as far as hardware goes and lacks a nice system for fetching new code that might not always be needed but is at some point (in games this tends to be rare events/events that only happen after a long sequence).
To do this the DS kicks over a portion of memory to overlays and they can be chucked in and out of memory at will. I have seen games have thousands of overlays and said overlays contain all the data of the game, but even in the more general case they are much like the ARM9 and can include data. You say it has some kind of message in there and I can well believe it.

As for the first picture.
DWC is likely the download play component (is there a file called utility.bin or maybe something .srl that opens up just like another nds file?). Sometimes the download play version is simpler than the base game and gives up the goods more easily; other times you find it is very different.

In the font folder: by any chance are there NFTR files in there ( https://gbatemp.net/threads/nftr-editor.105060/ )? If so then that is not your table, but it is the key to the kingdom -- it will include pictures of every character and what it encodes as. You then fill in the blanks (hopefully it borrows the same order as something like Shift-JIS or EUC-JP, or maybe can be OCRed, else it is a lot of typing and character recognising).
If it is something like fnt.bin then oh well. Probably still something you can do (it will usually give an order which can be helpful).

Script may or may not be what you want to look at. If not then chances are it is the game's scripting system/scripted events or AI maybe. If not that then I would probably look at game.bin and maybe the other .bin files in the same directory (names are a clue for which I would look at first).

Quick test for compression: try to compress it again (basic zip file, nothing special). If it compresses down a lot then it is probably not compressed; if it does not, then it is either compressed, video, audio or essentially random data. It will not catch every case, but there are also other tells, like the file starting with 10 or 11 (hex), or maybe 40 on some later games, and the names of the files will often have a LZ ( https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf http://members.iinet.net.au/~freeaxs/gbacomp/#BIOS Decompression Functions ) or an underscore or some other indication in the name/extension that something is different.
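The recompress test and the first-byte tell can be scripted for a whole folder. This is a minimal Python sketch, using zlib in place of a zip tool; the 0.9 cutoff and the name `compression_report` are arbitrary assumptions for illustration, and a first byte of 0x10/0x11/0x40 is only a hint, since plenty of uncompressed files start with those values too.

```python
import sys
import zlib
from pathlib import Path

# First bytes mentioned as tells: 0x10 and 0x11, and 0x40 on some later games.
# A match is only a hint; uncompressed files can start with these values too.
LZ_HEADER_HINTS = {0x10: "LZ10", 0x11: "LZ11", 0x40: "LZ40"}

# Hypothetical cutoff: if zlib cannot get the file below 90% of its size,
# treat it as already compressed (or audio/video/random data).
INCOMPRESSIBLE_RATIO = 0.9


def compression_report(data: bytes) -> dict:
    """Recompress data with zlib and inspect its first byte."""
    if not data:
        return {"ratio": 1.0, "likely_compressed": False, "header_hint": None}
    ratio = len(zlib.compress(data, 9)) / len(data)
    return {
        "ratio": ratio,
        "likely_compressed": ratio > INCOMPRESSIBLE_RATIO,
        "header_hint": LZ_HEADER_HINTS.get(data[0]),
    }


if __name__ == "__main__":
    # Usage: python check_compression.py data/*.bin
    for name in sys.argv[1:]:
        r = compression_report(Path(name).read_bytes())
        print(f"{name}: ratio {r['ratio']:.2f}, likely compressed: "
              f"{r['likely_compressed']}, header hint: {r['header_hint']}")
```

Text and tables typically shrink a lot under zlib, so a ratio close to 1 is the interesting case; it still cannot tell compressed data apart from audio or video.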

I should also note the overlays and arm9 might be compressed with a custom type of compression seen in DS binaries, often known as BLZ. https://www.romhacking.net/utilities/826/ should be able to handle it (and all the other common types).
 

Dobu

Oh, thanks for the in-depth reply. I'm gonna go through it piece by piece.
FAST6191 said:
A table file is a thing a hacker makes to describe the encoding and maybe some other aspects of the text used by a game. I can't rule out there being one in a game one day (we have had everything including full source code left in there before) but it would be a first.
I see. That would explain why I don't see the table. So I guess I would need to make one myself for decoding the scripts...

FAST6191 said:
The root directory stuff is potentially some of the things you want but not in the way you are thinking.
ARM9.bin.
This is the main binary that runs on the main DS processor. It can have text (and music, and levels and graphics and stats and...) within it. Not the best form on the part of the coders to do that but it is what it is and that is why people say hacking is not always trivial.
ARM7.bin.
In commercial games this is usually a basic workhorse binary that does housekeeping. It is important but varies so little that you can swap them between games of the same vintage and nothing will happen, or in some cases you might even remove anti piracy protection.
For hackers it is mostly a place to inject code to fiddle with the memory.


The rest are just aspects of the ROM layout such that ndstool can rebuild it again. Ignore them for now.
Hmm... Yeah I won't touch these files too much right now. I'll worry about it when rebuilding the rom. From what you're saying, it seems like these will be more critical to keep in mind later.

FAST6191 said:
Overlays.
The DS is a bit basic as far as hardware goes and lacks a nice system for fetching new code that might not always be needed but is at some point (in games this tends to be rare events/events that only happen after a long sequence).
To do this the DS kicks over a portion of memory to overlays and they can be chucked in and out of memory at will. I have seen games have thousands of overlays and said overlays contain all the data of the game, but even in the more general case they are much like the ARM9 and can include data. You say it has some kind of message in there and I can well believe it.

With this in mind, I'm now fairly certain that the overlays folder is either for the minigames, minigame descriptions and/or perhaps start menu stuff (in-game I haven't unlocked anything in the start menu, but it appears similar to the Professor Layton games, where there's a Summary, Save/Load and Minigame Review option, with 5 more options to unlock later, possibly as game-long minigames). There are only 66 overlay files, and as I said they're almost all around 10 KB, so I don't think I'd go as far as to say most of the game is stored this way. In general, the file system seems to be pretty well organized; there's nothing here that really leads me to believe the files would be so poorly organized.

FAST6191 said:
As for the first picture.
DWC is likely the download play component (is there a file called utility.bin or maybe something .srl that opens up just like another nds file?). Sometimes you can find the download play stuff simpler and gives up the goods more easily on the base game, other times you find it is very different.
Yes, there is a utility.bin there, and it would appear at a glance to be the Download Play folder. It's got some messages at the top, but after scrolling down a bit, it just becomes the same garbled mess as the rest of the files, so I can't really tell if there's anything useful there.
[Image: upload_2020-9-18_18-23-27.png]

FAST6191 said:
In the font folder. By any chance are there ntfr files in there. If so then that is not your table but is keys to its kingdom -- it will include pictures of every character and what it encodes as. You then fill in the blanks (hopefully it borrows the same order as something like shiftjis or eucjp, or maybe can be OCRed, else it is a lot of typing and character recognising).
If it is something like fnt.bin then oh well. Probably still something you can do (it will usually give an order which can be helpful).

Yes, this is it! I had downloaded the NFTR editor earlier today but didn't realize it would tell me the encoding as well, so I hadn't used it until you mentioned it. There are NFTR files for the name table, the main script font and the furigana, each with a mapping number for every character.

Name:
[Image: upload_2020-9-18_18-31-35.png]

Message:
[Image: upload_2020-9-18_18-32-47.png]

Rubi (furigana):
[Image: upload_2020-9-18_18-33-20.png]

FAST6191 said:
Script may or may not be what you want to look at. If not then chances are it is the game's scripting system/scripted events or AI maybe. If not that then I would probably look at game.bin and maybe the other .bin files in the same directory (names are a clue for which I would look at first).

Quick test for compression. Try to compress it again (basic zip file, nothing special). If it compresses down a lot then probably not, if it does not then it either is compressed, is video, is audio or is essentially random data. While not every time then there are also other tells like the file starting with 10,11 or maybe 40 on some later games, and the names of the files will often have a LZ or an underscore or some other indication in the name/extension that something is different.

I should also note the overlays and arm9 might be compressed with a custom type of compression seen in DS binaries, often known as BLZ. https://www.romhacking.net/utilities/826/ should be able to handle it (and all the other common types).

I tested the script.bin files for compression and it looks like they're not compressed. I'm pretty sure these are the text files for the scripts and the cues for sprites/text boxes since they're named by chapter, and map_link is probably referring to names of locations on the map.
There are no LZ files in the ROM. Even the video files are left in plain MODS form, so I guess the second half of the original question is answered: nothing in the ROM appears to be compressed.
[Image: upload_2020-9-18_18-28-20.png]

So now that I've got the mapping numbers for the characters, I suppose my next question is: what would the next step be to extract these? For example, the main character's name is ザック, so am I just going to search for a string using the numbers from the NFTR editor (30B6 30C3 30AF from the Message character map), and compare where that string appears in the script files to where the characters appear in the NFTR files, with the full ROM pulled up? I guess just subtracting those values should give me the offset, huh? I'll try it and report back, but if you have the answer before then please don't hesitate to post it.
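Searching for a known word like ザック is easy to script. This is a minimal Python sketch, assuming each character is stored as one 16-bit value (as the NFTR mapping numbers suggest) but not assuming the byte order, so it tries both; `find_codes` and `search_tree` are made-up helper names, and the folder to scan is whatever NDStool extracted to.

```python
import sys
from pathlib import Path


def encode_codes(codes, byteorder):
    """Pack 16-bit character codes into bytes in the given byte order."""
    return b"".join(code.to_bytes(2, byteorder) for code in codes)


def find_codes(data: bytes, codes):
    """Return (byteorder, offset) for every occurrence of the codes in data."""
    hits = []
    for byteorder in ("little", "big"):
        needle = encode_codes(codes, byteorder)
        pos = data.find(needle)
        while pos != -1:
            hits.append((byteorder, pos))
            pos = data.find(needle, pos + 1)
    return hits


def search_tree(root, codes):
    """Print every hit in every file under root (e.g. the extracted ROM folder)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for byteorder, offset in find_codes(path.read_bytes(), codes):
                print(f"{path}: 0x{offset:X} ({byteorder}-endian)")


if __name__ == "__main__" and len(sys.argv) > 1:
    # ザック as given by the Message font's mapping numbers: 30B6 30C3 30AF
    search_tree(sys.argv[1], [0x30B6, 0x30C3, 0x30AF])
```

A hit tells you which file holds the text and which byte order it uses; the position of the same characters inside the font file is not needed for this.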
 

[Attachment: upload_2020-9-18_18-32-16.png]

FAST6191

Names can be tricky in scripts -- if the name can be custom, or maybe changes throughout the game, then often scripts will have a placeholder for it. Similar thing for "it costs [blah] to buy this item" so I might avoid that one or skirt around it by searching either side.
Try it if you want, and yeah, it is broadly the right idea. Relative search operates on a related principle: for English games I will tend to skip the lot and use that to search for " the ", as it is almost certain to appear in most scripts somewhere and should be unique enough that I can narrow down quickly (afraid my Japanese linguistics is not up to the task for this one). So find a line of the script, figure out what the encoding should be and search the ROM for it. It is not so bad on the DS, but you might also want to avoid anything that has a line break: most game consoles and the games on them, give or take some of the later stuff, are too lazy/slow to start a new line automatically when the text runs out of space, so instead the line break is marked in the script.
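Relative search ignores the actual values and only compares the gaps between neighbouring characters, which is why it works before you know the encoding. This is a minimal Python sketch of the idea; the function name and the `width`/`byteorder` parameters are assumptions for illustration. It also reports the shift between the game's value and the Unicode value of the first matched character, i.e. the kind of offset asked about earlier. The gaps come from Unicode order, so it only works if the game keeps characters in the same relative order (letters a-z for English; for kana, the standard order). Spaces are often not in the same run as letters, so searching "the" without them is safer.

```python
def relative_search(data: bytes, target: str, width: int = 1, byteorder: str = "little"):
    """Return (offset, shift) for every place where data has the same spacing
    between values as the characters of target have in Unicode.

    width is bytes per character (1 for 8-bit text, 2 for 16-bit text).
    shift = game value - Unicode value, for the first matched character.
    Brute force over every offset, so it is slow on very large files.
    """
    gaps = [ord(b) - ord(a) for a, b in zip(target, target[1:])]
    span = len(target) * width
    hits = []
    for start in range(len(data) - span + 1):
        values = [
            int.from_bytes(data[start + k * width:start + (k + 1) * width], byteorder)
            for k in range(len(target))
        ]
        if all(v2 - v1 == gap for v1, v2, gap in zip(values, values[1:], gaps)):
            hits.append((start, values[0] - ord(target[0])))
    return hits


if __name__ == "__main__":
    # Hypothetical encoding where 'a' = 0x0A, so "the" is stored as 1D 11 0E.
    print(relative_search(bytes([0xFF, 0x1D, 0x11, 0x0E, 0x00]), "the"))
    # prints [(1, -87)]
```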
At this point your main goal is to find the script, so anything that does that will do. After that you can backfill the rest of the table.
Afraid it will be a lot of grunt work unless you can get some kind of OCR program to work. Don't think we have a reliable one -- Crystaltile2 claims one but it is always buggy whenever I play with it and usually tries to put things in Chinese characters (they share a bunch after all) which just makes things worse. If you want to contrive something with a screenshot and a normal OCR program then play as you will.

The kana, at least, are in a fairly common ordering with the dakuten forms next to them, so you can probably do those quickly enough and hopefully gain some readability.
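If the kana do follow the standard order, that part of the table can be generated instead of typed. This is a minimal Python sketch under two loud assumptions: that the game's code for each kana is its Unicode code point plus a constant `shift` (0 if it simply is Unicode), and that entries are written in the common `HEX=character` table-file style with HEX in the byte order used in the file. The helper name `kana_table` is made up.

```python
def kana_table(shift: int = 0, byteorder: str = "little") -> list[str]:
    """Build 'HEX=kana' table lines for hiragana and katakana.

    Assumption: game code = Unicode code point + shift, stored as 16 bits.
    HEX is written in file byte order so the entries can be searched for directly.
    """
    blocks = [
        (0x3041, 0x3096),  # hiragana, with the dakuten forms next to the plain ones
        (0x30A1, 0x30FC),  # katakana, plus the middle dot and the long vowel mark
    ]
    lines = []
    for first, last in blocks:
        for cp in range(first, last + 1):
            code = (cp + shift) & 0xFFFF
            lines.append(f"{code.to_bytes(2, byteorder).hex().upper()}={chr(cp)}")
    return lines
```

Writing `"\n".join(kana_table())` to a UTF-8 file gives a starting table; the kanji would still have to be filled in separately.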
The Kanji on that large image... don't appear to be the same order as http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml or http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml or http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml though I might have missed something. It could also be one of the orderings used in a test somewhere but my Japanese educational techniques knowledge is not up to the task for that one.

Anyway, there is no link between the font, the script and their locations in the ROM (were you reading a NES document or something before this?). The font is merely a list of the pictorial representations, tile size and lookup value for the text decoder, not entirely unlike a more modern PC-style font (most other things on the DS are a bit more classical; NFTR is quite special as these things go). There is no Unicode or equivalent lookup value (which is what the table formats aim to provide; if you are bored then http://transcorp.romhacking.net/scratchpad/Table File Format.txt is not a bad overview and general discussion), hence the grunt work.
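Once some entries exist, a table is just a lookup from byte sequences to text. This is a minimal Python sketch that reads plain `HEX=text` lines and decodes bytes by longest match, printing unknown bytes as [XX] so the gaps to backfill stand out. The full table format linked above has more entry types (end tokens, control codes and so on), which this sketch simply skips; the helper names are made up.

```python
def load_table(lines):
    """Parse plain 'HEX=text' lines into {bytes: text}.

    Anything else (comments, end tokens such as '/FF=...', other entry
    types from the full table format) is skipped in this sketch.
    """
    table = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if "=" not in line:
            continue
        hex_part, text = line.split("=", 1)
        try:
            key = bytes.fromhex(hex_part)
        except ValueError:
            continue  # not a plain hex entry
        if key:
            table[key] = text
    return table


def decode(data: bytes, table) -> str:
    """Greedy longest-match decode; bytes with no entry come out as [XX]."""
    longest = max((len(k) for k in table), default=1)
    out, i = [], 0
    while i < len(data):
        for n in range(min(longest, len(data) - i), 0, -1):
            chunk = data[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(f"[{data[i]:02X}]")
            i += 1
    return "".join(out)
```

For example, `decode(data, load_table(open("game.tbl", encoding="utf-8")))`, where game.tbl stands for your own (hypothetical) table file.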
 

Dobu

Okay, I see.
As for names, the only name that changes anything is the name chosen at the beginning (changes the save file name and nothing else).

But yeah, looks like it's based on Unicode, actually, just with some of the glyphs replaced with special glyphs and English characters. Looking up random kanji, they correlate with the Unicode table positions. It's just the beginning that looks different.
The RUBI characters (furigana displayed over kanji) and name table also correlate to the unicode values (I was able to check all of those since they're smaller files)
The problem I'm having now is the same one you mentioned. When displaying in plain Unicode in CT2, the hex viewer shows Thai and Korean characters, but displaying as Unicode (UTF-8) won't show the Japanese characters and gives null character boxes instead.
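One possible explanation for the Thai and Korean characters (an assumption, not confirmed in this thread) is a byte-order mismatch: if the text is stored as big-endian UTF-16 but the viewer reads it little-endian, ザ (U+30B6, bytes 30 B6) comes out as U+B630, a Hangul syllable, and 『 (U+300E) comes out as U+0E30, a Thai vowel. This small Python check, with a made-up scoring rule that counts characters from U+3000 to U+30FF (Japanese punctuation and kana), decodes a sample both ways and keeps whichever looks more Japanese.

```python
def japanese_score(text: str) -> int:
    """Count characters in U+3000..U+30FF (CJK punctuation, hiragana, katakana).

    Byte-swapped kana and punctuation land far outside this range, so the
    correct byte order should score much higher on ordinary Japanese text.
    """
    return sum(1 for ch in text if "\u3000" <= ch <= "\u30ff")


def guess_utf16_order(sample: bytes) -> str:
    """Return 'utf-16-le' or 'utf-16-be', whichever decoding looks more Japanese."""
    le = sample.decode("utf-16-le", errors="replace")
    be = sample.decode("utf-16-be", errors="replace")
    return "utf-16-be" if japanese_score(be) > japanese_score(le) else "utf-16-le"


if __name__ == "__main__":
    stored = "ザック".encode("utf-16-be")
    # Code points seen when big-endian bytes are read as little-endian.
    print([hex(ord(c)) for c in stored.decode("utf-16-le")])
    # prints ['0xb630', '0xc330', '0xaf30']
    print(guess_utf16_order(stored))
    # prints utf-16-be
```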

I think I was able to find the location of the script, however. Surprisingly, it's not in the script files but somewhere in mess.bin. At least, I can find ありがとう。 within that file and no others. Using WinHex I can't view the Japanese characters either, so I still don't have a way to see the script, but at least now I know where it is, I suppose.

Okay, I've found the script: it's all based on Unicode and stored in mess.bin. I'm guessing the script files are event scripts, flags and instructions to play voice lines.
So now I need to figure out how to get the text out of mess.bin and into a text file that I can easily read and edit. Hm...
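A first pass at getting the text out of mess.bin could walk the file 16 bits at a time and split on 0x0000. This is a minimal Python sketch under several assumptions: that strings are UTF-16 units ending in 0x0000 (not confirmed; the real file may use lengths or a pointer table instead, and any header will come out as junk lines), and that you pick whichever byte order turns out right (little-endian is the DS's native order, but the Thai/Korean symptom could point to big-endian). Each line keeps its offset so edited text can be put back later, and control codes and line breaks are written as {XXXX} rather than guessed at. The helper names are made up.

```python
import sys
from pathlib import Path


def split_utf16_strings(data: bytes, byteorder: str = "little"):
    """Yield (offset, text) for each run of 16-bit units ended by 0x0000.

    Non-printable units (control codes, line breaks, surrogates) become {XXXX}
    so nothing is silently lost and the game's control codes stand out.
    """
    start, parts = 0, []
    for pos in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[pos:pos + 2], byteorder)
        if unit == 0:
            if parts:
                yield start, "".join(parts)
            start, parts = pos + 2, []
            continue
        ch = chr(unit)
        # Python treats U+3000 (ideographic space) as non-printable; keep it as text.
        parts.append(ch if ch.isprintable() or ch == "\u3000" else f"{{{unit:04X}}}")
    if parts:
        yield start, "".join(parts)


def dump_strings(src: str, dst: str, byteorder: str = "little") -> None:
    """Write 'OFFSET<TAB>text' lines to dst as UTF-8."""
    data = Path(src).read_bytes()
    with open(dst, "w", encoding="utf-8") as out:
        for offset, text in split_utf16_strings(data, byteorder):
            out.write(f"{offset:08X}\t{text}\n")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python dump_mess.py mess.bin mess.txt [big]
    dump_strings(sys.argv[1], sys.argv[2], "big" if sys.argv[3:] == ["big"] else "little")
```

The resulting text file opens in any editor; working out what each {XXXX} code does, and whether the file has a pointer table that needs updating when lines change length, would be the next step.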
 