ROM Hack Waiting for nds rom translation help

Dollscythe

Member
OP
Newcomer
Joined
Nov 28, 2016
Messages
6
Trophies
0
Age
26
XP
26
Country
China
Hello everyone. First of all, English is not my native language and I'm a newcomer to this forum, so please forgive my poor English and any forum rules I may break. Sincere thanks.

As a hobby, I'm attempting to make a Chinese translation hack. The game is Ryuusei no Rockman: Ice Pegasus (also known as Megaman Starforce), by CAPCOM.

In preparation, I have read some tutorials (like gba-and-ds-rom-hacking-guide). Glancing over the whole ROM, I found several fonts like the one below. These fonts are stored in arm9.bin.

upload_2021-5-5_18-58-24.png


Presuming that the character table was sorted in the same order as the font, I chose a word and used relative searching to verify it. Indeed, this is the table. But there are 483 kinds of characters (space included) and the game used only 2 bytes to represent a character (maximum 256 kinds), so I created a .txt table file ranging from 00 to FF: 00 = (space), 01 = 0, 02 = 1, etc. That leaves one question open.
Furthermore, by relative searching I found that all messages are stored in mess.bin (mes = message, s for plural; strange naming).
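
For readers who have not used relative searching, here is a minimal sketch of the idea in Python. It assumes a 1-byte-per-character encoding whose byte values follow the font order, which is the presumption made above. The function name `relative_search` and the font positions in the usage note are made up for illustration; only the byte values 9D (う) and AE (ち) come from this thread.

```python
def relative_search(data: bytes, positions: list[int]) -> list[int]:
    """Return offsets where consecutive bytes differ by the same amounts
    as consecutive entries of `positions` (positions of the word's
    characters in the font, counted from any starting point)."""
    if len(positions) < 2:
        raise ValueError("need at least two characters to compare")
    # Only the differences between neighbours matter, not the absolute values.
    deltas = [b - a for a, b in zip(positions, positions[1:])]
    n = len(positions)
    hits = []
    for offset in range(len(data) - n + 1):
        window = data[offset:offset + n]
        if all(window[i + 1] - window[i] == deltas[i] for i in range(n - 1)):
            hits.append(offset)
    return hits
```

For example, if う sat at font position 10 and ち at position 27 (hypothetical numbers), then `relative_search(data, [10, 27, 10])` would report every offset where the bytes go up by 17 and then down by 17, such as 9D AE 9D. Short words give many false hits, so a longer word is a safer probe.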

Here is the prologue in game:
upload_2021-5-5_18-59-4.png


By comparing this nonsense with the in-game text, I got some information:

Red box: In game the message was “220X年…”. Comparing one by one, E4 95 = 年, so we can learn that E3 = ぱ, E4 00 = ぅ, E4 01 = ぁ, etc. 8A = “.” But there is a 00 between these dots, and I don't know what it represents. By the way, E9 00 represents a line break (enter).

Green box: 9D AE E4 02 05 9D = “うちゅう”. From the table we know E4 05 = ゅ, but here it is E4 02 05, with a 02 between E4 and 05.

Blue box: the text in game was “ニホンうちゅうかがくきゅく” and it used 20 23 to represent “うちゅう”. How odd! What a nightmare; there are many similar cases in the text.

I also tried replacing some bytes, expecting a matching change on screen. I found that if the 00 or 02 is changed, the text scrolls endlessly, like an overflow.

Frankly, I'm wondering whether making a translation hack is feasible.

1) A Chinese translation needs thousands of characters. I suspect that one character set holds at most 228 elements (bytes 00 to E3, since 0xE4 = 228 in decimal). And E5 to FF are probably control codes, so maybe I can't create a new character set.

2) The game has 3 kinds of fonts, all of them stored in arm9.bin, so I'm wondering whether arm9.bin can be expanded.

3) mess.bin is unreadable. There are a lot of 00 bytes inside sentences, and some bytes between sentences, sometimes 4 bytes, sometimes 6 bytes. The trickiest problem, which I mentioned above, is 20 23 = “うちゅう”; there seem to be no rules to follow.

Can you give me some advice? Thanks for the help!
 

FAST6191

Techromancer
Editorial Team
Joined
Nov 21, 2005
Messages
33,893
Trophies
2
Website
trastindustries.com
XP
22,648
Country
United Kingdom
There are Chinese translation sites out there; indeed, the Crystaltile2 tool was made by some Chinese coders. They are also some of the most prolific translation creators out there -- most Japanese games on the DS would come out in English if you waited a few months, but other than those 3 iQue games, did Chinese-language peeps get anything? To that end we often saw complete Chinese translations before the US release was out, and sometimes, if the Japanese release was available a few weeks ahead of street date, even before the Japanese one.
I don't know where the best Chinese translation/ROM hacking sites are these days (my bookmarks are somewhat out of date, and the few sites once doing well are now not doing as much). tgbus is hopefully something of a start here. Anything providing news on translations, homebrew, emulators, flash carts and the like will usually have someone that can point you at where ROM hacking is happening.

"only 2 bytes to represent a character (maximum 256 kinds)"
2 bytes is 16 bits, which would allow 65,536 values. 1 byte is 8 bits, which is indeed 256 possible combinations. Your underlying reasoning appears correct, though.


Anyway, the 00 part is probably to do with some form of compression, though it might be some kind of markup. As a quick check, does mess.bin start with 10h, 11h, or 40h and then a value that is the length of the file? If so then it is probably compressed. Crystaltile2 might even tell you if you look in the file viewer (it will have a little cylinder with LZ or something on it).
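
As a rough sketch of that quick check: the GBA/DS BIOS-style header is one type byte followed by a 24-bit little-endian decompressed size, and type 10h is the plain LZ77 variant. The Python below checks for such a header and, for type 10h only, unpacks the data. The function names are made up for this example, 11h and 40h are only recognised and not decoded, and nothing here is confirmed for this particular game.

```python
def parse_header(data: bytes):
    """Return (type_byte, decompressed_size) if the first four bytes look
    like a compression header, otherwise None."""
    if len(data) < 4:
        return None
    kind = data[0]
    size = int.from_bytes(data[1:4], "little")  # 24-bit little-endian size
    if kind in (0x10, 0x11, 0x40) and size > 0:
        return kind, size
    return None


def decompress_lz10(data: bytes) -> bytes:
    """Decompress type 10h LZ77 data (header included)."""
    header = parse_header(data)
    if header is None or header[0] != 0x10:
        raise ValueError("not type 10h LZ data")
    size = header[1]
    out = bytearray()
    pos = 4
    while len(out) < size:
        flags = data[pos]  # raises IndexError if the input is truncated
        pos += 1
        for bit in range(7, -1, -1):  # flag bits are read MSB first
            if len(out) >= size:
                break
            if flags & (1 << bit):
                # Back-reference: 4-bit (length - 3), 12-bit (distance - 1).
                b1, b2 = data[pos], data[pos + 1]
                pos += 2
                length = (b1 >> 4) + 3
                distance = (((b1 & 0x0F) << 8) | b2) + 1
                if distance > len(out):
                    raise ValueError("back-reference before start of output")
                for _ in range(length):
                    out.append(out[-distance])
            else:
                out.append(data[pos])  # literal byte
                pos += 1
    return bytes(out[:size])
```

A header match alone proves little, since ordinary data can start with 10h by chance; a clean decompression to exactly the stated size is much stronger evidence.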

ARM9 expansion... you can, but it is tricky. The arm9.bin file is the main code section for the game -- it is loaded into memory as the game boots and stays there until it is turned off (commercial games use overlays to have bits of code come in and out as needed, an old and very crude way of doing dynamic code, but it works). To expand it you will have to figure out what comes after the ARM9 section, or maybe find some other free area of memory (rare on the DS to get more than a handful of bytes, but not unheard of), or use overlays. If I need to expand one I normally find space within the ARM9 to use -- I usually suggest wifi error strings if it is a wifi game, as nobody needs all the random errors with wifi being dead these days (give or take the alternative servers, which probably still won't need all the random errors).

As for Chinese needing thousands of characters: sure, and if the font is in the binary that makes it harder than with something like the normal NFTR fonts many other games use, which are separate files and can be expanded trivially. Your options then become expansion, screen by screen, or condensing it.
Expansion we already covered, unless you mean expansion from an 8 bit encoding to a 16 bit one. Normally people go the other way (Japanese games tend to be 16 bit as their kanji also number in the thousands, and converting to an 8 bit encoding basically doubles the space you have available for text, though simpler games/games aimed at kids might be kana only and thus can work as 8 bit), but the principle is the same -- find the way the game knows to fetch sections of text and get it to fetch and decode 16 bit chunks at a time instead.
Screen by screen is seen more in NES games and games of that era, where memory was sharply limited. Here the game fetches whatever characters it needs for that particular screen and generates them, essentially making a game with thousands of tables. It would be quite the feat to add, as you would have to fairly radically change how the text works, or maybe even make it work like a more conventional game.
Condensing it: you won't need every character in the Hanyu Da Cidian. If the game only needs 3000 of them in total that is still tricky, but far easier than supporting the tens of thousands. Normally we see this with non-English European languages, which might need 5 or 6 extra characters and not use certain English ones, or need certain punctuation, but the principle holds.
You could combine all three in one hack to make individual aspects easier (expanding a bit, doing some things screen by screen, and rewording the text to use fewer characters might be easier than trying to get enough characters in via just one of the methods above).
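
To put a number on the condensing option, one can simply count how many distinct Chinese characters a draft translation uses. A minimal sketch in Python, assuming the script is available as ordinary Unicode text and counting only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); the function names are made up for this example.

```python
from collections import Counter


def count_ideographs(script: str) -> Counter:
    """Count each CJK ideograph (basic block only) used in the script."""
    return Counter(ch for ch in script if 0x4E00 <= ord(ch) <= 0x9FFF)


def rare_ideographs(counts: Counter, max_uses: int = 1) -> list[str]:
    """Characters used at most `max_uses` times: candidates for rewording."""
    return sorted(ch for ch, n in counts.items() if n <= max_uses)
```

`len(count_ideographs(script))` is the number of glyphs the font would have to hold, and the rarely used characters are the cheapest ones to reword away.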
 

Dollscythe

Sorry for the mistake I made: two hexadecimal digits are 1 byte, or 8 bits, and the game only uses 1 byte to represent a character.

Unfortunately, most of the Chinese hacking sites are dead, TGBUS included. A huge amount of valuable experience from earlier hackers now redirects to 404 pages. I found an active forum here, so I came for help.

Maybe going through the translation work step by step will make things clearer.
1) Extract the text. All of the conversations are stored in mess.bin and it is totally a mess; although the table has been found, the file still can't be read. As you said, it should be compressed with LZ77, but mess.bin doesn't start with 10. I found that Crystaltile2 has a function called LZ77/HUFFMAN compression search, and its result suggests that mess.bin is mostly plain text, with parts of it compressed. So I'll write a batch script to slice mess.bin and decompress those parts, hoping to extract the text.
2) Translate Japanese to Chinese. After translation I'll use some language tricks (like synonym replacement) to reduce the number of distinct Chinese characters. About 1,500 Chinese characters will be enough. I can't delete the hiragana/katakana because the game uses passwords to unlock items, and these are written in hiragana/katakana.
3) Insert / replace the translated text. This is a kids' game with only 483 kinds of characters; most of the conversations are in hiragana/katakana, and only a few specific words are kanji. I guess the program divides these 483 characters into 2 sets: set 1 is 0-227 (bytes 00 to E3), and set 2 holds the remaining characters, 228-482, but its index starts again at 0. The program reads mess.bin one byte at a time: if the byte is smaller than E4, it is decoded with set 1; if the byte is E4, the program reads the next byte and decodes it with set 2. Translating to Chinese must lead to an expansion of the table. E5 to FF are probably control codes. Presuming E5 is unused, I'll rewrite the program so that when it reads the byte E5, it reads the next byte and decodes it with set 3... sounds tricky. As you said, I have 3 ways (expansion, screen by screen, or condensing it). It's time to make an effort, thanks.
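
The two-set guess in step 3 can be sketched as a small decoder in Python. Everything about the structure is an assumption taken from the guess above: E4 as a prefix into set 2, E5 as a hypothetical unused prefix for a new set 3, and E9 00 as a line break. The tables hold only the handful of byte values identified in this thread; unknown bytes are printed as hex in angle brackets.

```python
# Partial tables: only byte values identified in this thread.
SET1 = {0x00: " ", 0x01: "0", 0x02: "1", 0x8A: ".",
        0x9D: "う", 0xAE: "ち", 0xE3: "ぱ"}
SET2 = {0x00: "ぅ", 0x01: "ぁ", 0x05: "ゅ", 0x95: "年"}
SET3 = {}  # hypothetical new set for Chinese characters (assumes E5 is free)
PREFIXES = {0xE4: SET2, 0xE5: SET3}


def decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in PREFIXES and i + 1 < len(data):
            # Prefix byte: the next byte is an index into another set.
            index = data[i + 1]
            out.append(PREFIXES[b].get(index, f"<{b:02X} {index:02X}>"))
            i += 2
        elif b == 0xE9 and i + 1 < len(data) and data[i + 1] == 0x00:
            out.append("\n")  # E9 00 = line break, as observed in the red box
            i += 2
        else:
            out.append(SET1.get(b, f"<{b:02X}>"))
            i += 1
    return "".join(out)
```

Note that the real green-box bytes 9D AE E4 02 05 9D would not decode cleanly with this sketch, because the extra 02 is still unexplained. On capacity: each prefix byte can address at most 256 more characters, so about 1,500 Chinese characters would need at least 6 free prefix values (5 x 256 = 1,280 is too few, 6 x 256 = 1,536 is enough).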
 

SteveXMH

New Member
Newbie
Joined
Oct 17, 2021
Messages
2
Trophies
0
Age
18
XP
17
Country
China
I also like this game and am trying to translate it. Here are some ideas based on my guesses.
About the text "220X年...": the 00 (space) after the last dot is probably there because the game reads two bytes at a time as a little-endian short int, so the 00 ends up in the last position to be read.
That would explain why changing this 00 to something else causes a loop -- it is possibly a null character that marks the end of a text!
But if this guess is true, we can't explain why some of the text (especially the single-byte characters) is stored in non-little-endian order. For example, "220X" would appear as "22X0" in the hex display. So, if possible, we have to use a debugger to check that.
As for the message map, I think the pointer table will be at the front of the mess.bin data, but I still know very little about that.
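
To see what the little-endian guess would predict in a hex viewer, here is a small Python sketch. The function names are made up for this example, and it says nothing about how the game actually reads the data; it only shows how 16-bit little-endian storage looks byte by byte.

```python
import struct


def as_le_words(data: bytes) -> list[int]:
    """Reinterpret a byte string as little-endian 16-bit values."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def swap_pairs(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit pair (a trailing odd byte stays)."""
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)
```

Read this way, the bytes 8A 00 form the single value 008A, which would fit a dot followed by a 00. Four single-byte characters packed two per 16-bit value would show up pair-swapped in the dump, which is the "220X" versus "22X0" mismatch described above.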

Also, some Rockman fans have now translated Ryuusei no Rockman: Red Joker and Ryuusei no Rockman: Black Aces. Though they only translated the battle cards and some battle-related features, that translation team should still have useful experience with translating the game. Why not try asking them for help?

(My main language is Chinese and my English is a little poor, so please reply if the text confuses you.)
 