Here is the tool:
https://www.mediafire.com/file/e44c1ksxl7a17z8/ToD2 String Extract v1.7z
Instructions are in the included readme. It will dump all strings in .SCED files to the best of its ability. (Too big for forum attachment because of included binary, 138kB.)
When you are finished, you may want to concatenate .tsv files into a spreadsheet. I wrote a program today that does that too using LibreOffice:
https://heroesoflegend.org/forums/viewtopic.php?f=38&t=349
I found 845k characters, but not all of them have to be translated, mostly because of a large number of repeats. I have no idea how to get a better estimate.
I put a deduplication function in this tool, so each repeated string needs to be translated only once.
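To show what I mean by deduplication, here is a rough sketch in Python. It is not the code in the tool, and the row layout (file name, line number, text) and the function name are just assumptions for illustration.

```python
def deduplicate(rows):
    """Collapse repeated strings so each one is translated only once.

    rows: iterable of (file_name, line_no, text) tuples (hypothetical layout).
    Returns (unique_texts, refs), where refs maps every original row
    to an index into unique_texts.
    """
    index_of = {}      # text -> position in unique_texts
    unique_texts = []  # first occurrence of each string, in dump order
    refs = []          # (file_name, line_no, index into unique_texts)
    for file_name, line_no, text in rows:
        if text not in index_of:
            index_of[text] = len(unique_texts)
            unique_texts.append(text)
        refs.append((file_name, line_no, index_of[text]))
    return unique_texts, refs


if __name__ == "__main__":
    sample = [("6470", 1, "はい"), ("6470", 2, "いいえ"), ("6471", 1, "はい")]
    texts, refs = deduplicate(sample)
    print(len(texts), "unique strings out of", len(refs))
```

The refs list is what would let an inserter put one translation back into every place the original string appeared.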
Please take a look at this sample:
https://docs.google.com/spreadsheets/d/1s1cHvhnX4Z7nctDvD4RTDprxDn68kIqfp4aTRMp51jA/edit#gid=0
Definitely don't start translating yet. I have no confidence in my ability to insert this back. Let me at least put Lorem ipsum in there for you to prove it can be done.
I am not too happy with it because function boundaries are not identified.
What I mean by function boundary: At some point during the first scene it jumps to a different file, then jumps back to the first one.
I tried, but I can't find them. If you look in the sample, 6470 line 34 is the last one before the switch.
Translators will not be able to follow the game flow and that will make their translation worse than it could be.
I was scared that translators wouldn't be able to ID who's talking but it seems that won't be a problem.
More stuff on the strings. I couldn't figure out a lot of them. Any ideas?
These commands were the main ones: 0x12, 0x14, 0x15, 0x16, 0x17, 0x18. They all seem to have variable-length operands ending in 0xBC or 0xC0. I assumed that and it seemed to work fine, but since I don't know what they do, I just dumped them as raw bytes. I tried my usual method (typing the bytes in at the first string of the game), but either nothing shows up or, more rarely, the game crashes.
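For reference, the terminator assumption boils down to something like this Python sketch. The function name is made up, and the rule that the operand runs up to and including the first 0xBC or 0xC0 is only my guess from looking at the data, not something confirmed from the interpreter.

```python
# Commands that appear to take a variable-length operand (from the dump so far).
VARIABLE_COMMANDS = {0x12, 0x14, 0x15, 0x16, 0x17, 0x18}
# Bytes that appear to end such an operand (assumption, not confirmed).
TERMINATORS = {0xBC, 0xC0}


def read_variable_operand(data, pos):
    """Read an operand starting at pos (the byte right after the command).

    Returns (operand_bytes, next_pos). The operand includes the terminator
    so it can be dumped and reinserted as raw bytes unchanged.
    """
    end = pos
    while end < len(data) and data[end] not in TERMINATORS:
        end += 1
    if end >= len(data):
        raise ValueError("no terminator found after offset %d" % pos)
    return data[pos:end + 1], end + 1


if __name__ == "__main__":
    blob = bytes([0x12, 0x01, 0x02, 0xBC, 0x41])
    if blob[0] in VARIABLE_COMMANDS:
        operand, nxt = read_variable_operand(blob, 1)
        print(operand.hex(), nxt)
```

If an operand can legitimately contain 0xBC or 0xC0 as data, this will cut it short, which is one more reason to get the real sizes from the interpreter.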
I also couldn't figure out 0x03 or 0x0B, but those seem to take a full-word integer operand.
I was having trouble unpacking the code sections (necessary for insertion).
***Possible answer B (clean one): We disassemble the SCED interpreter's native MIPS code and write down all possible opcodes and their operand sizes.
Any tips on doing this? There are a lot of opcodes, not just a few. (Sizes don't include the opcode byte.)
11: 1
24: 1
31: 1, 34: 1
80: 0, 81: 0, 82: 1, 83: 0
90: 1, 91: 1, 93: 1
A0: 2
C0: 0, C2: 0, C5: 0, CD: 0
D0: 0, D5: 0, D7: 0
E0: 1
F0: 0, F1: 2 (not sure), F2: 2 (pointer), F3: 2 (pointer), F8: 2 (pointer)
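Here is how the list above could be used to step through a code section, as a Python sketch. I am assuming the opcode values are hexadecimal and that every opcode is a single byte. The table is incomplete, so the walker just stops at the first opcode it doesn't know. The names are made up.

```python
# Operand sizes in bytes, not counting the opcode byte itself.
# Taken from the list above; F1 is marked "not sure" there.
OPERAND_SIZES = {
    0x11: 1, 0x24: 1, 0x31: 1, 0x34: 1,
    0x80: 0, 0x81: 0, 0x82: 1, 0x83: 0,
    0x90: 1, 0x91: 1, 0x93: 1,
    0xA0: 2,
    0xC0: 0, 0xC2: 0, 0xC5: 0, 0xCD: 0,
    0xD0: 0, 0xD5: 0, 0xD7: 0,
    0xE0: 1,
    0xF0: 0, 0xF1: 2, 0xF2: 2, 0xF3: 2, 0xF8: 2,
}


def walk(code):
    """Split a code section into (offset, opcode, operand_bytes) entries.

    Returns (entries, stop_pos). stop_pos equals len(code) if everything
    was decoded, otherwise it is the offset of the first unknown opcode.
    """
    entries = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        if op not in OPERAND_SIZES:
            break  # unknown opcode: stop rather than guess its size
        size = OPERAND_SIZES[op]
        operand = code[pos + 1:pos + 1 + size]
        if len(operand) < size:
            raise ValueError("truncated operand at offset %d" % pos)
        entries.append((pos, op, operand))
        pos += 1 + size
    return entries, pos


if __name__ == "__main__":
    entries, stop = walk(bytes([0x80, 0xA0, 0x01, 0x02, 0x11, 0x05]))
    for offset, op, operand in entries:
        print("%04X: %02X %s" % (offset, op, operand.hex()))
```

Running this over every .SCED file and logging where it stops would at least tell us which opcodes are still missing from the table.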
For pointers I think I can identify which are applicable to the string table by searching for the offsets of the start of strings. Some pointers are applicable to the code table.
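The pointer check I have in mind looks roughly like this in Python. It assumes the 2-byte pointer operands are little-endian offsets, which I have not verified, and string_starts is a hypothetical set of string start offsets collected during the dump.

```python
import struct

# Opcodes whose 2-byte operand looks like a pointer (from the list above).
POINTER_OPCODES = {0xF2, 0xF3, 0xF8}


def classify_pointer(operand, string_starts):
    """Guess which table a 2-byte pointer operand refers to.

    operand: exactly 2 bytes, assumed little-endian (unverified).
    string_starts: set of offsets where strings begin in the string table.
    Returns ("string", value) if the value matches a string start,
    otherwise ("other", value), which may be a code-table pointer.
    """
    (value,) = struct.unpack("<H", operand)
    kind = "string" if value in string_starts else "other"
    return kind, value


if __name__ == "__main__":
    starts = {0x0010, 0x0040}
    print(classify_pointer(b"\x10\x00", starts))
    print(classify_pointer(b"\x11\x00", starts))
```

A code-table pointer could happen to equal a string offset by coincidence, so matches should be treated as likely rather than certain.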
And do you have advice for the lowercase letters? I was thinking of modifying the font, if I can find it (the halfwidth font is somewhere else), and then using a translation table to convert strings into halfwidth katakana.
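The translation table part could be as simple as the Python sketch below. The particular mapping (a to z onto the 26 halfwidth katakana starting at U+FF71) is arbitrary and only for illustration; the real one depends on which glyphs get redrawn in the font.

```python
import string

# Map 'a'..'z' onto consecutive halfwidth katakana starting at U+FF71.
# The font glyphs in those slots would be redrawn as lowercase letters.
LOWER_TO_KANA = str.maketrans(
    {ch: chr(0xFF71 + i) for i, ch in enumerate(string.ascii_lowercase)}
)


def encode_lowercase(text):
    """Replace lowercase ASCII letters with their stand-in katakana.

    Uppercase letters, digits and punctuation are left untouched.
    """
    return text.translate(LOWER_TO_KANA)


if __name__ == "__main__":
    print(encode_lowercase("Lorem ipsum"))
```

Anything translators type in lowercase would go through this just before insertion, so they never have to see the katakana themselves.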