Yeah, I saw that, but the 8 uncompressed files in txt can't be all of the game's text can they? You were saying before that you were trying to get the compressed files in the *.act archives uncompressed. So did you find that they contain no text at all? Or is the txt folder really all of it?
EDIT: Ok for the script files, I've figured out how the pointers work, but I'm still so unsure on so much of the file I'm not sure how useful this will be to you.
The pointers are all based off of the start of the text table, and 04000000 is the text opcode. If you search the file for "040000000000000064" you'll get a lot of the hits that reference the strings. But there are entries that have the first variable set above 0, and there are also ones where that 64 isn't set either. You can't just search for 04000000 though, because you'll get a hell of a lot of unrelated hits (obviously). So it's going to require a lot of manual insertion work.
The location for the string pointer is 0x14 bytes after that 04, and it's set from the beginning of the text table.
Also, the length of the text table is stored as a word in the header at 0x2a, so you'll need to update that too. No idea what the other values are.
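A rough sketch of that search in Python, in case it helps. It assumes the pointer is a little-endian 32-bit value, and `find_text_pointers` is just a name made up for this example. It only catches the common signature, so the entries with a different first variable or without the 64 will be missed.

```python
import struct

# Common TEXT signature: opcode 04, first variable 00, then 64.
# Other variants exist, so this will not find every string reference.
SIGNATURE = bytes.fromhex("040000000000000064")
POINTER_OFFSET = 0x14  # the pointer sits 0x14 bytes after the 04


def find_text_pointers(data: bytes):
    """Yield (opcode_offset, pointer) for each signature hit.

    Assumption: the pointer is a little-endian 32-bit value, counted
    from the beginning of the text table.
    """
    pos = data.find(SIGNATURE)
    while pos != -1:
        if pos + POINTER_OFFSET + 4 <= len(data):
            (pointer,) = struct.unpack_from("<I", data, pos + POINTER_OFFSET)
            yield pos, pointer
        pos = data.find(SIGNATURE, pos + 1)
```

Each pointer then has to be added to wherever the text table starts in that file to get the real file offset of the string.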
Here's an example of it working:
And the very next line following it (to show you that the next line works properly):
Here's the actual strings I used:
Let's make this an uber long string to test things,comma here should bring this into a new line,and again. neoxephon neoxephon neoxephon,I heard that if you say it three times into a mirror, he'll appear and not be able to find string pointers. ,Adding a whole FF more bytes,surely we're off the screen by now123.Here is string 2,right the way down here, This is jsut blah blah ble is string 2,right the way down here, This is jsut blah blah ble is string 2,right the way down here, This is jsut blah blah ble is string 2,right the way down here, This is jsut blah blah blsut blah blah bl.
As they are in the file, the second string is just after the "now123." I made both of them FF bytes longer than their originals.
A nice thing I found out, after adding in manual commas for newlines because you said they were used, is that the game has automatic newlining. \o/ That's always a pain, so it's nice that the game handles it itself. I also thought the text would go off the screen and not automatically scroll for you, but nope, that's handled too, nice.
Yeah, I saw that, but the 8 uncompressed files in txt can't be all of the game's text can they? You were saying before that you were trying to get the compressed files in the *.act archives uncompressed. So did you find that they contain no text at all? Or is the txt folder really all of it?
As far as I am aware, that is it. The script is the biggest amount of text. The game doesn't have a lot of items and whatnot. Some of the game text is graphics.
The assets for the Dela scenario that they added to the PSP game (it wasn't in the original SNES game) are located in the \ext folder. Some of the items and enemies are the same, while some are different. So, for example, item.tb in the \ext\txt folder is not identical to the item.tb located in the \txt folder.
Great job on figuring out the pointers!
mp1_14.scp (which contains all of the script for the Tutorial) seems to handle the pointers just slightly differently.
With all of the other script files, I can get the pointers every time by searching for:
04 00 00 00 00 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00
But with mp1_14.scp it is instead:
04 00 00 00 01 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00
And the control codes for showing the Tutorial images (tuto00 and so on) get counted in with the text strings connected to it (they don't have their own pointers like I originally thought).
Say we start at the 00 before tuto00 (beginning of text block) the pointer is the number of bytes from that 00 until the 00 after the text string (not the 00 after tuto00). CC seems to tell it that the text string should begin. This seems to be the case for all text strings that have a special control code before the actual text (tuto00, a_name8, br1_a5, etc.)
So basically, if we have a control code before the text string, the pointer includes the control code and the following text string.
A little bit later today, I'm going to go through all of the script files and make sure the pattern for finding the pointers is the same for all of them. Then, I'll start working on programming a tool.
Yeah, I said it'll be different, and that's why it requires a bunch of manual effort. There are some which don't have that 64 in either. They're just variables for the 04 opcode, so they're not handled differently, it's just called with different values.
Also, you have that backwards, it starts at the first character, not at the 00. The pointers all point to the first character of the string, and the 00 (a null) is what marks the end of strings. The CC does something else which isn't in very many strings at all, not even the ones that call the next part of the script, like tuto12, tuto4 etc etc, but it is on just some of them like tuto14, and also some just normal strings. Either way though, that CC isn't loaded, the pointer goes past it. Something else may interact with it though.
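A minimal sketch of reading a string that way. `read_string` and `text_table_start` are made-up names, and the Shift-JIS default is only an assumption about the encoding, so swap it if the output looks wrong.

```python
def read_string(data: bytes, text_table_start: int, pointer: int,
                encoding: str = "shift_jis") -> str:
    """Return the string a pointer refers to.

    The pointer lands on the first character of the string, and the
    next 00 byte (a null) marks the end. The encoding is an assumption.
    """
    start = text_table_start + pointer
    end = data.index(b"\x00", start)  # the terminating null
    return data[start:end].decode(encoding, errors="replace")
```

With a control code like tuto00 sitting in front of the text, the pointer would simply be a larger number that skips past it.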
EDIT:
Ok so the first variable controls the type of text box. 00 is for a text box that goes by itself, needs no interaction. 01 is a standard dialogue-type box that requires you to press O. 02 is the same as 00. 03 is one of those wall panels:
But the text is ghosting a bit. So other variables at play there. EDIT: Ok, it's because the text is shadowed. When you want to use a wall panel type text, use #6C at the start of the string, that sets the text colour to black and it looks normal.
04 is the same as 03.
05 is a bit interesting, it plays the string automatically, then fades it out, but keeps the text box open. I then have a blank text box, sooo that's for showing multiple strings per text box, without actually closing-reopening it.
Not sure what 06 is. 07 gives us a yes/no box, and so does 08 (except 08 shows more text after choosing an option, 07 doesn't):
And etc etc, so that's the first variable. From what I can see though, the 04 opcode takes differing amounts of variables, based on whatever the first value is, or maybe it's all under the umbrella of a larger opcode which remains the same. I'm not going to go through every variable, as I'm sure most of it will be difficult to see what it does.
So there isn't so much of a "pattern" to it, as the variables just change based on what sort of text they want.
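For anyone writing a tool around this, the values found so far could be kept in a small lookup table. This is only a sketch of the observations above, the names are invented, and 06 is left out because I don't know what it does.

```python
# First variable of the 04 (text) opcode, as observed so far.
TEXT_BOX_TYPES = {
    0x00: "text box that goes by itself, no interaction",
    0x01: "standard dialogue box, press O to continue",
    0x02: "same as 00",
    0x03: "wall panel (start the string with #6C for black text)",
    0x04: "same as 03",
    0x05: "plays the string, fades it out, keeps the box open",
    0x07: "yes/no box",
    0x08: "yes/no box, shows more text after the choice",
}


def describe_text_box(first_variable: int) -> str:
    """Return a description for a text box type, or 'unknown'."""
    return TEXT_BOX_TYPES.get(first_variable, "unknown")
```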
Some opcodes (all variables are dwords I think, and all values are in hex):
01 - 20 bytes - return?
02 - 20 bytes
03 - ? bytes - This one is a tricky trickster. It has the absolute block count for the block after it, at 0xC from its opcode. Then where it ends is at 0x10. So if you had a 03 opcode like: 03000000 65000000 01000000 a0000000 a2000000 - Then that 03 has an extra block (beyond its standard 20 bytes) at base+(a0*20), which means that our 03 code itself is at base+(9f*20). And the next opcode will start at base+(a2*20). So essentially, our 03 is at base+13E0, and the next opcode after 03 will be at base+1440.
04 - 20 bytes - set string
17 - 20 bytes - move camera?
1a - 20 bytes - move camera?
1e - 20 bytes
1f - 20 bytes - wait?
20 - 20 bytes
23 - 20 bytes
43 - 20 bytes
44 - 20 bytes
45 - 20 bytes
46 - 20 bytes
There's tonnes more. Anyway, most of them are 20 bytes, the whole thing works based on 20 byte blocks. It saves the start of that script (well, sort of mini function if you will), and then saves how many 20 byte blocks it's jumped, and then pulls the opcode from that base address+(blocks*20).
I'm going to assume that 03 isn't the only one that can change its length based off a value.
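To make the 03 arithmetic from the list above concrete, here is the same example worked through in Python. The function name is invented, and it only reproduces the one case described, where the 03 sits one block before its extra block.

```python
BLOCK_SIZE = 0x20  # every block is 0x20 bytes


def opcode_03_offsets(extra_block: int, next_block: int) -> dict:
    """Offsets relative to the script base for a 03 opcode.

    extra_block: the dword at +0xC (absolute block count of the extra block)
    next_block:  the dword at +0x10 (block where the next opcode starts)
    """
    return {
        # the 03 itself sits one block before its extra block
        "opcode_03": (extra_block - 1) * BLOCK_SIZE,
        "extra_block": extra_block * BLOCK_SIZE,
        "next_opcode": next_block * BLOCK_SIZE,
    }


offsets = opcode_03_offsets(0xA0, 0xA2)
print({name: hex(value) for name, value in offsets.items()})
```

For a0 and a2 that gives base+13E0 for the 03 and base+1440 for the next opcode, matching the numbers in the example.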
To investigate yourself, just breakpoint at 08845404, that's where it reads the opcode, you can find everything from there. v0 has the opcode we're reading, v1 has our base address, and +C bytes to v1 is where our block count is.
Go through and get all their lengths, figure out any annoying ones like 03, and then you can write a decompiler which can dump all the opcodes and all their variables into a file, and you'll then be able to edit things, and recompile them without breaking anything.
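As a starting point for that kind of dumper, something like the sketch below would split a script section into blocks. It is not my actual tool: it assumes you have already cut the script section out of the file, that every value is a little-endian dword, and it makes no attempt to treat the extra block of a 03 specially.

```python
import struct

BLOCK_SIZE = 0x20  # 0x20 bytes = eight dwords per block


def dump_blocks(script: bytes) -> list:
    """Return one text line per 0x20-byte block: offset, opcode, variables."""
    lines = []
    for offset in range(0, len(script) - BLOCK_SIZE + 1, BLOCK_SIZE):
        # first dword is the opcode, the other seven are its variables
        opcode, *variables = struct.unpack_from("<8I", script, offset)
        args = " ".join(f"{value:08x}" for value in variables)
        lines.append(f"{offset:06x}: {opcode:02x} {args}")
    return lines
```

Writing the lines back out with `struct.pack` in the same layout should give an identical file, which is a good first check before changing anything.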
Another thing is that the length of the script table is saved in the script file header at 0x28 as a word, and it's the length / 20. Sooooo, by process of deduction, all those words before that are going to be the sizes of other sections in the file as well..... I think.
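Reading those two header values could look like this. It assumes "word" means a 16-bit little-endian value (0x28 and 0x2a are only two bytes apart) and that the 20 is hex like the other values; the function name is made up.

```python
import struct


def read_header_lengths(data: bytes) -> tuple:
    """Return (script_table_bytes, text_table_bytes) from the header.

    0x28: script table length divided by 0x20 (a block count)
    0x2a: text table length
    Both are assumed to be 16-bit little-endian words.
    """
    script_blocks, text_length = struct.unpack_from("<HH", data, 0x28)
    return script_blocks * 0x20, text_length
```

Any tool that changes the number of opcodes or the length of the text has to write both values back, or the game will read the wrong amount.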
EDIT: So I made a dumper, no reinsertion yet, but here's how it looks so far. This is dumped from mp1_01.scp:
The more opcodes you can figure out, you can replace them with actual names to make things easier to read/edit. Now I just need to take that and try to firstly get back an identical file, and then work on having it update stuff properly if the size changes.
EDIT:
Woohoo, did it. Got it all recompiled and it works perfectly. So now you can just add and change whatever you want and it'll work, provided you at least stick to the format and the correct parameter count etc.
So it's very easy to edit I think. You can just change some variables and see what they do. Like I said earlier, 0x4 is our text (I replaced it with TEXT in the dump), and its first variable controls what sort of text it is. I made one with it as 7, a yes/no box, then a second with it as 1, our standard "press the o button to continue" text.
The only really major problem remaining I think are the 03 opcodes. Because they have their start/end block hard-coded, they have to be updated properly for you to add more codes, otherwise you'll just get infinite looping code. Not sure how to best solve that really. I don't really want to essentially copy my whole dumping code again as a part of the recompiler... Hmm...
Superb work, Kelebek. I don't understand the technical tidbits, but the screenshots look very promising.
Looks like there's space for 40 characters on a single line up to three lines per text box (two with Yes/No question). Great to see how virtually all Falcom games make use of the same font.
It now handles all inserting and removing, so you can add or remove as much as you want. Figured out the tables so you don't need to change anything per file either. Also added a lot more opcodes so you can hopefully get the other files, may be a few missing, but you can just add them like the others. All opcodes are 0x20 bytes after all.
Rename your original script to add ".orig" and then you can run it just like before. "script.py filename.orig -e" to extract, and "script.py filename.data -i" to recompile.
About specific editing of the data: You can remove lines without problem, if you want to get rid of an opcode just delete the line.
To add lines though, you need to add them so that they start with the { and NOT a number. For example:
If you leave the number at the start of the line you'll break the recompile, so just make sure the lines you add to the script begin with the {. You don't need to do anything special for editing lines. You don't need to touch that length at the start of the function or anything either, I don't even use it.
So now you can fully and easily translate the game text. I look forward to one.
Admittedly, due to my lack of experience with scripts, it took me some trial and error to get the python script running and to convert the .scp to .data, but I eventually succeeded. Looks great man, excellent work.
How should we proceed from here? Are you still working on a tool (around Kelebek's script), neoxephon? Or is there a way for me to directly inject the .scp's back into the game?
Yeah, you can directly import it back in, very easily. Firstly, you need to get the data.lst file which is in the game's USR dir. Then, you'll need to open it up in a hex editor, and find the entry for the file you've edited. The variable at 0x8 (basically it's what's right after the filename) is the size of the file, so you need to change that to match the size of your new script file. After you've done that, just put your fixed data.lst into the game, overwrite the old script with your new one, and you're done.
For reimporting with UMDGen though, I had to enforce the file positions every time, a cut-down iso always crashed the game for me.
A bit more on the data.lst front if you need it:
Goto (ctrl+g usually) 3f80
You should see in the text window that this position starts with "script." That's the folder name of what we want to edit, and then you can see below it, that every 0x10 bytes has a new filename, and it lists all the files in the script folder.
So say you edited mp1_01.scp, well that's directly under "script," it's our first file. So from where "mp1_01" starts, if you go 0x8 bytes, the value EE80 is there right? That's the one we need to change, and it's in the same position for every file, 0x8 bytes after the start of the filename.
So the file is little endian, meaning to get the actual value you need to reverse the numbers. So say for instance, the size of your new mp1_01 file is 32778 bytes in decimal, as reported by viewing a file's properties in Windows. So open Windows' calculator, and go to View > Programmer. On the left should be a radio button for Hex and Dec. Put in your filesize with Dec selected, and then just select Hex, and it'll give you your filesize in hex. So 32778 becomes 800A. Now to get that back into the data.lst, you just need to reverse the bytes in 2s (so 80 0a to 0a 80). Then in your data.lst, overwrite that EE80 with 0A80 (or whatever your filesize is), and that's it.
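If you'd rather not do the calculator step by hand, the same thing can be sketched in Python. Big caveat: I've only described the first two bytes of the size, so treating it as a 32-bit little-endian field is an assumption, and both function names are invented for this example.

```python
import struct


def size_to_lst_bytes(size: int) -> bytes:
    """File size as little-endian bytes (assumed to be a 32-bit field)."""
    return struct.pack("<I", size)


def patch_lst_size(lst: bytearray, filename: bytes, new_size: int,
                   search_from: int = 0) -> int:
    """Write new_size 0x8 bytes after the start of filename in data.lst.

    Returns the offset of the entry. Pass search_from (for example
    0x3f80, the "script" folder entry) in case the same name also
    appears under another folder.
    """
    entry = lst.index(filename, search_from)
    struct.pack_into("<I", lst, entry + 0x8, new_size)
    return entry


print(size_to_lst_bytes(32778).hex())
```

For 32778 the first two bytes come out as 0a 80, the same as doing the swap by hand. Work on a copy of data.lst and check the result in a hex editor before putting it back in the iso.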
I was able to use the extractor. No idea how it was developed. I feel like I just took the "how to walk" tutorial in your favorite videogame.
How to run Python script (Windows).
1) Install python
2) At the command prompt: python script.py
Also I was having trouble viewing the Japanese content of the output file. I ended up succeeding with JWPce.
Now it is time to talk about doing the project. Which script file should be worked on first so we can test insert it?
There are 46 script files. There are 8 text tables. There are 25 ITP files but I'm not sure how many of them need modifying. I posted the text contents of mp1_01.scp here. Not sure if that was useful.
I could use advice on crowd-source translation wikis. We can probably use that one but there must be better ones out there. I think the project will go faster if we have multiple translators working on it. I'm new if you couldn't tell.
That explanation was certainly helpful and taught me a lot about the files, yet iso still keeps crashing on me. Allow me to retrace my steps:
1. Exported mp1_01 using your Python script, translated a line (replaced "どれほどの時間が経っただろうか。" with "I wonder how much time had passed.") and imported it with the Python script again. The new file size is 32,780 bytes, which is 800C in hex. Renamed the file to "mp1_01.scp" again.
2. Using UMDGen to open data.lst, I modified 3f90 by replacing EE80 with 0C80 to match the new file size. I saved it as "data.lst" locally and then used UMDGen to reinsert the data.lst again.
3. Used UMDGen to import the new "mp1_01.scp" in the /script folder.
4. Saved the game as .iso, opened it in PPSSPP and it crashes.
I did note that when importing files, the file size drops from 385.50 Mb to 284.10 Mb. Other than that, I followed your guide faithfully - apart from, and I think this is where it's going wrong, I don't know how to "enforce file positions".
Yeah, that crashes the game, the filesize being cut.
Open up the original game, and in UMDGen go to File > File List > Export. You only need to do this once.
Then add in your changed files, and every time you want to save the iso, go to File > File List > Import, and find that original list. Choose yes to forcing the file positions, and then save it uncompressed. It should keep the original size and not crash.
Oh err, I didn't end up looking into that. That's sort of beyond my ability I think. I don't know anything about font editing, and changing the character from a comma to something else would require ASM hacking.
But on page 2 omarrrio already found a solution for it, so I didn't think I really needed to fix it if he's been able to edit the font to get it working. So I'd say look into editing the font, or ask omarrrio to explain how he did it.
While you're at it, you could also check if that tool works as well on the scripts for Nayuta no Kiseki. (and G-Han could take a look on it as well whenever he's interested... even though it's MASSIVE)
The one translation project on that one got cancelled because of "technical issues", and since the text is similar it would be a shame not to expand the tool's functionality to include it as well. Years ago I wrote an article about it on tcrf, about an internal file forgotten there by Falcom which details exactly what each map script file is for (0000, for example, is for the center of the village in Lost Haven, the "hub" of the game).
About the comma... I guess I should make myself useful for once in this thread
Here is a pspfont (extracted from Nayuta no Kiseki, but since they are identical...) where I redrew the @ character (you shouldn't need it, right?) as a "comma+space".
Try if it works when you just write @ instead of a comma in the text.
I can easily affirm just from a rapid peek at the font that all Falcom games on the PSP are variable width and the values have to be stored somewhere (... could be in the font itself, even).
I figured it wouldn't be that easy
Well, hopefully someone has a look on that one eventually (other than it using 0A as a breakline, and it accepting ascii English characters... the volume of the text makes it not the type of the game where I just hex edit the text inside like I usually do), but let's enjoy what we have here