ROM Hack: In need of a hacker for the text files of Tongari Boushi

InochiPM

Hey guys, so I'm in need of a hacker to help me with these text files for Tongari Boushi.
They are basically script files that hold a lot of text, all in UTF-16LE, and the message files are split up in a weird way.
Since it's UTF-16LE, from what I have looked up it has to be edited with pointers or some kind of tool, and in that case I would need a hacker to help me, as I have no knowledge of pointers whatsoever.

There are only a few files with text in the game, and some of them need to be edited in just such a way that the text fits correctly inside the bounds of the text box.
If ANYONE would like to help me out, please do not hesitate to DM me on here or reply to this thread. I'm severely behind on this game's translation and would like to finally make headway on the project, since the UI textures are all but finished.
 

FAST6191

"pointers or some kind of tool"

OK... fair bit of ground to cover, it seems. Whether the text is UTF-16, ASCII, EBCDIC, Shift-JIS, EUC-JP or a completely custom encoding (which covers most games ever made, even more so if you count standard encodings with their own limitations and custom tweaks) matters little, if at all, when it comes to pointers.

Pointers are used in the vast majority of games on any system. Fully parsed text and fixed-length entries are rare, and usually only turn up in specific circumstances (menus being the most common).
Pointers are much like the contents page of a book: they are lists of locations where you can find things. If you add or remove content, then everything following it ends up in a new place and your contents page (pointer list) no longer works. It is the same for games when you increase or decrease sentence length, which is almost inevitable in translation (and even when it isn't strictly needed, forcing the original length usually gives clunky results).
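To make the contents-page comparison concrete, here is a minimal Python sketch, assuming the strings are stored back to back in the same order as their pointers. The function name `shift_after` is hypothetical and not taken from any existing tool: when one string changes length by `delta` bytes, every pointer after it has to move by that same amount.

```python
def shift_after(pointers: list[int], edited_index: int, delta: int) -> list[int]:
    """Return updated pointer values after the string at `edited_index`
    grew (delta > 0) or shrank (delta < 0) by `delta` bytes.

    Assumes strings are stored back to back in pointer-list order, so the
    edited string and everything before it stay put and everything after moves.
    """
    return [p + delta if i > edited_index else p for i, p in enumerate(pointers)]


# Three strings at 0x100, 0x120 and 0x150; the first one gets 6 bytes longer.
old = [0x100, 0x120, 0x150]
new = shift_after(old, 0, 6)
print([hex(p) for p in new])  # ['0x100', '0x126', '0x156']
```

Doing this by hand for every edited line is exactly the page-recounting chore that the tools below exist to avoid.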
Pointers are annoying to edit by hand, and the potential for mistakes rises massively the longer things get (can you imagine recounting every page for a new contents page by hand?), so most people build tools of some form or use more automated methods. These range from fiddling with a hex editor's search function, through spreadsheets and programmable tools (see things like Atlas and Cartographer), to fully automated tools and nice text editors with a UI and everything (various Pokémon games provide... I don't want to say nice examples, as many of those cause more hassle than they are worth, but yeah).

There are various styles of pointer in use. I usually break them down into three types, maybe four if we include files in general:
i) Conventional pointers
ii) Offset pointers
iii) Relative pointers
iv) Sector based addressing

All types of pointer can be split up across a file, have lengths or markup data mixed in among them, or have other things going on that make them less readable than a simple one-after-the-other list.
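As an illustration of pointers with other data mixed in, the Python sketch below assumes a hypothetical table where each entry is a 4-byte little-endian pointer followed by a 2-byte length and a 2-byte flags field. Real formats vary, so the `"<IHH"` layout string is an assumption you would adjust after looking at the actual file in a hex editor.

```python
import struct

# Hypothetical entry layout: pointer (u32), length (u16), flags (u16), little endian.
ENTRY = struct.Struct("<IHH")


def parse_entries(table: bytes) -> list[tuple[int, int, int]]:
    """Split an interleaved table into (pointer, length, flags) tuples."""
    usable = len(table) - len(table) % ENTRY.size  # drop any trailing partial entry
    return list(ENTRY.iter_unpack(table[:usable]))


table = struct.pack("<IHHIHH", 0x40, 10, 1, 0x4A, 6, 0)
for pointer, length, flags in parse_entries(table):
    print(f"pointer {pointer:#06x}  length {length}  flags {flags}")
```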

i) Conventional pointers are the obvious first case. You take the value of the pointer, and that is the location within the file/memory/ROM of the data you want.
ii) Offset pointers. Much like conventional pointers, these count from a location, but it is a different location from the start of the file (an offset, if you will). Usually this will be the end of the pointer/formatting section and the start of the text.
When searching, you tend to spot this because every pointer value will be a fixed distance from where its text actually is in the file.
iii) Relative pointers. For various technical reasons it can be quicker to get from the current location to the eventual location by telling a computer to jump a certain distance from where it is now. Relative pointers are this idea in pointer form: you take the location of the pointer and the value stored at that location, and add them together.
When searching, you tend to find that if, say, each pointer is 4 bytes long, then read as plain values the pointers will be 4, then 8, then 12, then 16, then 20... bytes out from where the text actually is in the file/ROM, since each pointer sits 4 bytes further along than the one before it.
iv) Sectors. So you gave 32 bits to a pointer. That is a lot of data, and going any bigger makes the maths a lot harder (they are called 32-bit processors for a reason). That's OK, because nobody will ever need a file or ROM or hard drive larger than 2^32 bytes (about 4 gigabytes). Thus we break drives/files/optical discs/... down into chunks, sectors if you will, and it becomes easy to say sector 4, bytes such-and-such through such-and-such. The cost is that any space not used in a sector is wasted, which is also why your computer reports the size of a file and its size on disk as separate things.
Other than when following file reads on a CD, you will tend to meet a related concept here: 4 gigs is a lot of data, but even video files tend to be measured in megabytes, and text is rarely even in the megabytes. To that end, the first few bits of a pointer might be used to indicate who is talking, whether to make the text bold, whether the file is compressed, whether it is a subdirectory...
You will tend to spot this when the pointers make the file look massive, or when only the last few bits of a pointer actually seem to matter (if your pointers are 02ED, 03FF, 04AA, 04D0, then suddenly F4F0, followed by 05A0 and continuing on, then chances are there is something to it).
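The four types above can be sketched in Python as follows. Everything here is an assumption for illustration: pointers are taken to be 4-byte little-endian values, the 2048-byte sector size is the usual CD-ROM (ISO 9660) figure rather than anything confirmed for this game, and `split_flags` assumes 16-bit pointers whose top 4 bits are flags, which is one reading of the 02ED...F4F0...05A0 example. All function names are hypothetical.

```python
import struct


def conventional(data: bytes, table: int, count: int) -> list[int]:
    """i) Each pointer value is the absolute location of its data."""
    return list(struct.unpack_from(f"<{count}I", data, table))


def offset_based(data: bytes, table: int, count: int, base: int) -> list[int]:
    """ii) Values count from `base` (e.g. the start of the text) instead of 0."""
    return [base + value for value in conventional(data, table, count)]


def relative(data: bytes, table: int, count: int) -> list[int]:
    """iii) Target = location of the pointer + the (signed) value stored there."""
    targets = []
    for i in range(count):
        where = table + 4 * i
        (value,) = struct.unpack_from("<i", data, where)
        targets.append(where + value)
    return targets


SECTOR_SIZE = 2048  # assumption: common CD-ROM / ISO 9660 logical sector size


def sector_to_offset(sector: int, byte_in_sector: int = 0) -> int:
    """iv) Turn "sector N, byte M" into a plain byte offset."""
    return sector * SECTOR_SIZE + byte_in_sector


def size_on_disk(file_size: int) -> int:
    """Files occupy whole sectors, so the unused tail of the last one is wasted."""
    return -(-file_size // SECTOR_SIZE) * SECTOR_SIZE  # round up to a whole sector


def split_flags(pointer: int, flag_bits: int = 4, width: int = 16) -> tuple[int, int]:
    """Treat the top `flag_bits` bits as flags and the rest as the address."""
    addr_bits = width - flag_bits
    return pointer >> addr_bits, pointer & ((1 << addr_bits) - 1)


if __name__ == "__main__":
    # Relative pointers stored at offsets 0, 4 and 8, all aiming at text further on.
    rel_table = struct.pack("<3i", 12, 16, 22)
    print(relative(rel_table, 0, 3))  # [12, 20, 30]
    print(sector_to_offset(4, 16), size_on_disk(5000))  # 8208 6144
    for p in (0x02ED, 0x03FF, 0x04AA, 0x04D0, 0xF4F0, 0x05A0):
        flags, addr = split_flags(p)
        print(f"{p:04X} -> flags {flags:X}, address {addr:04X}")
```

Read as plain values, the relative table gives 12, 16, 22 against real targets of 12, 20, 30, which is the growing discrepancy described under iii). Some formats may count from the byte after the pointer instead, in which case you would add the pointer size as well. With the flag split, F4F0 comes out as flags F and address 04F0, which slots neatly between 04D0 and 05A0.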

Do also look up endianness, as pointers can be either big or little endian regardless of what the base system is. The short version is that you probably met thousands, hundreds, tens, units... way back in school, or have met some people writing day-month-year, others year-month-day, and those that were dropped on their heads as babies writing month-day-year. Computers have an equivalent of this, and for various historical reasons different processors and files use different orders.
https://www.freecodecamp.org/news/what-is-endianness-big-endian-vs-little-endian/
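In Python the byte order is just a parameter, which makes it easy to try both when a value looks wrong. This small sketch (the helper name `pointer_bytes` is hypothetical) also produces the byte pattern you would type into a hex editor's search box when hunting for a pointer, and shows that UTF-16LE text is itself little endian.

```python
def pointer_bytes(value: int, size: int = 4, byteorder: str = "little") -> bytes:
    """The bytes to search for in a hex editor when looking for `value`."""
    return value.to_bytes(size, byteorder)


raw = bytes([0xD0, 0x04, 0x00, 0x00])
print(hex(int.from_bytes(raw, "little")))  # 0x4d0
print(hex(int.from_bytes(raw, "big")))     # 0xd0040000
print(pointer_bytes(0x04D0).hex(" "))      # d0 04 00 00
print(pointer_bytes(0x04D0, byteorder="big").hex(" "))  # 00 00 04 d0
# UTF-16 has the same split: "Hi" is 48 00 69 00 in LE and 00 48 00 69 in BE.
print("Hi".encode("utf-16-le").hex(" "), "|", "Hi".encode("utf-16-be").hex(" "))
```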

Anyway, pointers can be tricky to search for, but most modern games do give you a break somewhere along the line, or at least shorten the manual side of things.
Chief among these: in many games, text sections will end with 0000 or something similar. This is great, as you can then search for all the 0000 sections, get a list from the hex editor (many have a search function that will find every instance of a given value in a file), dump that list out and compare it to the pointers.
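Here is a rough Python sketch of the 0000 approach, assuming UTF-16LE text, 4-byte little-endian absolute pointers, and that you already know roughly where the text section starts (`text_start`). Checking only even offsets matters for UTF-16: "A" (41 00) followed by U+0100 (00 01) would otherwise produce a false 00 00 match. All function names are hypothetical.

```python
def terminator_offsets(data: bytes, text_start: int) -> list[int]:
    """Offsets of 0000 terminators, checking only 2-byte-aligned positions."""
    return [i for i in range(text_start, len(data) - 1, 2)
            if data[i:i + 2] == b"\x00\x00"]


def candidate_starts(text_start: int, terminators: list[int]) -> list[int]:
    """The first string starts at text_start; each later one right after a terminator."""
    return [text_start] + [t + 2 for t in terminators[:-1]]


def find_value(data: bytes, value: int, size: int = 4) -> list[int]:
    """Every offset where `value` appears as a little-endian integer."""
    needle = value.to_bytes(size, "little")
    hits, pos = [], data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


# Tiny fake file: two pointers (8 and 14), then "Hi" and "Yo" in UTF-16LE.
data = (8).to_bytes(4, "little") + (14).to_bytes(4, "little")
data += "Hi".encode("utf-16-le") + b"\x00\x00" + "Yo".encode("utf-16-le") + b"\x00\x00"

starts = candidate_starts(8, terminator_offsets(data, 8))
for s in starts:
    print(f"string at {s}: value found at offsets {find_value(data, s)}")
```

Hits that sit next to each other at a regular spacing are a good sign you have found the pointer table; a lone hit elsewhere may just be coincidence.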
Even without that, many sections will end with a . ! or ? and you can get a list of those. There might be some others in the middle of text sections, but you can work around that if you know the contents of the game.
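The same aligned search works for punctuation. This sketch (the function `punctuation_offsets` is hypothetical) lists where . ! or ? appear in UTF-16LE data. For a Japanese script you would likely want to pass full-width marks such as 。！？ too, which is an assumption about this game's text rather than something checked against the files.

```python
def punctuation_offsets(data: bytes, start: int = 0, marks: str = ".!?") -> list[int]:
    """2-byte-aligned offsets of any character in `marks`, encoded as UTF-16LE."""
    wanted = {ch.encode("utf-16-le") for ch in marks}
    return [i for i in range(start, len(data) - 1, 2) if data[i:i + 2] in wanted]


sample = "Hi! Go.".encode("utf-16-le")
print(punctuation_offsets(sample))  # [4, 12]
print(punctuation_offsets("あ。".encode("utf-16-le"), marks="。！？"))  # [2]
```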
If it is going to be an annoying game and just have text sections end to end, then you probably want to figure out what is displayed where in at least a few sections of the game, find the pointers that match up with them (usually at the start of the file or in a nearby file) and work from there. Then you can either go line by line adjusting pointers as you go, or make a UI-based program to do it for you (it knows which line is doing what, after all, and will presumably be working line by line, or inserting its own notes in the middle so it remembers to replace them with pointer values later).
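The "notes to replace later" idea is essentially a two-pass rebuild. Below is a minimal Python sketch assuming a hypothetical layout (a table of 4-byte little-endian absolute pointers, immediately followed by 0000-terminated UTF-16LE strings). The real Tongari Boushi files may well differ, so treat `rebuild` as a starting point rather than a working inserter; it also does nothing about text box width.

```python
import struct


def rebuild(lines: list[str]) -> bytes:
    """Write a pointer table plus text block for `lines` (hypothetical layout)."""
    count = len(lines)
    # Pass 1: reserve zeroed placeholder pointers, then append each line,
    # noting the offset where it starts.
    out = bytearray(4 * count)
    starts = []
    for line in lines:
        starts.append(len(out))
        out += line.encode("utf-16-le") + b"\x00\x00"
    # Pass 2: go back and replace the placeholders with the real offsets.
    struct.pack_into(f"<{count}I", out, 0, *starts)
    return bytes(out)


block = rebuild(["Hi", "A longer translated line"])
print(struct.unpack_from("<2I", block, 0))  # (8, 14)
```

Because the offsets are recorded as each line is written, making a translated line longer or shorter just changes the numbers on the next run, with no manual recounting.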
 