Every game is different, though there are common things seen in basically every system. It is doubtful anyone has gone and documented this particular game, though I have been surprised by the visual novel translation scene in the past.
Text can be stored either as actual text (bytes that map to characters) or as images (think bitmap images, but in any one of many, many formats, including custom ones). In the visual novel world the image approach tends to be reserved for older games but still makes the odd appearance, and stylised text baked into the game's images will see you contemplating whether you want to edit those images.
Text then.
If you made a code as a kid along the lines of A=01, B=02, C=03 and so on then computers do the same, just using hex. The one you will most commonly see is probably ASCII ( http://www.asciitable.com/ ), though Unicode is supposed to take over from that ( https://www.joelonsoftware.com/2003...-about-unicode-and-character-sets-no-excuses/ ), and Japanese has a few other choices ( http://rikai.com/library/kanjitables/kanji_codes.sjis.shtml for Shift-JIS, and see the same site for EUC-JP). Games, though, can be completely custom; more modern stuff tends to use one of the encodings mentioned above, but again it can be anything.
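To make the "A=01, B=02" idea concrete, here is a rough Python sketch. The first half uses Python's built-in codecs for ASCII, Shift-JIS, EUC-JP and UTF-8; the second half decodes with a made-up custom table. The table contents and the decode_with_table helper are my own invention for illustration, not anything from a real game.

```python
# Standard encodings: the same bytes-to-characters idea, just with agreed tables.
raw_ascii = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])
print(raw_ascii.decode("ascii"))  # Hello

# The same Japanese text comes out as different bytes under each encoding.
text = "日本"
sjis = text.encode("shift_jis")
euc = text.encode("euc_jp")
utf8 = text.encode("utf-8")
for name, data in (("Shift-JIS", sjis), ("EUC-JP", euc), ("UTF-8", utf8)):
    print(name, data.hex(" "))

# A completely custom encoding (hypothetical table, as a game might use).
custom_table = {0x01: "A", 0x02: "B", 0x03: "C"}


def decode_with_table(data, table):
    """Map each byte through the table; show unknown bytes as <XX> hex tags."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)


print(decode_with_table(bytes([0x03, 0x01, 0x02]), custom_table))  # CAB
print(decode_with_table(bytes([0x03, 0xFF]), custom_table))        # C<FF>
```

Showing unknown bytes as hex tags rather than dropping them is handy when you are still figuring a table out, as the gaps tell you what is left to identify.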
I will also note that visual novels are not necessarily just a slideshow with a simplistic UI any more. This means that text can be found buried in a quasi scripting language, or even an outright programming language, which can either help things or make them harder.
Anyway, games don't usually (modern PC might be different) have time to waste parsing a document to know where things stop and start. This means that before the game is made the devs will have lists of where things are located, and possibly some extra info like any special formatting. The entries in this list that tell you where something is (think of the contents page of a book) are called pointers, because they point the way to things. Much like the contents page analogy, if you take chunks of pages out or put new ones in then someone following the page numbers in the contents is going to find themselves not where they want to be. So the would-be hacker gets to change pointers too. They come in a variety of formats (standard, offset, and relative, with maybe sector based stuff later down the line) but I will leave that for now.
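A small Python sketch of the contents page problem might help here. The file layout is entirely made up for the example (a table of 2-byte little-endian offsets followed by null-terminated ASCII strings), and build_script and read_string are hypothetical helper names; real games will differ.

```python
import struct


def build_script(strings):
    """Pack strings as [one 2-byte little-endian offset each][null-terminated text]."""
    header_size = 2 * len(strings)
    body = b""
    pointers = []
    for s in strings:
        # Each pointer is the offset of its string from the start of the file.
        pointers.append(header_size + len(body))
        body += s.encode("ascii") + b"\x00"
    return struct.pack(f"<{len(strings)}H", *pointers) + body


def read_string(blob, index):
    """Follow pointer number `index` and read up to the null terminator."""
    (offset,) = struct.unpack_from("<H", blob, 2 * index)
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("ascii")


original = build_script(["Hello", "World"])
print(read_string(original, 1))  # World

# Naive edit: swap in a longer first string but leave the pointers alone.
naive = original[:4] + b"Greetings\x00" + original[10:]
print(read_string(naive, 1))     # ngs (pointer now lands mid-string)

# Proper edit: recalculate the pointers for the new text lengths.
fixed = build_script(["Greetings", "World"])
print(read_string(fixed, 1))     # World
```

The naive version is what happens when pages get added without updating the contents page: the second pointer still says offset 10, which is now in the middle of "Greetings".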
Fonts. Not all games support all characters. Modern stuff tends to be better, but not always, and you might still need to do something about the font, especially if you are going to want the likes of ñ ch ll ¿ and ¡.
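Before starting a translation it can be worth checking which characters your text needs that the game cannot show. A minimal Python sketch, assuming you have already worked out the set of characters the font supports (the supported set and the missing_characters helper here are invented for the example):

```python
import string

# Hypothetical character set for a game with an English-only font.
supported = string.ascii_letters + string.digits + " .,!?'"


def missing_characters(text, supported_chars):
    """Return the characters in text that the font has no glyph for, sorted."""
    return sorted(set(text) - set(supported_chars))


print(missing_characters("¿Dónde está el niño?", supported))
# ['¿', 'á', 'ñ', 'ó']
```

Anything in that list means either editing the font to add the glyphs or rewording the translation to avoid them.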
Compression and encryption are also worth a note. Compression is used to reduce the size of files that are transmitted, stored on disk or similar, at the cost of having to wait a bit for decompression when they are used (though on more recent stuff that is pretty much an eye blink extra at worst). Encryption is something you mostly encounter on PC, where devs will encrypt things to attempt to stop people ripping content or altering their games.
The would-be hacker then gets to contend with all of this, though not all things will come up in all games.
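As a rough illustration of both ideas in Python: zlib from the standard library stands in for whatever compression a given game actually uses (games often have their own variants), and the single repeating-key XOR below is a toy obfuscation I am assuming for the example, not a claim about any real game's encryption.

```python
import zlib

# Repetitive data, like a lot of game text, compresses well.
script = b"Hello, world! " * 50
packed = zlib.compress(script)
print(len(script), "bytes raw,", len(packed), "bytes compressed")

# Decompressing gives back exactly what went in.
unpacked = zlib.decompress(packed)


def xor_bytes(data, key):
    """Toy obfuscation: XOR every byte with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# XOR with the same key twice returns the original data.
scrambled = xor_bytes(script, b"key")
restored = xor_bytes(scrambled, b"key")
```

Either way the text will not be readable in a hex editor until the data has been unpacked, and it has to be packed again the same way after editing.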
https://gbatemp.net/threads/gbatemp-rom-hacking-documentation-project-new-2016-edition-out.73394/ covers some more. It is more for the GBA and DS but the general principles apply from the first home computers you loaded code onto to modern PC stuff.