[REVERSE ENGINEERING] Finding the correct encoding for hiragana text

Discussion in 'PSP - Hacking & Homebrew' started by Serberker, Jan 15, 2019.

  1. Serberker
    OP

    Serberker Advanced Member

    Newcomer
    5
    Jan 13, 2019
    Hungary
    Hi all!

    I'm kinda stuck with the encoding of a game.
    The in-game text seems to be hiragana, but the hex values don't match up.
    Check the image.
    The text is supposed to be: "Hello, and good day to you. Wait, you are not from this village are you?"

    What is the typical encoding for Japanese PSP games?
    The characters seem to be stored in 2 bytes each, but neither UTF-16LE nor Shift-JIS returns the correct characters.
    For example, the first character's Shift-JIS code is 82 9F, but in the hex editor it's 86 01 (0x0186 when read as little-endian).
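
    For anyone who wants to reproduce the mismatch, here is a rough Python check of the byte values quoted above. It is only a sketch: `try_decode` is a helper name made up for this example, and the only inputs are the two byte pairs from the post.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


expected_sjis = bytes([0x82, 0x9F])  # Shift-JIS code quoted in the post
game_bytes = bytes([0x86, 0x01])     # bytes seen in the hex editor

# The character the Shift-JIS code stands for (small hiragana "a").
sjis_char = expected_sjis.decode("shift_jis")

# The game's bytes read as one little-endian 16-bit value.
game_value = int.from_bytes(game_bytes, "little")

# What the game's bytes turn into under the two standard guesses.
as_utf16 = try_decode(game_bytes, "utf-16-le")
as_sjis = try_decode(game_bytes, "shift_jis")

print("Shift-JIS 82 9F decodes to:", sjis_char)
print("game value:", hex(game_value))
print("game bytes as UTF-16LE:", repr(as_utf16))
print("game bytes as Shift-JIS:", repr(as_sjis))
```

    If neither standard decoding gives hiragana, a custom table (or an index into one) becomes the more likely explanation.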

    I doubt that they have put an offset in the encoding. Also I think it's unlikely that they used custom encoding.

    Any help / hint / idea is greatly appreciated.
     

    Attached Files:

    • x.
      x.png
      File size:
      416.5 KB
      Views:
      0
  2. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23
    Nov 21, 2005
    United Kingdom
    Why? The vast majority of console games in history have used custom encodings. The PSP came from an era when common encodings started to be used by default more often, but custom encodings were still very far from unheard of.
     
  3. Serberker
    OP

    Serberker Advanced Member

    Newcomer
    5
    Jan 13, 2019
    Hungary
    You won :D
    Found the custom encoding table, and it works flawlessly.
    Feeling confident, as it has Latin characters too.
     

    Attached Files:

    • x3.
      x3.png
      File size:
      111.2 KB
      Views:
      0
    Last edited by Serberker, Jan 15, 2019
    crimsonwolf8439 likes this.
  4. crimsonwolf8439

    crimsonwolf8439 Member

    Newcomer
    4
    May 22, 2016
    How large was the table?
     
  5. Serberker
    OP

    Serberker Advanced Member

    Newcomer
    5
    Jan 13, 2019
    Hungary
    2171 characters. (The file itself is 4342 bytes plus a 12-byte header.)
    Why do you ask?
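
    The sizes are consistent with 2 bytes per entry (2171 × 2 = 4342). A minimal sketch of reading such a table in Python follows. Everything beyond the 12-byte header and the 2-byte entry size is an assumption: the entry encoding is passed in as a parameter because the thread does not say what it is, and treating each script character as a little-endian 16-bit index into the table is a guess. `load_table` and `decode_text` are hypothetical names.

```python
import struct
from typing import List

HEADER_SIZE = 12  # from the post: 12-byte header
ENTRY_SIZE = 2    # from the post: 4342 bytes / 2171 characters


def load_table(data: bytes, entry_encoding: str) -> List[str]:
    """Skip the header and decode each fixed-size entry.

    entry_encoding is an assumption supplied by the caller; the thread
    does not state how the table entries themselves are encoded.
    """
    body = data[HEADER_SIZE:]
    if len(body) % ENTRY_SIZE != 0:
        raise ValueError("table body is not a whole number of entries")
    return [
        body[i:i + ENTRY_SIZE].decode(entry_encoding, errors="replace")
        for i in range(0, len(body), ENTRY_SIZE)
    ]


def decode_text(raw: bytes, table: List[str]) -> str:
    """Assumed scheme: each 16-bit little-endian value indexes the table.

    raw must have an even length, otherwise struct raises an error.
    Indexes outside the table are shown as "?".
    """
    chars = []
    for (index,) in struct.iter_unpack("<H", raw):
        chars.append(table[index] if index < len(table) else "?")
    return "".join(chars)


# Size check using the numbers from the post.
assert 2171 * ENTRY_SIZE == 4342

# Tiny synthetic table (not real game data): a zeroed header plus "abc".
demo_data = bytes(HEADER_SIZE) + "abc".encode("utf-16-le")
demo_table = load_table(demo_data, "utf-16-le")
print(decode_text(bytes([2, 0, 0, 0, 9, 0]), demo_table))
```

    With a real table, the value 0x0186 from the first post would simply be entry number 390, if the index assumption holds.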
     
    Last edited by Serberker, Jan 16, 2019
  6. crimsonwolf8439

    crimsonwolf8439 Member

    Newcomer
    4
    May 22, 2016
    Looking for a reference in my game :P
     
  7. Killrain

    Killrain Newbie

    Newcomer
    1
    Jan 26, 2019
    Poland
    Serberker, cool.
    How did you find that custom encoding table?
     
  8. Serberker
    OP

    Serberker Advanced Member

    Newcomer
    5
    Jan 13, 2019
    Hungary
    Statistics :D

    The game itself has about 22k files. I modified my extraction script to group the files by extension and count them.
    The result was ~16 different extensions (or 18, I can't recall at the moment).
    I checked the files with the lowest counts, and the encoding table was there, with the extension .TTL.

    In my case the files with the highest count were texture and model files.
    That was followed by the animation files. This count was lower because static models (like trees, houses) don't have an animation.
    And then various script and other files.
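
    The counting step described above can be sketched in Python roughly like this. It is not the actual extraction script: `count_extensions` and `rarest_first` are made-up names, and the file names in the demo are hypothetical apart from the .TTL extension mentioned in the post.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def count_extensions(names: Iterable[str]) -> Counter:
    """Count file names per extension, ignoring upper/lower case."""
    counts = Counter()
    for name in names:
        ext = PurePosixPath(name).suffix.upper() or "(none)"
        counts[ext] += 1
    return counts


def rarest_first(counts: Counter) -> List[Tuple[str, int]]:
    """Sort by count ascending, so one-off files come first."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical file list, standing in for the names an extraction script sees.
demo_names = [
    "chr/hero.mdl", "chr/hero.tex", "chr/hero.anm",
    "map/tree.mdl", "map/tree.tex",
    "sys/font.TTL",
]

for ext, count in rarest_first(count_extensions(demo_names)):
    print(f"{count:6d}  {ext}")
```

    The rarest extensions are listed first, since one-off files such as fonts and encoding tables tend to end up there.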
     
    Last edited by Serberker, Feb 10, 2019
    crimsonwolf8439 and Killrain like this.
  9. Killrain

    Killrain Newbie

    Newcomer
    1
    Jan 26, 2019
    Poland
    Hmm, interesting.
    Thanks!
     