Reverse Engineering File Formats

Discussion in 'Computer Programming, Emulation, and Game Modding' started by kprovost7314, Jun 11, 2016.

  1. kprovost7314
    Other than learning C/C++ (which I'm doing), how would one go about reverse engineering file formats?
     
  2. FAST6191

    Making a file format from scratch is a pain in the arse, so most programmers do not. When they do, the result is likely very basic and thus very limited (it is not impossible for a project to devolve into pointless complexity, but often the complexity is there because it is necessary).
    C/C++ is good because many formats are made by C/C++ programmers, so being able to read something as a series of shorts and longs or similar helps. It is not the be-all and end-all though -- I don't doubt some around here could plough through a ROM hacking session and figure stuff out despite being barely able to make things compile from the command line.
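
    To make the "series of shorts and longs" idea concrete, here is a minimal sketch using Python's struct module. The header layout (a 4-byte magic, two 16-bit shorts and one 32-bit long, little-endian) and the field names are invented for illustration and do not describe any real format.

```python
import struct

# Hypothetical header layout, invented for illustration only:
#   4-byte magic, unsigned short version, unsigned short entry count,
#   unsigned 32-bit "long" offset to the data. "<" means little-endian
#   with no padding between fields.
HEADER = struct.Struct("<4sHHI")


def parse_header(blob: bytes) -> dict:
    """Read the fixed-size header from the start of a byte string."""
    magic, version, count, data_offset = HEADER.unpack_from(blob, 0)
    return {
        "magic": magic,
        "version": version,
        "count": count,
        "data_offset": data_offset,
    }


# Build a fake file in memory so the example runs without any real data.
sample = b"DEMO" + struct.pack("<HHI", 1, 3, 12) + b"payload"
header = parse_header(sample)
print(header)
print(sample[header["data_offset"]:])  # the bytes the offset points at
```

    In practice you guess a layout like this from a hex editor, unpack it, and see whether the numbers that come out look sensible (counts that match, offsets that land inside the file).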
    You can try hackmes, though most of those cover a broad spectrum of hacking rather than file formats specifically.
    You can try finding out something about a format -- you need not learn everything. Consider, say, the old binary Microsoft Word documents: the whole thing, with tables and forms and macros and charts and embeddable objects and all that jazz, was a long and hard project to reverse engineer. However, you could probably figure out how text is made bold or italic in fairly short order (it is the classic cheat-making approach where you take a file, make a small change, and then compare the two to see what changed).
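
    The change-and-compare approach can be sketched in a few lines of Python. The two byte strings below are made up to stand in for a "before" and "after" save of the same file; diff_bytes is a hypothetical helper, not part of any tool.

```python
def diff_bytes(before: bytes, after: bytes) -> list:
    """Return (offset, old_byte, new_byte) for each position that differs.

    Naive on purpose: it compares position by position, so it is only
    useful when the edit changed bytes in place. If the change inserted or
    removed bytes, everything after that point shifts and shows as different.
    """
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    ]


# Made-up stand-ins for two saves of one document, differing by one toggle.
plain = bytes([0x10, 0x00, 0x41, 0x42, 0x00, 0xFF])
bold = bytes([0x10, 0x00, 0x41, 0x42, 0x01, 0xFF])

for offset, old, new in diff_bytes(plain, bold):
    print(f"0x{offset:04X}: {old:02X} -> {new:02X}")
```

    With real files you would read both with open(path, "rb").read() and repeat the experiment a few times, since a single differing byte might be the flag you toggled or might be a checksum or timestamp that changes on every save.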
    Find a conversion tool and rewrite it. Python has been called the glue that sticks modern computing together, and the way it does that is by providing interface programs for a lot of things so they can speak to each other.

    Alternatively, data will often be stored very close to the way the hardware expects it, because why not. Learn the hardware the data ends up on. By a similar token, APIs and database lookups will probably resemble the final form in many cases.
     
  3. Silverthorn

    Last edited by Silverthorn, Jun 13, 2016