How did you learn hex coding/hex editing?

TheN00b21

Well-Known Member
OP
Newcomer
Joined
Dec 6, 2020
Messages
52
Trophies
0
Age
23
Location
Somewhere
XP
292
Country
United States
So if you don't know, I'm pretty stupid in some areas, and hex viewing is definitely one of those areas. When I have a question about a complicated piece of software, people always just say "JuSt FiGuRe It OuT iN hEx". It's been bugging me for a year now, and I just wanted to ask the people who are geniuses in this area how they learned it and what prior knowledge they had. If you could give a link to where you learned it or something else, that would be great. If you don't know hex then: 1. Welcome to the club 2. Feel free to talk about how you learned a programming language like C or something else 3. If you don't have any other knowledge about programming or hex, just talk about something cool in technology you did this year (Homebrew, Modding, Game Creation, etc.)
 

FAST6191

Techromancer
Editorial Team
Joined
Nov 21, 2005
Messages
36,798
Trophies
3
XP
28,321
Country
United Kingdom
Hex editing is not a thing, a skill, or an ability beyond the obvious "hex is 0 through F, type it like you would in a word processor but using that instead" and the less obvious "hex editors tend to be split into three sections -- left is typically the location in the file/memory/hard drive, middle is the data represented as hex, and right is typically some kind of text decode that can be customised to a different encoding, or in some cases a custom one, as you desire". The word processor analogy also often holds quite well -- hex editors tend to feature niceties like overwrite/insert, go to location, and simple common operations (I would say most boolean and bitwise operations are much like "make bold" or "change case" in a word processor).
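To make the three-section layout concrete, here is a minimal Python sketch of what a hex editor pane shows. The function name `hexdump` and the sample bytes are made up for illustration, and a real editor would let you swap the right-hand column to another encoding rather than plain ASCII.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as three columns: offset, hex values, text decode."""
    pad = width * 3 - 1  # room for "XX " per byte, minus the trailing space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"Hello, hex editing!\x00\x01\x02"))
```

Changing `width` is the programmatic version of resizing the editor window, which matters once you start hunting for patterns that line up in columns.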
Skilled ROM hackers might use a hex editor as a least worst option to make a small change, or to initially analyse a file to see if it looks similar to things they have seen before*, has certain tells**, or has patterns worth exploring. They might also realise a change could be in any number of places but sits at a fixed location relative to something easy to find. Or, if there is a wide selection of options, they might direct a user to make the change with a hex editor rather than making however many thousands of patches the combinations of such changes would need (think "put this number in here to make the pokemon of your choice appear, and put this other number there to put this item in your inventory" -- if there are 500 pokemon and 500 items that is 250000 total combinations before you even consider someone might only want 1 potion rather than a full stack, so it is far easier to say "this location takes a number from this list, and this other location takes a number from that list"). As the hex quite literally represents the contents of the file, hackers might also share dumps and screenshots for other hackers to contemplate, knowing the other hacker will be able to filter out the useless information.
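The "fixed location relative to something easy to find" idea can be sketched in a few lines of Python. Everything here is hypothetical: the marker `SAVE`, the offset and the value are invented for the example, not taken from any real game.

```python
def patch_relative(data: bytes, marker: bytes, rel_offset: int, new_value: int) -> bytes:
    """Find `marker`, then overwrite the single byte `rel_offset` bytes after its start."""
    pos = data.find(marker)
    if pos < 0:
        raise ValueError("marker not found")
    target = pos + rel_offset
    if not 0 <= target < len(data):
        raise ValueError("patch location falls outside the data")
    patched = bytearray(data)   # work on a copy, the original stays untouched
    patched[target] = new_value
    return bytes(patched)


if __name__ == "__main__":
    # Hypothetical save data: an easy-to-find tag followed by a few values.
    save = b"\x00\x00SAVE\x01\x02\x03"
    print(patch_relative(save, b"SAVE", 5, 0x7F).hex(" "))
```

This is exactly what a person does by hand in a hex editor: search for the recognisable bit, count along, type the new number.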

*This does include visually. I could check whether what I think is a section of pointers makes sense given the rest of the file (assume pointers for text are at the end of lines/sections, and compare the section count with the possible pointer width), but if I change the window width/section width and suddenly end up with a line of 0000s going down the screen, or a diagonal line/repeating pattern on diagonals, then I am probably onto something.
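The column of 0000s appears because small numbers stored in wide fields share zero high bytes, so they line up once the window width matches the field width. A rough programmatic version of the same sanity check, assuming little-endian values and an ascending table (neither is guaranteed for a given format, and the helper names are invented here):

```python
def read_table(data: bytes, start: int, count: int, width: int = 4) -> list:
    """Read `count` little-endian unsigned integers of `width` bytes each."""
    return [
        int.from_bytes(data[start + i * width:start + (i + 1) * width], "little")
        for i in range(count)
    ]


def plausible_pointers(values: list, file_size: int) -> bool:
    """Heuristic only: every value lands inside the file and the list never decreases."""
    in_range = all(v < file_size for v in values)
    ascending = all(a <= b for a, b in zip(values, values[1:]))
    return in_range and ascending


if __name__ == "__main__":
    # Hypothetical file: three 32-bit pointers followed by 0x40 bytes of payload.
    table = b"".join(v.to_bytes(4, "little") for v in (0x10, 0x24, 0x3C))
    blob = table + bytes(0x40)
    values = read_table(blob, 0, 3)
    print([hex(v) for v in values], plausible_pointers(values, len(blob)))
```

If the guess at the width is wrong the values usually come out enormous or jump around, which is the numeric equivalent of the pattern not lining up on screen.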

**I know that if a DS file starts with 10, 11 or 40 then I should look into whether it is compressed. I can also try compressing the file myself (compressed files don't compress much, uncompressed files might well do). I also know many DS files start with "magic stamps", as well as the extensions the files will tend to have. I know GBA pointers tend to have 08 as the first number, as that is where most GBA games see the ROM in memory, but also know where that fails. I know the DS is fairly weak as these things go so tends to use certain formats, know what the Shift JIS encoding ( http://rikai.com/library/kanjitables/kanji_codes.sjis.shtml ) range is, know what text generally looks like (how many sentences have words with more than about 8 characters between spaces? Most languages don't routinely use thousands of radically different characters), and know what compression looks like (in text it typically makes things start normally but then go to gibberish if it is a sliding window compression https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf , or randomly miss bits if it is Huffman or some substitution/lookup approach). This is on top of things I might have done elsewhere to give me an idea of what I might be looking at (if I have already found at least what I presume is music, levels, graphics, code and video then I know to adjust my sights to find text type patterns).
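Two of those tells are easy to sketch in Python: the first-byte check, and the "try compressing it yourself" check. Note that `zlib` is only standing in for "some general-purpose compressor" (it is not one of the DS compression formats), and the function names are invented for the example.

```python
import zlib

# First bytes mentioned above as hints that a DS file may be compressed.
DS_COMPRESSION_TELLS = {0x10, 0x11, 0x40}


def first_byte_tell(data: bytes) -> bool:
    """True if the file starts with one of the bytes worth a closer look."""
    return len(data) > 0 and data[0] in DS_COMPRESSION_TELLS


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; close to 1.0 means little was gained."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    plain = b"This is some ordinary repeated text. " * 100
    print(first_byte_tell(plain), round(compression_ratio(plain), 3))
```

A file that shrinks a lot was probably not compressed to begin with; one that barely shrinks is either already compressed, encrypted, or something dense like audio samples, so it is a hint rather than proof.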

Perhaps another analogy. I have a nice collection of hammers. Give me a bit of sheet metal, something to bash it against and a hammer and I can make whatever you like. Someone that has a sheet metal bender will likely make a far nicer finished bend in a fraction of the time though. There are dozens of other tools to manipulate sheet metal as well and they all do far more, far more precisely and far more quickly, but they are rarely as versatile as a hammer and infinite time. The hex editor is the hammer in this scenario -- very crude but in some ways less limiting than tools designed for specific common scenarios.

What you then want to learn is ROM hacking: how files tend to get made by programmers in general and for the system you are looking at (you are probably not going to find yourself editing some kind of standalone NoSQL database for an NES game, though you might for a modern MMORPG).
We have guides to ROM hacking
https://gbatemp.net/threads/gbatemp-rom-hacking-documentation-project-new-2016-edition-out.73394/
http://www.romhacking.net/start/
http://wiki.xentax.com/index.php/Game_File_Format_Central is not really ROM hacking but one of many examples of a library of game formats you can get some idea of things previously seen for.

Short version. There is no hacking skill or programming skill that hackers would call hex editing. Being able to make a word processor absolutely sing does not mean you can inherently write a novel with nice characters, plot and pacing, and a hex editor is to hacking what the word processor is to novel writing. In this case it sounds like you want to learn ROM hacking.
 

Deleted User

Guest
I have only passing knowledge of this, since nobody has enough years in their life to learn everything and a jack of all trades is a master of none.
So I've chosen the fields of study that I am a master of and delegated the remainder to able-bodied individuals who are masters of theirs.

That said, the easiest way to learn programming would be through a visual language, and I do understand and use one in my AEC design work, which is Grasshopper.
The thing to note with all programming languages is that they are consumer products, created to make a profit; as such, it is in their makers' financial interest to hook you with an easy-to-learn UI and workflow.

That is why programs have become more user-friendly with every new generation.

Back to your question: there is plenty of free online learning that one can try, but sometimes it's best to find a provider with a reputation and the ability to grant certification.
Harvard University provides both options, including in your field of interest, through their HarvardX portal.

Most courses are free to learn and, if you need certification to land that job and have extra cash to spare, you can pay to take their test and receive it.
It is one of the most sensible ways to attain professional certification I've seen online, especially during this pandemic.
 

TheN00b21

You are a literal king for this explanation; you summed up everything I needed to know. This would explain all of my problems. Before, I was a bit confused about how people actually "read or decipher" hex. But if I'm understanding correctly, when most people look at a compressed file they are just trying to find similarities to other types of compression and file types. Thanks so much for the big explanation; I'll follow and learn more from your guide.
 

JaapDaniels

Well-Known Member
Member
Joined
Apr 22, 2012
Messages
1,191
Trophies
1
Age
40
Website
github.com
XP
2,426
Country
Netherlands
For as much as I have used it: what the hex code means depends on the CPU used. If you know that part, you can reverse the code back into assembly code.
Reversing that code is not always straightforward; as explained, it could be compressed or encrypted.
I never touched a hex editor for anything that complex. I used it in the early period of Windows programs, when security wasn't much of a thing yet,
and I used it in the SNES era, which mostly wasn't secure either; it just needed an interpretation of the 65816 CPU.
 

FAST6191

for as much as i used it, it depends on the CPU used what hex code means
It is not even that.

It is context dependent upon the program that is running and reading it.
Certainly if you are playing with a section of code intended for the CPU to interpret as instructions then it is very much dependent upon not only the CPU but how the original developers set up the system it is running on. However, not all files we look at as ROM hackers are aimed at the CPU. A 00 somewhere in the code could mean any number of things, including nothing at all, as it might just have been put there as padding so the next section of code starts at a place more convenient for the CPU.
It is rarely done, but you can also have one piece of code mean many things.

I don't think the OP needs to get into hardcore areas of data representation and information theory, but there are oddities to be aware of. I would say boolean logic and logical operations (shift, rotate, ceiling, floor, abs and other things most people usually meet first in a spreadsheet) are a must, as is awareness that sometimes devs will pack data into unused portions of numbers. Loops and the idea of data types (signed, unsigned, text, floating point, fixed point...) are also worth knowing, at least as far as the basic introduction to programming for whatever language you are looking at cares to give.
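To show why data types matter, here is a small Python sketch that reads the same four bytes several ways using the standard `struct` module. The byte values are picked purely for illustration, little-endian order is assumed, and the 16.16 fixed-point layout is just one common convention, not something a particular game is known to use.

```python
import struct


def interpretations(raw4: bytes) -> dict:
    """Read the same 4 little-endian bytes under several different data types."""
    unsigned = struct.unpack("<I", raw4)[0]
    signed = struct.unpack("<i", raw4)[0]
    as_float = struct.unpack("<f", raw4)[0]
    return {
        "unsigned": unsigned,
        "signed": signed,
        "float": as_float,
        "fixed_16_16": signed / 65536,          # assumed 16.16 fixed point
        "low_byte": unsigned & 0xFF,            # mask off the bottom 8 bits
        "high_nibble": (unsigned >> 28) & 0xF,  # shift down, keep the top 4 bits
    }


if __name__ == "__main__":
    for name, value in interpretations(bytes([0x00, 0x00, 0x80, 0xBF])).items():
        print(f"{name:>12}: {value}")
```

Nothing in the bytes themselves says which reading is right; that comes from knowing what the program reading them expects.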
For example, the NARC archive format Nintendo provides for use on the DS (and cousins in various other Nintendo consoles) has numbers that note where things are found within the file (pointers being the more accurate term, and a thing you will meet very soon into pulling apart files or aspects of low level programming). However, they are big numbers (the DS does not have time to be fiddling with 2 bits at a time when storage is so cheap, so minimums are more like 8 bits or 16 or even 32 at times). As nothing on the DS is likely to need a file 2^32 bytes in size (4 gigabytes), the devs of the format opted to have the first bit be flipped high if the file was actually in a subdirectory. If you read the number normally then it will appear massive; get rid of the high bit at the start and all of a sudden it makes sense. I have also encountered files where the pointers were shifted -- the NSBMD format (3d models on the DS) has an obvious common case of this, but Touch Detective is probably the better example in an archive format.
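A minimal sketch of both tricks in Python. The flag position (top bit of a 32-bit value) and the shift amount are illustrative assumptions only; they are not lifted from the NARC or NSBMD specifications, and the function names are made up.

```python
FLAG_BIT = 0x80000000  # assumed: the flag lives in the top bit of a 32-bit value


def split_flagged(raw: int) -> tuple:
    """Separate an assumed high-bit flag from the number stored underneath it."""
    flag = bool(raw & FLAG_BIT)
    value = raw & (FLAG_BIT - 1)  # keep the low 31 bits
    return flag, value


def unshift_pointer(raw: int, shift: int) -> int:
    """Undo a pointer that was stored shifted right by `shift` bits."""
    return raw << shift


if __name__ == "__main__":
    raw = 0x80001234
    print(raw)                      # read naively it looks enormous
    print(split_flagged(raw))       # flag set, sensible value underneath
    print(hex(unshift_pointer(0x40, 2)))
```

The giveaway in practice is the same as described above: a table of mostly sensible numbers with a few absurdly large ones usually means a flag bit, and numbers that are all a bit too small by a constant factor usually mean a shift.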

Before, I was a bit confused about how people actually "read or decipher" hex. But if I'm understanding correctly, when most people look at a compressed file they are just trying to find similarities to other types of compression and file types.
To an extent. They will also be looking for what they expect to see in a given file type.
For instance an archive format is not much good if you don't have a way to locate the files within it, and most archive formats will have a name for the files they house (though as it is not necessary for the console then not all will).
The average game console does not operate in the petabytes region, indeed far older consoles might not even get into megabytes of space. To that end you are not expecting file formats to be built to handle that kind of large size and thus you can constrain yourself a bit. Likewise if you know the hardware -- when they say a game console is 16 bit it means it prefers to do maths on numbers 16 bit in size or less. While it can do more that takes far more effort so it will only be done when absolutely necessary.
Likewise, most music files on something like the DS will need a bunch of data somewhere, usually somewhere easy to find, to note what volume to play things at and what the data actually is (sample rate, bit depth, data type...), so I would expect to find something handling that if I am looking at an unknown music format ( https://gbatemp.net/threads/the-various-audio-formats-of-the-ds.305167/ ). As for graphics, the hardware itself on the GBA/DS... and actually most consoles, has formats it expects to see, so most devs don't make the game do extra work at runtime and will instead leave data in formats that work well with the hardware it is going into.

So yeah, I usually look at things asking "if I were a developer doing this then what would I need and what would I do?", and "what are the limitations of the system, and what formats does it expect or information does it need?". If you couple that with knowing the hardware, knowing at least the basics of how whatever sort of data you think you are looking at (graphics, audio (which can be wave type or more midi/tracker style, each with its own requirements), video, levels, text...) is generally stored and handled on computers (including compression https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf ), and maybe how games themselves work ( https://docs.google.com/document/d/1iNSQIyNpVGHeak6isbP6AHdHD50gs8MNXF1GCf08efg/pub?embedded=true ), then you tend to start getting places reasonably quickly. Don't overlook file names, file sizes (a 200 byte file is not going to be 50 minutes of full orchestral recording for instance), directory names, file extensions and more as well (knowing what something is not can be quite valuable even if you don't know what it is). Similarly, if the game has credits to RAD/Bink, CRI Middleware, Unreal Engine and other such things in the title then do pay attention to that as well. Developers are human and also lazy by design (doing work is hard; if someone else has already done it before you then you can buy that in and get on with other things).
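The file size sanity check is just arithmetic, sketched here in Python. The CD-style figures (44100 Hz, 16 bit, stereo) are assumptions for the example; compressed or sequenced music is far smaller than raw samples, though still nowhere near 200 bytes for a long recording.

```python
def pcm_size_bytes(seconds: float, sample_rate: int, bit_depth: int, channels: int) -> int:
    """Size of raw uncompressed audio samples for the given duration."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def seconds_that_fit(file_size: int, sample_rate: int, bit_depth: int, channels: int) -> float:
    """How much raw audio could fit in a file of `file_size` bytes."""
    return file_size / (sample_rate * (bit_depth // 8) * channels)


if __name__ == "__main__":
    print(pcm_size_bytes(50 * 60, 44100, 16, 2))   # 50 minutes of raw stereo audio
    print(seconds_that_fit(200, 44100, 16, 2))     # what a 200 byte file could hold
```

Run the other way round, the same sum tells you what a file cannot be, which narrows the search before you have opened it in anything.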
 

Deleted member 545975

Well-Known Member
Member
Joined
Dec 16, 2020
Messages
102
Trophies
0
Location
x
XP
892
Country
United States
Hello, and good day, TheN00b21! In general, I look for three things to work from:
File format specifications, if any, and, where the format has one, the header data, to learn how the data may be structured. For example, NES ROMs may use the iNES file format specification, as described on the following page:
https://wiki.nesdev.org/w/index.php/INES
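As a small illustration of reading a header against a specification, here is a Python sketch that decodes the basic fields of a 16-byte iNES header. It only covers the original iNES fields (not the NES 2.0 extensions), and the function name and sample header are made up for the example.

```python
def parse_ines_header(header: bytes) -> dict:
    """Decode the basic fields of a 16-byte iNES header."""
    if len(header) < 16 or header[:4] != b"NES\x1a":
        raise ValueError("not an iNES header")
    flags6, flags7 = header[6], header[7]
    return {
        "prg_rom_bytes": header[4] * 16 * 1024,      # byte 4 counts 16 KB units
        "chr_rom_bytes": header[5] * 8 * 1024,       # byte 5 counts 8 KB units
        "mapper": (flags7 & 0xF0) | (flags6 >> 4),   # high nibble from byte 7, low from byte 6
        "vertical_mirroring": bool(flags6 & 0x01),
        "battery": bool(flags6 & 0x02),
        "trainer": bool(flags6 & 0x04),
    }


if __name__ == "__main__":
    # Hypothetical header: 2 x 16 KB PRG, 1 x 8 KB CHR, mapper 1, vertical mirroring.
    sample = b"NES\x1a" + bytes([2, 1, 0x11, 0x00]) + bytes(8)
    print(parse_ines_header(sample))
```

The mapper number is a nice example of data being packed into spare bits: it is split across the upper halves of two different flag bytes.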

System specifications, if it is software or other data targeting a system, to learn how the system parses it: how and where it hands program flow to loaded content, its input/output ports, if any, its memory layout, and so on. For the NES, I use the system specifications on the following page:
http://problemkaputt.de/everynes.htm

Processor specifications: I read about the registers it has, its instruction set, and the formats of the data, opcodes, and operands it supports. For the Nintendo Entertainment System, the main microprocessor is the RP2A03 for NTSC systems and the RP2A07 for PAL systems. Both use a modified 6502 processor core without its binary-coded decimal feature; otherwise it is similar to a standard 6502. The following links lead to documents covering its instruction set, the hexadecimal byte associated with each instruction, and the instruction formats:
An instruction set reference:
https://web.archive.org/web/20210803072316/http://www.obelisk.me.uk/6502/reference.html
Its addressing modes:
https://web.archive.org/web/20180418015345/http://www.emulator101.com/6502-addressing-modes.html
A web page offering a datasheet and other documents for the 6502:
https://web.archive.org/web/20090113222331/http://mdfs.net/Docs/Comp/6502/
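To show how an instruction set reference turns hex bytes back into assembly, here is a deliberately tiny Python sketch covering five 6502 opcodes. A real disassembler needs the full opcode table and has to tell code from data; the function name and sample bytes are invented for the example.

```python
# opcode -> (text template, number of operand bytes); a tiny subset of the 6502 set
OPCODES = {
    0xA9: ("LDA #${:02X}", 1),  # load accumulator, immediate
    0x8D: ("STA ${:04X}", 2),   # store accumulator, absolute
    0x4C: ("JMP ${:04X}", 2),   # jump, absolute
    0xEA: ("NOP", 0),
    0x60: ("RTS", 0),
}


def disassemble(code: bytes) -> list:
    """Turn bytes into mnemonics; anything unknown or cut short becomes a .byte line."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        template, nbytes = OPCODES.get(op, (None, 0))
        operand_bytes = code[i + 1:i + 1 + nbytes]
        if template is None or len(operand_bytes) < nbytes:
            out.append(f".byte ${op:02X}")
            i += 1
            continue
        # 6502 operands are stored low byte first (little-endian).
        out.append(template.format(int.from_bytes(operand_bytes, "little")))
        i += 1 + nbytes
    return out


if __name__ == "__main__":
    for line in disassemble(bytes([0xA9, 0x01, 0x8D, 0x00, 0x20, 0x60])):
        print(line)
```

The same bytes fed to a table for a different CPU would come out as completely different instructions, which is why the processor specification comes before any attempt to read code in a hex editor.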
 
