GameCube Game Patching - Mega Man Anniversary Collection
Jun 30, 2019
Hey there. I know, I haven't made a new post in quite a while. It's not like I haven't done anything the past few months, I just couldn't really find the will to write anything down, but honestly this latest project of mine deserves a post for sure.
In this post I will detail my process on how my latest patch for the gamecube version of megaman anniversary collection came to be and a bit about how it works, which you can find here:
Yes, it is indeed another patch for megaman stuff! I don't know why I am doing so many of these lately, first all my PAL audio patches and now I had to move on to gamecube. I guess it just comes with me working so much on nintendont that I also wanted to fix up these issues.
First off, we have to look at nintendont way back at its early stages in 2014, when I noticed the music in it played very broken, which I fixed in this commit:
Fun fact about the function I fixed: the very same mistake made the first version of my newly developed patch sound broken on a real gamecube while working fine in dolphin and in nintendont, I'll get back to that later.
At this time after fixing the audio, I also realized just how terrible its quality was in the NES ports of megaman 1-6, but I did not do anything about it at the time, still it for sure got into my head. Oh and of course, I also noticed that the controls were backwards and I did release some very hacky patch back then that just globally reversed the A and B buttons for everything, sure that meant the main menu was messed up but at least ingame it made things far more playable:
Now fast forward all the way to 2016, when I first looked at some things regarding the audio. I just tried to re-encode the music files that are streamed from disc and tested that with megaman 2, because I wondered if the audio was so bad because of the codec selected or if it was just their recording, and well, I clearly proved it was just their recording, see this video I released at the time:
This project did get abandoned by me shortly after this demo video because I realized just how broken the whole concept of playing streamed audio was. Looping was pretty much impossible, and sound effects were actually recorded at even LOWER quality, only 18kHz vs the 32kHz the music has, so even a re-encode sadly did not help a whole lot here, and audio balancing became a problem too. But during all that I DID notice something super interesting, thanks to another project that I to this day haven't released and that I may get back to at some point because I think it's really cool too...
Anyways, what I noticed was that there were parts of the original ROM image of megaman 2 in the game files even though it's not emulated. That's as much notice as I took of it, because back in 2016 I did not know a whole lot about NES ROMs and emulation, but that changed a whole lot.
So then in 2017 I started writing my own NES emulator and I still have been working on it from time to time all the way up till now:
This project really helped me learn a ton about not just emulation but optimization and hacking, it has been really cool just seeing a NES game being emulated by something I somehow wrote up myself from scratch, no matter how often I run something.
With that, we now come to the start of this month, where I just had one of those moments where I suddenly remembered, oh yea there was this weird thing I found about ROMs in the gamecube anniversary collection, and I went to investigate a bit more, and well my findings were rather incredible...
EVERY single NES game you load actually loads a full NES ROM into gamecube memory on bootup without using it! This ROM is in japanese and heavily modified, they replaced all graphics with some form of addresses, I assume maybe at some point they did emulate the games and showed the graphics natively on the system using these addresses but changed their mind later in development, but never removed it from their big archive file. You see, every NES port loads one big archive file with many files inside, so it was probably an early leftover or something that I am now extremely lucky to have!
When I tried to run these files in a regular NES emu, they most certainly look weird, here is a comparison of the original ROM:
and what it looks like from the extracted archive:
You can still very clearly make out what everything is supposed to be. This is because while the graphics were replaced with addresses, only 3 quarters of each tile were actually replaced, they left the last quarter intact with the original game data. Now the incredible part is, ONLY the graphics are messed up, ALL of the 6 extracted files play and sound perfectly fine, the code itself is practically untouched!
This finally leads me into my crazy idea: with the game loading a functional ROM on boot as well as me writing a very portable NES emu, what if I were to try to strip down my emulator to only do audio and emulate the sound while running the port as normal?
This immediately was followed by the question of how does one add additional code to an already fully compiled game there's no source code for without requiring some rebuilding of the ISO file or something like that.
Miraculously this game comes with a so called .elf executable on disc. To explain the significance of this, normally gamecube games come with .dol executables only. Those have no function or variable names embedded, so when you try to reverse engineer one, you only see pure machine code without any of the names from the original source helping you along. In this case though, the .elf executable is the exact version that the .dol executable is based on, with ALL FUNCTION AND VARIABLE NAMES! This really makes things so much easier, because it was immediately clear what I had to find out and test, without having to spend an incredibly long amount of time going through each function and giving it some name I come up with on the fly for what it could be.
Now I was on the hunt for unused space in the data section of this executable, and very quickly I found one particular function, "BinkLogoAddress", which is part of the game's video player and would hand over so called "LogoData" that takes up 14.5KB of data, and this function seemingly is never called! So this gave me a pretty good start, but it turned out very quickly that it was not nearly enough for my NES emulator and all the code on the side, so I went on the hunt again, and again as part of the video player I found several parts that went unused. You see, to decode an image, it uses a big table full of the numbers needed to decode it back into an image you can see, but it also included additional tables for decode methods that in fact were never used. I found 2 unused tables that both take 4KB very close to each other, so in the executable memory it was basically laid out like this:
table 1 - 4KB of used table
table 2 - 4KB of unused table
table 3 - 4KB of used table
table 4 - 4KB of unused table
In this case what I did was copy table 3 into the place of table 2 and wrote some additional patches for the functions using that table to now point to the position where that unused table 2 is like this:
Which gives me a little over 8KB of additional free space! What I mean by a little over is that there were some "alignment" blocks in between that were not needed so I removed those to save a couple extra bytes.
So great, now I have a 14.5KB block and an 8.125KB block of data free, and they are at different places in memory, so I can't just get all my code into that space easily... Now I turned to another project of mine:
I wrote that to compress executables using the LZMA2 algorithm you find in for example .7z archives and include a decompressor so you can make homebrew executables smaller, and I for example used this project:
to go from a 5.48MB executable to only be 452KB executable for example, I was really hoping it would also be useful on a small scale.
This turned out to be VERY tricky, I wanted to fit the decompressor part into those 8.1KB which was a pretty interesting task to be honest.
My first try at stripping it down to just decompress one test binary and jump to it turned out to be about 10KB, which was far too big, so I turned to compiler optimizations first and made it optimize for size. This still was slightly too big, and I noticed that it now added some calls to functions it imported from other libraries. This is an info page from the final file I released:
Idx Name Size VMA LMA File off Algn
0 .code 00002080 800bd220 800bd220 00000054 2**2
CONTENTS, ALLOC, LOAD, CODE
1 .text.__ashldi3 00000040 800bf2a0 800bf2a0 000020d4 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .text.memcmp 00000094 800bf2e0 800bf2e0 00002114 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
3 .text.memmove 00000144 800bf374 800bf374 000021a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
At this point, it was SUPER CLOSE to fitting but not quite, what I ended up doing is a little bit of a hack I suppose, when decompressing LZMA2 data with the library I used, it also checks for checksums on the data and blocks using CRC32, so I just removed any and all references to these checks and just made them to be always "true" in code as bypass. I mean in the end they should never fail anyways cause this data is embedded permanently, and if it was to fail, it would crash either way, with those hacked in changes I went down to 8KB in the end, hooray!
Now I had a decompressor that was able to put my emulator and needed side code easily into the 14.5KB, but that brought up another question: where to extract my code to? Well, as it turns out, the video player has about 41.5KB of work RAM that it uses while decoding video, and it sits at a fixed location in the executable:
and well since the NES games never actually play back any videos, this meant it was the perfect spot to extract it into when playing them! So my decompressor now either decompresses the emulator when it sees a NES title is loaded, or clears the whole video space again back to 0 if any other module is loaded:
This of course brings me to my big hooks, the way I attach my new code onto the existing game's code.
As great as it is to have some new code I fit into the executable, nothing was ever calling it yet, I had to actually make existing functions jump to my new code at the right time to make anything happen. Every loading screen you see basically loads a new "module" into memory, with the main menu and all games all being their own modules. So all I did as the big hook for the decompressor was hook into the start of the MODULE_Init function:
This patch will effectively jump to the decompressor every time a new module gets loaded, so we can immediately add new patches from there if needed. Also the 2 patches you see below that in that source code are for getting some space in audio RAM for our new NES audio to actually fit as well as some small function to skip over the whole game logo display on bootup when start is held down on boot.
With all this, I now have an automated way to load up lots of code on bootup of any NES title, from where I can do whatever I want. I won't go into full detail of all my issues and developments that went on at this point, because this process took me several weeks to get to where it is now and it would just get too long.
After extracting the emulator with its code, the decompressor calls an initialization function of that new code first:
This just takes the current game ID and the audio RAM location, and applies some new patches for when it's ready to load the module, transferring the NES ROM and unloading the module. The first thing that happens after that is of course it calling that freshly patched function of being ready to load the module, as this happens right after the decompressor is done:
This thing does a LOT. At first, it finishes up loading the module (which is the part I overwrote), and then applies patches such as reversing the A and B controls properly just for jump and shoot, as well as getting ready to patch the most important address of this module yet: the function that's responsible for taking whatever audio request the game has and translating it to some filename the game then plays back from disc. What it will now do instead is save this request ID coming from the game onto a new list for our emu later, and then exit the function, bypassing all the disc play stuff. This way, there never is any low quality audio played from disc but we still get to keep what it wanted to play. This request will get answered by the next big function part, the transfer of the NES ROM into memory:
What I had to do here is analyze each game in a NES emulator and find out where in the ROM the audio driver is and how the game communicates with it. Luckily, in all cases there is a part of the ROM that updates the audio and checks if there is a new audio request, and if there is, it tells the audio driver about it, all in a single bit of code without it being split apart or anything. So what I do here is tell my NES emulator: hey, here at this address in the ROM is the audio driver, load that up as current CPU code location, and hey, here's the function that does the updates and checks in memory. Then, every frame, in a separate thread from the main game, I just let my emulated CPU jump to the start of the audio update function, and once it's done, I set the emulated CPU back to sleep. On the next frame I start the update again and so on, so it never executes anything else from the game ROM.
On top of initializing all the emulator things, it also sets up my method of then actually playing back the audio my emulator spits out, obviously as great as it is to have the emulator code running and updating every frame, I still have to somehow push this audio into the gamecube audio driver, for this we have this next part of this transfer function:
Here I just call the standard gamecube API for audio called AX to create a new "voice", think of it as an audio source, and I tell it to just loop this new audio buffer over and over again, I also tell it that every time the audio driver updates to call this function in my code after that point:
All that does is to resume my emulator thread every time it notices half of the looping audio buffer was played, so it refills it with new audio from my emu, what this effectively means is it will never have any skips in the audio because it results in this simple logic:
audio buffer 1st half fully played -> updates audio buffer 1st half with new data -> audio buffer 2nd half fully played -> updates audio buffer 2nd half with new data -> cycle repeats from 1st half fully played (which was updated before etc)
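To make that refill cycle a bit more concrete, here is a minimal sketch of the same idea in Python. It is not the patch's actual code: the class name, the buffer size and the sample source are all made up for illustration, and the real thing runs against the AX voice callback rather than a simple loop.

```python
import itertools

class HalfBufferPlayer:
    """Toy model of a looping buffer that gets refilled one half at a time.

    Hypothetical illustration only: 'source' stands in for the emulator
    producing samples, 'play' stands in for the audio hardware reading them.
    """

    def __init__(self, size, source):
        assert size % 2 == 0, "buffer must split into two equal halves"
        self.buf = [0] * size
        self.half = size // 2
        self.source = source
        self.pos = 0
        # Fill both halves once before playback starts.
        self._refill(0)
        self._refill(1)

    def _refill(self, which):
        # Overwrite one half (0 = first, 1 = second) with fresh samples.
        start = which * self.half
        for i in range(start, start + self.half):
            self.buf[i] = next(self.source)

    def play(self, count):
        out = []
        for _ in range(count):
            out.append(self.buf[self.pos])
            self.pos += 1
            if self.pos == self.half:
                # First half fully played: refill it while the second half plays.
                self._refill(0)
            elif self.pos == len(self.buf):
                # Second half fully played: loop around and refill the second half.
                self.pos = 0
                self._refill(1)
        return out

player = HalfBufferPlayer(8, itertools.count())
played = player.play(40)
print(played[:12])
```

Because a half is only ever rewritten right after it finished playing, the reader always has a full half of fresh samples ahead of it, which is why there are no gaps in the output.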
To get back to my patch from 2014 of the whole audio playing wrong, a little fun fact about this update process was that I actually first messed it up because I forgot to write the buffer down into main memory from cache before transferring it which worked fine in dolphin (because it has no cache like that) and nintendont (because it forces the cache to main memory automatically) so the sound on a real gamecube was very broken and crackly, that one took me a bit to figure out, and in the end all I had to add was this line before copying the audio to the audio memory to fix it:
So with all this to summarize, I now take the audio requests from the game, forward them to the original NES audio driver, thankfully most request IDs are the exact same as they were in the original, I only had to patch a few, and it plays back the audio in a constantly repeating buffer with those new requests within milliseconds of those requests coming in having very little delay, all this results in a patch that honestly is incredibly good in my opinion, I never thought it would sound even half as good as this.
During this project I ran into many issues, especially megaman 6 which had me rework my emulator because I ran out of memory and because I had to make a stupid amount of patches for it:
Which is a result of it sounding so different to the original NES game, so I had to do my best to restore it to be as close as possible to work with the original NES driver. But in the end I somehow pulled everything off pretty well, to now have a patch that runs my emulator in the background of this port, taking the original audio requests and seamlessly playing them back as if it was designed that way, thanks to some unused japanese roms left over in the game's archives!
I guess at the end of this writeup let me demonstrate how my result sounds by playing through airman's stage in megaman 2, first in an unpatched, original ISO file so it can really sink in how bad it sounded to start out with, which took me many attempts by the way thanks to the backwards controls:
and then now with the patch applied and the audio emulated through my own nes emulator instead:
Well, I hope you enjoyed reading through my latest bit of rambling, sorry for not making it any more detailed, it was just too much to really fit into a post without more planning beforehand I guess, I just wrote this in one go because I felt like it was a pretty cool project to talk a little bit about.
So again, if you are interested for some reason in trying it out yourself, just check out the project on github and read the short README at the bottom for details on how to apply it. I tested it in dolphin, nintendont and on a real gamecube, so let me know if you find any issues:
NES Patching - Mega Man 6 PAL Music
Dec 8, 2018
While Mega Man 1 to 5 all got some form of PAL versions, though the first 2 were not all that great, see my blog post about patching those, Mega Man 6 was one of the last nes titles ever released, and so it never even got any PAL port.
This did annoy me just enough to now take a look at it and see if maybe I could at least adjust the music to make that acceptable on a PAL console. Gameplay wise none of the others were sped up, so the slower gameplay really wouldn't be noticeable to me anyways.
To demonstrate what happens when just running the game without any adjustment, have a listen to the game's intro as intended, in NTSC mode:
And now when running this exactly as is but in PAL mode instead:
As you can hear, the pitch is too low and the speed is too slow as well.
First off, I looked at where the game does its pitch adjustment, because due to the slower PAL clocks, all the sounds will be pitched down when not doing any calculations. For that I just set a breakpoint on the frequency address of the first instrument, which should lead me in the general right direction:
That's the function responsible for this, and what you see on the side there, 1A:, is the rom bank of the music driver, which can be seen by just scrolling up and down its code. Now while I did not really see the exact read where it gets its pitches from, I did notice several references to some table of numbers around the $8960 region, which get stored in variables very close to the ones that get used to call this frequency set function and are then modified further in some ways. At that point I had hopes of the mega man 5 music driver being very similar, as it released not too much earlier, so I scrolled to this table in mega man 6 in my hex editor:
and just searched for this pattern in the us version of mega man 5, and wouldn't you know it, it's in there exactly the same:
That's a great start! Now all I had to do was search for that bit before the table in the PAL version of mega man 5 and see if it's any different, and guess what, it totally is:
So being pretty hopeful I just copied that PAL table right into mega man 6, fired up the game to take a first listen and...!
YEAAAAA, something isn't quite right here, is it? The pitch most certainly changed, but this is not at all what I expected.
This clearly means that somewhere in code there actually is some important change, so I fired up both us and pal versions of mega man 5 and started scrolling through their music driver code for a good 10 minutes until I found this one single line near the end:
See how that says ADC #$07? Well, in the pal version this line is actually different!
A difference by 1, that probably explains why all notes were completely off, so I tracked down that line in mega man 6 and replaced the 7 with a 6, and this is how it then sounds:
Now THAT'S what I want to hear! I'm really happy that I was able to just stick to the original frequency values made by capcom back in the day rather than calculating my own ones here, it sounds really good I find.
Next up it is time to look at music speed, and for this, I did first look into mega man 5 again, and this is their "speed code":
So let me explain what's done here. First off, the variable $C0 is used a bunch by the game, which is why it first takes 6 bits from it and stores them into a "temporary" variable, $C3, which is used by the music driver as a temporary as well, so good choice there. Then they add 0x4 onto $C0 and AND it with 0xC, which gives them a total of 4 potential states here: 0x0, 0x4, 0x8 and 0xC. After doing so they combine the previous value in $C0 with the now newly added value and store it back into $C0, so on the next update call they can add to that number again, thus keeping track of how often it ran. Now the next bit of code checks if the value is 0x0, and if it is, it calls the update function $8084 as a function, and right below that call obviously is address $8084 as well, meaning it will get executed twice in that cycle. The idea of that is to speed up the music by just calling the update function twice every so often.
Now, while this sounds ok, it actually results in music that is about 4% too fast! To explain why, for every 4 calls here, the music update function gets called 5 times because of that one double call. PAL runs at 50hz, which means every second this function gets called 62.5 times (50/4*5). The thing is, NTSC runs at 60hz, not 62.5, so that's where that wrong 4% comes from. So, instead of taking the mega man 5 pal code for mega man 6, I just copied my code design from castlevania, see that blog post for how that works. All I did is swap out the unused variable: in castlevania, $F0 was unused, in mega man 6, $DE goes unused. So I replace the jump in the music driver from $806C:
to now go way up into a small unused area at $8A2C:
Then took that blank memory:
and place in my bit of code into that spot:
Translating into this bit of code now being in place of just the music update function:
It works exactly like it does in castlevania too, just $F0 is swapped with $DE.
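As a quick sanity check of the numbers, the two approaches can be compared with a few lines of Python. This is just the arithmetic from the text written out, using the nominal 50 and 60 frames per second; the function and variable names are made up for the example.

```python
from fractions import Fraction

PAL_HZ = 50
NTSC_HZ = 60

def effective_rate(frame_rate_hz, frames_per_cycle):
    """Music updates per second when one frame in every cycle calls the update twice."""
    calls_per_cycle = frames_per_cycle + 1
    return Fraction(frame_rate_hz) * calls_per_cycle / frames_per_cycle

def percent_off_ntsc(rate):
    """How far a given update rate is from the NTSC rate, in percent."""
    return float((rate - NTSC_HZ) / NTSC_HZ * 100)

mm5_pal_rate = effective_rate(PAL_HZ, 4)       # double call once every 4 frames
castlevania_rate = effective_rate(PAL_HZ, 5)   # double call once every 5 frames

print(float(mm5_pal_rate), round(percent_off_ntsc(mm5_pal_rate), 2))
print(float(castlevania_rate), round(percent_off_ntsc(castlevania_rate), 2))
```

With a cycle of 4 frames the update rate overshoots the NTSC rate, while a cycle of 5 frames lands on it exactly, which is the whole reason for reusing the castlevania design here.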
So, this now ingame sounds like this:
So, the audio now sounds great, but as you can see, the intro visuals are way too slow, and the same goes for the game's credits as well. Fixing this just involved me looking for variables that count down on frames, seeing where those variables were originally set, and then reducing them. I won't go into more detail than that because there were a LOT of timing variables to adjust overall for intro and credits, I changed (if I did not miscount) 32 variables to make those sync up properly to the music. As I said earlier, I did nothing about actual gameplay since that also wasn't done in any of the other games, and honestly with how brutal megaman is, having everything be a bit slower isn't a bad thing.
Another small thing I changed is the initial screen:
search for its content in the game in a hex editor:
and swap out those characters as well as remove the first line from being drawn by moving the memory pointer in the game to just point to the second line immediately.
With those music pitch and timing changes, the small screen edit and the intro and credits adjustments, I released that as v1 of my patch, because it all seemed to play just fine in fceux, which I used for breakpoints and stuff. I rushed through the game in it with invincibility on and all seemed good, and on my real nes I hopped into every stage shortly and it all worked fine, so I thought it was all good...
Well, later that day, I finished the game properly on my real nes, and in one particular room in one stage, things behaved... rather glitchy. Let me show you what I mean:
As you can see, things move around pretty crazy and not at all as they should. This particular effect is done using the scanline interrupt feature of the mmc3 expansion chip that the game uses. Here's just a quick picture showing where those interrupts and setups happen:
Once position 2 hits, it calculates how far down that "boat" or whatever it is should be in the water, so it changes the currently drawn screen image to just black and also sets the next interrupt position to the start of where the boat should be drawn. Now this is where the first issue happens. How the MMC3 works is it basically has a counter that counts down once a scanline is drawn, and once it hits 0, it will trigger an interrupt, and on the next scanline it will load whatever value the game wants to have it triggering next. The game actually has a wait function it uses at positions 2 and 4 once an interrupt hits, so it won't just set the next thing to draw in the middle of the current scanline but right at the edge. The PAL CPU is just a bit slower than the NTSC one though, so this particular wait code in the game sometimes sets the next interrupt position too late, meaning the MMC3 already reloaded the previous scanline value again, which causes that heavy flickering on that particular frame, because it draws the boat and water at completely wrong positions. To fix this, all that has to be changed is this wait call at position 2:
$C163 basically just counts down as many cycles as is given by LDX, in this case, that's 4 cycles, so all I did here is reduce this down to the lowest value I found works on hardware:
And for position 4 the wait originally was 3, there I also changed it to 2 and now with those 2 lines slightly altered, this room looks like this on console:
Now that's SO much better.
If you were really observant though, you may have noticed that on a few frames the water was drawn lower than it should've been. This is due to the music driver updates. Normally the game updates everything during/after vsync, because once the picture settings are in for a frame, they don't change. This room is special though, as it has multiple scanline interrupts, so the music update is actually happening between position 3 and 4.
This means that every so often, when the "boat" thingy is really close to the water, the music driver got called twice to get the speed up to ntsc level AND some note was loaded or sound effect played, it can happen that the cpu takes longer to update all that, and the interrupt gets handled after the graphics chip already drew those extra lines you see. So really that little bit of glitching left is not something I can easily fix, but honestly I don't think it's an issue at all, especially compared to it being not patched at all as you saw before.
So that is it for now, the patch is available on my github as always:
And because I did not show the fixed intro speed earlier, here's the game intro followed by me playing from my console:
With this patch I guess now after 25 years it is finally possible to play all 6 mega man games properly on a pal console, so that's cool.
If you made it this far, thanks for reading this yet again pretty long technical blog post.
NES Patching - Castlevania PAL Music
Nov 2, 2018
In the past, I've made a patch for the PAL versions of megaman 1 and 2 that fixed their too slow music, see that blog post for details, and somebody then asked me to take a look at castlevania, so now I did.
While there was a fully proper, well documented disassembly of the music driver of megaman 1 and 2, I did not find something quite like that for castlevania after searching, but I did find a very basic disassembly with a couple functions named at least, most importantly, a function called UpdateMusic.
As you can see, there's no further documentation or comments, but hey, at least I now knew what to look for, as it also pointed out where it was called from.
With the PAL versions of megaman 1 and 2, there were slight differences from the NTSC version when it comes to the music so there I could not just look for those addresses, but in castlevania, they did not adjust music pitch or speed, so when I looked for this particular call in hex, 20 8A 83 in the PAL rom in my hex editor, it came up immediately:
So there was no further searching required which is nice, and I now had an idea of the variables involved too!
Now to give you a demonstration of how bad it sounds when a pal title does not have speed or pitch adjusted, first have a listen to the intended ntsc speed:
And now, for how the pal version sounded originally:
Previously with megaman, songs had a speed value attached to them which I modified, but with castlevania I looked at its addresses while it was running and there it already seems to be all based on frame values counting down. The issue with that of course is that the pal version runs at 50 frames per second and the ntsc version at 60, that's why it plays back slower. So there was no easy song speed value to change out this time, and with a frame counter I would have had to adjust each note timing and it would end up not really syncing up anymore. I instead wanted to try out a method I saw in another nes music driver in the past that was already adjusted for proper pal playback, which just involves calling the UpdateMusic function more often. For this, I need to add some extra code of course, so that meant seeing if there was some leftover space, and luckily, there is a massive area near the existing music code:
Now when it comes to what to put in place instead I started out very simple, I just called the UpdateMusic function twice like this:
With that small change in place, it now all sounds like this ingame:
Exactly as hoped, as you can hear everything plays at twice the speed now, so that's some good progress. Now to make this sync up as it should, I needed some counter variable so I know how many frames have passed so far. For that I looked into the lower memory of the game for a while and noticed that address $F0 seems to be unused:
To confirm that, I set a breakpoint for that address, which would interrupt the game as soon as there was a write to it, but it never triggered, not during the intro or gameplay demo. I also finished the game with it hooked and did not see it accessed once, and in that disassembly I mentioned earlier I also saw no reference to it, so it was the perfect candidate for this. So, with an address near the start of the memory that I could use, I now wrote up the following bit of code instead of just calling the UpdateMusic function twice:
To explain what this does, I first subtract 1 from whatever number is in address $F0, and I then see if the first 3 bits of $F0 are 0 by using the AND instruction. Because the game never initializes this value I wanted to make sure it checks only as few bits as possible, if I were to just see if the number itself was 0 then at the beginning of the game if there was a really large number in there already from boot then it would take a couple of seconds to sync up the timer properly. With this method though, it will sync up within the first few milliseconds of boot regardless of what is in there, which is quick enough to never run into any issues. Now when it hits 0, it reloads $F0 with a value of 5, and then executes the UpdateMusic function twice, if it was not 0, it only executes UpdateMusic once.
The idea behind this is rather simple. Because I have to make up 10 calls a second (50 calls a second in pal vs 60 calls a second in ntsc), after the function was called once for 4 frames, I then call it twice on the next frame and repeat that sequence over and over. This simple method means for 5 frames it gets called 6 times, which of course after 50 frames means it called it 60 times, exactly the number it should be to be as fast as ntsc! While this of course means at times it advances the music more than others, it happens in such a short timespan of just milliseconds that it is unnoticeable when listening to it.
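The counter logic described above can be simulated in a few lines of Python to check both claims, the quick sync from any starting value and the 60 calls per 50 frames. This is only a model of the behaviour as described in the text, not the actual 6502 code: it assumes the hook decrements $F0 first, then tests the lowest 3 bits, and `run_frames` is a made-up helper name.

```python
def run_frames(f0, frames):
    """Model the per-frame hook; returns (new $F0 value, total UpdateMusic calls)."""
    calls = 0
    for _ in range(frames):
        f0 = (f0 - 1) & 0xFF      # subtract 1, wrapping like an 8-bit memory cell
        if f0 & 0x07 == 0:        # assumed check: lowest 3 bits are all zero
            f0 = 5                # reload the counter
            calls += 2            # this frame runs UpdateMusic twice
        else:
            calls += 1            # normal frame, one UpdateMusic call
    return f0, calls

# Whatever garbage is in $F0 at boot, let it settle for 8 frames,
# then count the calls over one PAL second (50 frames).
calls_per_pal_second = set()
for boot_value in range(256):
    settled, _ = run_frames(boot_value, 8)
    _, calls = run_frames(settled, 50)
    calls_per_pal_second.add(calls)

print(calls_per_pal_second)
```

Checking only 3 bits is what keeps the settle time short: in this model the first reload happens after at most 8 frames no matter what the starting value was.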
To hear that for yourself, with this change in place we get the following result:
That is already so much better, the only thing left now is the pitch of the notes still being off, how this is done is that there is a function that gets a number for a note, and it takes that number and grabs the frequency to use from a list of frequencies:
$BC in this case already contains one half of that grabbed frequency and at the bottom you can see that getting stored into SQ1_HI, which is the upper half of the frequency register of the audio processor, so that is what you hear in the end, that frequency value in $BC was assigned right here:
You can see, it grabs some value from $878A and stores it into $BC as well as $BD, which is the other half of the frequency, and Y in this case was whatever note it wanted to translate into the frequency you hear, now to have a look at the values in $878A in a hex editor, I've marked the relevant bit:
I know that the things after that marked part are not relevant anymore as those follow a different structure, so they are probably values for some other functions not related to frequencies. Every frequency takes up 2 bytes here, with the first one being 0x6AE and the last one being 0x38A, 12 frequency values in total. What I now have to do is make the notes play higher, which in this case means those frequency values have to be lower, that's how the audio processor works.
The question now of course is, how much lower do those values have to be? Well, having written a nes emu I know just the place to look for a reference of system clocks:
The one relevant here specifically is this one:
With 1.789773 MHz being the NTSC and 1.662607 MHz being the PAL clock, so now I just divide the NTSC by the PAL clock, resulting in a value of 1.076485904365854, which is the exact difference I now have to divide each of the values of the game by to get a properly adjusted output:
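As a rough sketch of that calculation in Python (not the tool actually used for the patch), the snippet below scales the two table values quoted earlier. Rounding to the nearest integer is an assumption, the original values may have been rounded differently, and `pal_period` is a name made up for this example.

```python
NTSC_CPU_HZ = 1_789_773   # NTSC NES CPU clock
PAL_CPU_HZ = 1_662_607    # PAL NES CPU clock
RATIO = NTSC_CPU_HZ / PAL_CPU_HZ


def pal_period(ntsc_period):
    """Scale an NTSC period value so the note has the same pitch on PAL."""
    # Rounding to nearest is an assumption of this sketch.
    return round(ntsc_period / RATIO)


if __name__ == "__main__":
    # Only the first and last table entries are given in the text.
    for period in (0x6AE, 0x38A):
        print(f"{period:#05x} -> {pal_period(period):#05x}")
```

A smaller period value means a higher note, which is why dividing by a ratio above 1 raises the pitch back to where it is on NTSC.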
So the last step now was to just put those newly calculated values into the game like this:
With those tiny changes in place, which took me less than 2 hours to come up with and implement (which really makes me wonder why this wasn't done back in the day), I present to you the resulting ROM played on my actual PAL NES:
Because of that last recording it actually took me more than 2 hours just to get done with the blog post, thanks to my internal capture card suddenly showing a green screen when I wanted to record. After a reboot it wasn't detected anymore, so I had to do a full shutdown, take out the card, make sure the pins are clean, replug it and THEN it was back to normal. Probably over the years some contact got loose from my constant plugging in of different devices.
As always, this patch is available on my github:
It only changes a few bytes overall which is neat, anyways that is all for now, thanks for reading.
Some various GameBoy things
Oct 17, 2018
So recently I have just been playing around a bit more with my previous gameboy things, see my previous few blog posts for how I run code using my pokemon cart, just wanted to write a bit about some of those.
First up, I did end up implementing gameboy cart save dumping into my gameboy audio dumper:
It is a quite simple addition overall, the only strange thing is this first rather big code block:
That really just has to compare each value in the game's header to determine how big the save is. They made it rather strange so it can't just be easily scaled up, oh well:
I don't have too many carts around that even have saving, but interestingly enough I ended up running across one cart that seems to always corrupt its save when inserted into a running gameboy color, and that is tetris dx.
For some odd reason, the first byte of the save always gets set to 0 as I was able to see in a hex editor:
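For illustration only (this is Python, not the dumper's code), such a lookup could look like the sketch below. The size codes follow the commonly documented cartridge header layout (RAM size byte at offset 0x149, cartridge type byte at 0x147); the function and table names are made up for this example.

```python
# RAM size codes as commonly documented for header offset 0x149.
# Note that the sizes do not grow in order: 0x04 is bigger than 0x05.
RAM_SIZE_BY_CODE = {
    0x00: 0,
    0x01: 2 * 1024,
    0x02: 8 * 1024,
    0x03: 32 * 1024,
    0x04: 128 * 1024,
    0x05: 64 * 1024,
}

# Cartridge type values (header offset 0x147) that mean MBC2.
MBC2_CART_TYPES = {0x05, 0x06}


def save_size(cart_type, ram_code):
    """Return the number of save addresses to dump for a cartridge."""
    if cart_type in MBC2_CART_TYPES:
        # MBC2 has 512 built-in 4-bit cells and reports RAM code 0.
        return 512
    # Unknown codes are treated as "no save" in this sketch.
    return RAM_SIZE_BY_CODE.get(ram_code, 0)
```

The out-of-order codes and the MBC2 special case are the reason a plain "shift by the code" calculation does not work here.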
You see how it reads _ETRIS now? Well yeah as you can guess that should read TETRIS instead, and the game actually checks for that on boot, and if it is not present it completely wipes the save! Meaning that at any time if I would've booted this cart up in all my dump tests, I would've lost my original save from way back in the day!
I do believe that this may have something to do with the chip inside the cart which is a MBC1, the "1" indicating it being the first controller chip version, so I assume this particular bug is related to that, I do have a cart that has MBC2 and another with MBC5 chips in them, and on those there is no such save corruption at all. Also this does not seem to be related to some random cpu command but rather just the chip being powered up in a weird way, because a gameboy by default cannot read/write onto cart RAM until you unlock it by writing a magic value (0xA) into a specific cart register. And I did also dump the cart without writing into that register first and as you would expect, the save reads as 0xFF (no data).
Just because I really wanted to get into weird behavior I also booted the cart normally in another gameboy, then quickly ripped out the cart and plugged it into my gameboy color, and guess what, now it just writes some random byte into a random address in the save instead!
That probably comes from some residual value still in the chip, how strange indeed.
Also as I mentioned, I have a cart with an MBC2 chip. The save on this chip is a very strange one that does not use full 1 byte (8 bit) values per address but 4 bit values instead, so only half of each byte is actually usable as save. In my gameboy emulator:
I have this implemented by setting the upper part of the byte to 0xF because I figured that it probably would read as that on a real gameboy, because all other inaccessible registers on it always have bits set like that as well. And I am happy to see that indeed, a real MBC2 save looks exactly like that, so that's cool!
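A minimal Python sketch of that read behavior (not the emulator's actual code, and `mbc2_read` is a made-up name) could look like this:

```python
def mbc2_read(ram, offset):
    """Read one MBC2 save cell: the low nibble is the stored data,
    the unused high nibble reads back with all bits set."""
    # The chip only has 512 cells, so the offset wraps at 0x200.
    return 0xF0 | (ram[offset & 0x1FF] & 0x0F)


if __name__ == "__main__":
    ram = bytearray(512)
    ram[0] = 0x03
    print(hex(mbc2_read(ram, 0)))
```

A dump made this way consists only of bytes in the range 0xF0 to 0xFF, matching what the text describes for a real MBC2 save.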
Looking into different audio pulses
My audio dumper does work by just turning the gameboy audio wave channel on and off, and I did already push the pulses as short as possible, zoomed in looking like this:
I had never seen this corruption for myself before, but with this particular test version I made, I was finally able to:
In my receiver that converts these pulses into bytes I may try some other methods in the future to determine the exact pulse size. At the moment I just use 6 points of reference around the pulse and 2 in the middle of it, and measure the value in between those to get its exact size. I want to look into maybe taking something called the RMS value instead: this takes all the points I give it, in this case probably about 10 or more sample points in a row instead of just some on the side and in the middle, and gives me a single absolute volume value out, which may make this more accurate, which would be nice. Also I probably should support more than the 44100hz sample rate, because as you can see from the previous image, at 96000hz the pulse gets much clearer and should measure much more accurately too. I just made it 44100 for now as I figured that is the easiest overall to capture, if anyone else is crazy enough to ever even test my dumper.
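For reference, an RMS (root mean square) value over a run of samples can be computed like this in Python. This is just the generic formula, not code from the receiver.

```python
import math


def rms(samples):
    """Root mean square: square every sample, average, take the root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    # Ten consecutive 16-bit samples across one pulse (made-up numbers).
    pulse = [120, 900, 5200, 9800, 10100, 9900, 5100, 800, 150, 90]
    print(round(rms(pulse)))
```

Because every sample is squared, the sign of a sample does not matter and a single noisy point has less influence than it has when only a handful of reference points are compared.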
Some chinese multicart
Among the few carts I do have is this particular multicart called "Super 101 in 1", which actually has 16 games on it. The cart looks like this:
and when I wanted to dump it in my dumper, it always just did 32kb, this is because the first game on it is in fact only 32kb big, but when booted in an emu, you can clearly see that this is not actually just a game:
They basically just hacked the first game a bit to put this selection menu in, there is quite a bit of space after all in this game so it was probably the ideal target.
Now for me to properly dump this cart I had to figure out how it actually sets different games to boot, now this is where all the pirate mappers I implemented into my nes emu help:
This immediately gave me the idea that I should just set write breakpoints for normal cart chip registers and see if anything strange is written to them, and indeed for register 6000, there is:
This particular register is not used all that often on a real cart: on MBC1 chips it just selects the banking mode, which determines how many banks RAM and ROM can have, on MBC3 chips it can be used to get the current time, and on all the others like MBC2 and MBC5 it does not do anything.
This write happens right when you boot the cart, and then when selecting a game, it does more writes depending on the game, I did list those here:
From this, I was able to figure out exactly the logic behind this register by basically modifying my cart dumper to write other values into it before dumping and seeing what happens, it's quite technical:
That little bit took me quite a while to figure out, but now I know exactly how this cart works, how big it is (2MB) and also which games are exactly on it in which versions:
Overall, I made a good 15 dumps, all 1MB each, with all sorts of unusual values that the cart never actually writes into it itself just to make sure I exactly know what logic is going on, and also of course to build a proper 2MB dump of the cart itself instead of the initial 32kb dump. Was quite fun figuring out this odd cart I had laying around now for ages.
Also, I did end up implementing it for testing in my emulator and it works great, though I have yet to commit any changes to the emulator itself because there are various other things I still want to do to it first, in particular regarding the audio to make it more accurate. I did test my audio dumper on my own emulator and found out I don't actually handle the audio channel output exactly how a real gameboy does, it is actually shifted up in a way it should not be, so I want to fix that up first.
I hope those different things were at least a bit interesting, thanks for reading.
Testing Officially Unused GameBoy Color Function
Oct 4, 2018
Now that I can execute anything on a gameboy that I want using pokemon as entry point, see my last 3 part blog series on details for that, I wanted to check out something that somebody found out about not all that long ago with 2 hardware registers, ff76 and ff77. If you were to look in any official documentation those would not be mentioned or used anywhere, and most old unofficial documentation just lists them as reading as "0". But what somebody seemingly noticed is that they are in fact not always 0 but represent the current volume of the 4 instruments that make up the gameboy audio output, and released a demo for that:
When I was writing my own gameboy emulator:
I did also implement those registers how I imagined they would be on real hardware and ran that demo above on it, which works just fine:
Of course at that time, I had no way of actually checking it out on an actual gameboy color as I had no flashcart or anything similar to run my own code on it, but now that I can run quite a lot of code, up to 1122 bytes to be exact using pokemon, I decided to write my own sound test with a builtin visualizer!
If you want to see both part of the installation and it then running, check out this video, just for like the first 2 minutes see the input viewer on the top left actually installing my code and after that I boot up into my little program:
Now the installer is identical to the one I previously described that installed my cart dumper, this time it just installs a different payload, a much bigger one too, taking up 1110 bytes out of the 1122 I have available, I really pushed as much functionality into it as I possibly could. Of course, I released everything up onto my github as well:
See the readme at the bottom of that if you want to try it out yourself, it's usable for both english and german versions of pokemon yellow and you can listen to a total of 50 music tracks, all sorted after the official japanese game soundtrack.
Let me go through what exactly I put into just over that one kilobyte, because it is rather interesting to see how much 1kb can truly be.
The code starts off very simple and just disables all the game's drawing functions, so I can draw my own screen into place, that is just done by setting all the variables related to drawing to be in a disabled state:
After that, I just wait for the screen to be at the very bottom of drawing so I can safely disable the screen, ready to clear out all of the game background with just a plain white background:
It is quite important to wait for the screen to actually be at the bottom, because it can actually damage the screen if you just cut it off in the middle of processing the current frame, so of course I don't want to risk anything.
Right after that is done, I draw in the little top part with program name and controls:
And I draw on this text to be exact:
The cool thing about text in this case is that I of course boot everything from within a game already, so I don't have to make up any form of bitmap tiles for letters or anything. Pokemon already provides a perfectly good tileset for me to re-use, so I just have this separate file that assigns each character to whatever tile number the game uses for its characters:
and the compiler I'm using, rgbasm, just automatically converts all characters into those bytes which is pretty cool!
Following that bit of text on top are of course the volume meters themselves:
And yes, I only draw these once, taking up the whole line, how I actually make them seemingly "move" once music plays I'll go over a bit later
This next bit of code really is only needed if you were to use that glitch item to boot my program outside of the shop, the black bars I'm using rely on that tile around the shop to make it look like you are inside a house being black, once you go outside though it gets replaced with some overworld tile, so to make sure it does not look messed up I just make sure I replace that tile always to just be black:
So now with the top bit of text in place I just also place in the text above the volume meters just with the names of the instrument:
...those names you can also find at the bottom of course:
So now the basic screen is prepared, so I turn the screen back on with that new information and start doing something with a special graphics feature of the gameboy, the "window":
In this case I just move it out of the way for right now, it will become very interesting a bit later
Next up it is time to enable vblank and lyc (line compare) interrupts:
Vblank is the signal of the screen being done drawing and at the bottom, this is important for the cpu to know as the music routine that we of course want to use for this gets executed every time a frame is done drawing.
Line compare is a signal you can set yourself, you can essentially just tell the graphics processor on which line of the screen you want a signal given to the cpu, I will use that later to actually make the volume bars move by just getting a signal on the line right before the volume bars get drawn, more on that later.
So now, I am pretty much ready for my main loop, the last thing now is to of course, start up the music!
This function started out very small but got quite a bit bigger in the current version as I not only tell the game which song to play, but also draw the title of the track on screen!
We start off with this bit:
Since this function can get called while something is already playing, the first command to the game is to stop whatever is currently being played, simple enough. That is followed by this bit:
which again, waits for the screen to be done drawing and at the bottom, and then I draw in a little play/stop icon in front of the track title, this depends on if the play function was called with a valid music number or with a number I chose (128 in this case) to stop the music from playing when pressing the B button. In the case of it being the B button that is all this function does and it jumps to the end, but of course in this first case it is called with the first track number, so it continues to here:
I had to get a little creative in trying to now select from a list of track titles to draw onto the screen, and I came up with that bit of code. You see, I specifically chose each title to be exactly 8 characters long, this is because it is very easy to multiply a value by 2 in code, as all you have to do for that is shift the bits of the number left once. That is done with the command "sla c" in this case, where c holds the current song number. I shift that number over 3 times, so that means it does number*2*2*2, which is of course number*8! I then just add that new number onto the list of names I prepared here:
And that gives me the exact name of the song title I want, which I then just draw on screen!
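The same shift-based lookup, sketched in Python rather than gameboy assembly. The titles below are placeholders, not the real track names, and `title_for` is a name made up for this example.

```python
TITLE_LEN = 8

# All titles packed back to back, each padded to exactly 8 characters.
TITLES = b"TITLE 00" b"TITLE 01" b"TITLE 02"


def title_for(track):
    """Return the 8 character title for a track number."""
    offset = track << 3          # three left shifts, same as track * 8
    return TITLES[offset:offset + TITLE_LEN]


if __name__ == "__main__":
    print(title_for(2))
```

Fixed-width entries are what make this work: with variable-length titles the offset could not be computed from the track number with a few shifts.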
The next bit is pretty much the same but not for the track name, but for the track number I have to give to the game to start playing back what I want:
In this case I just have to multiply the track number by 2 once, as the track number I give the game is only 2 bytes and kept in this list:
So now that I told the game what I want to play and I have drawn the title on screen, I just clear the signals I may have gotten in this time from the graphics processor and enable the ability to actually get signals now as I am now ready to finally start with the main loop:
So finally, after all that preparation we are actually ready to do something, and the very first thing is the "halt" command:
This simply puts the processor to sleep for now until we get a signal from the graphics processor that it finished drawing or it hit one of the lines we specify, now there are lots of lines that I actually do specify in total:
Over the course of just one frame, the gameboy being 60 frames per second of course, I in total look out for 8 separate lines it is drawing to get into updating something, those lines are the one right before the first volume bar, the second right after it so you can read the name of the 2nd instrument, then again right before the 2nd volume bar, then right after the 2nd volume and so on for all 4 instruments, making up those 8 separate lines.
Now what happens right before a volume bar is first of all setting the next line we want a signal for of course:
Then we get the volume level of that particular instrument, more on how we actually get that later:
And now finally for the special "window" feature I've mentioned earlier!
So to avoid trying to draw the current volume bar level every single frame I decided to instead use a "window", which basically lays on top of the background where so far everything is we have drawn, the text and the volume bars. The window always goes from the right of the screen to the left, and you can specify the exact pixel you want that window to be scrolled to between that, so all I do is have a list of window positions:
I have calculated these for each of the 16 possible volumes a instrument can have to overlap the background so it looks like as if that black bar is actually the one moving according to the volume, but in reality I am actually moving a window exactly opposite to that, on full volume, it is completely off-screen on the right, and on no volume, it is moved all the way to the left, giving that cool illusion and saving a lot of time you would have to take otherwise to draw things!
So all this bit of code does:
is take the current volume level, add it onto that list of window positions and write the resulting window position right before the graphics processor starts drawing the first line of that volume bar, making it just in time!
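To illustrate the idea, here is a Python sketch of such a position table. The actual values in the program depend on where the bars sit on screen and are not given here, so this assumes a bar spanning the full 160 pixel screen width; it also uses the hardware convention that a window X register value of 7 means the left screen edge. All names are made up for this example.

```python
SCREEN_WIDTH = 160
WX_OFFSET = 7   # a window X value of 7 puts the window at the left edge

# One window position per possible volume (0 to 15).
# Volume 0: window covers the whole bar. Volume 15: window is off-screen.
WINDOW_X = [WX_OFFSET + (v * SCREEN_WIDTH) // 15 for v in range(16)]


def window_x_for(volume):
    """Look up the window position that leaves `volume` worth of bar visible."""
    return WINDOW_X[volume & 0x0F]


if __name__ == "__main__":
    print(WINDOW_X)
```

The lookup itself is a single table read per bar, which is why it is cheap enough to do right before the bar's first line gets drawn.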
The bit of code after the volume bar is rather boring, all it does is set the next line it wants to get a signal for and reset the window position to be off-screen, so you can read the name of the next instrument, otherwise it would just be covered by the window if there was no volume:
Also you may see a small wait function here, that is simply because the cpu in this case would otherwise set the window position before the graphics processor is done drawing the last volume bar line, which could result in it being glitchy.
Now, the last bit of code that is interesting here is the one that happens after all 4 bars have been drawn, because this is actually where I update the current button inputs, switch songs if needed and also actually get the volume levels.
It starts out just like the others, setting the next line signal we want and reset the window position, in this case not just for the next instrument name, but for when it starts drawing the next frame so the top screen text is visible:
Followed by this little bit:
Here I just call the game function to update the button inputs and then grab the current value for the ones pressed, if A, B, Left or Right are pressed, I jump into handling those inputs:
It may look long at first but really all it does is if B is pressed send that value of 128 I mentioned earlier to the play song function to stop it, if A is pressed just send the current song number to the play song function again to (re)start playback, if left is pressed subtract 1 from the song number and if it is smaller than 0, just set it to the highest song number to loop it around, and if right is pressed, add 1 to the song number and if it is bigger than the highest song number, set it back to 0 to loop it around, and then just call the play song function with that new song number.
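Written out in Python instead of assembly, that input handling boils down to something like the sketch below. The button names and the function name are placeholders; the stop value of 128 and the 50 tracks come from the text.

```python
STOP = 128          # special value that tells the play function to stop
NUM_TRACKS = 50


def handle_input(button, track):
    """Return (new_track, play_argument) for one pressed button.

    play_argument is what gets passed to the play song function,
    or None if the button is not handled."""
    if button == "B":
        return track, STOP
    if button == "A":
        return track, track                       # (re)start current song
    if button == "LEFT":
        track = track - 1 if track > 0 else NUM_TRACKS - 1
        return track, track
    if button == "RIGHT":
        track = track + 1 if track < NUM_TRACKS - 1 else 0
        return track, track
    return track, None
```

Wrapping around at both ends means the track number never leaves the range of the title and track tables.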
So this just leaves us with the last case, no buttons are pressed and we are done with a frame, it is finally time to call these officially unused registers, starting with ff76:
These are responsible for the first 2 instruments and I found out that just reading it once would often times lead to it reporting no volume, this has a simple reason with how audio works, to create a frequency you can hear, it has to turn on and off the volume output very fast at whatever frequency the tone you can hear has, so of course it can very easily happen for the processor to just read it right at the point where it is turned off right now. I noticed that this happens really, really often, so I in fact for every frame read the first 2 instruments a total of 256 times!
As much as that may sound, it is in fact still not enough for certain songs and you can get slightly flickering volume bars. To be honest I cannot do a whole lot about that, you see the gameboy processor is rather slow, so if I were to try and read it any more then there would not be enough time anymore to even draw the next frame, in fact, sometimes there already is not enough time and on certain frames all volume bars will appear to look full because of that, that luckily only happens when the song currently playing sets all 4 instruments at once anyways so they all get a burst of volume in that case so they would all actually appear as full, so this is not so much an issue as it just is a bit of a small technical detail
Now for ff77 representing the other 2 instruments, I read far less, only 64 times for every frame:
This is because, for one, of the very limited time I already have, and also because they are really not nearly as problematic as the first 2 instruments in the way they work, so even if you were to read them more often, they would not look much different.
After it is done reading those 2 registers 256 and 64 times respectively, it stores the biggest number it got for each instrument into the volume variable, which is then used for the next frame once it hits the start of the volume bars.
What a bit of code to go through and describe in detail! Now you know just how much 1kb of data can actually be and how much thought went into something that may seem so small. I had a lot of fun coming up with code that would even fit into that space, I had to go over it and optimize things several times to somehow stay within that small 1kb space. One of the big space eaters here is the track titles: even though I already cut them very short to 8 characters per title, with a total of 50 music tracks they still make up 400 bytes of the 1122 available. Combine that with the text for all the channel names and the text on top of the screen and it goes up to 486 bytes of just pure text!
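The polling idea can be sketched in Python like this. It assumes each register packs two 4-bit instrument levels into one byte (low nibble and high nibble), which is how community documentation describes ff76 and ff77; `read_register` stands in for the actual hardware read and is a placeholder.

```python
def peak_levels(read_register, reads):
    """Poll a register `reads` times and keep the highest 4-bit level
    seen for each of the two instruments packed into it."""
    low_peak = 0
    high_peak = 0
    for _ in range(reads):
        value = read_register()
        low_peak = max(low_peak, value & 0x0F)    # instrument in bits 0-3
        high_peak = max(high_peak, value >> 4)    # instrument in bits 4-7
    return low_peak, high_peak


if __name__ == "__main__":
    # Fake register that is "off" half of the time, like a pulse wave.
    fake = iter([0x00, 0x3A, 0x00, 0x71])
    print(peak_levels(lambda: next(fake), 4))
```

Keeping only the maximum is what hides the moments where the waveform happens to be in its "off" half when the register is read.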
If you actually made it this far then thanks for reading.
Dumping GameBoy Games the Insane Way - Part 3
Sep 24, 2018
Well, it is time to continue this little blog series and finally get to the point of them - actually dumping gameboy games! Make sure to read my previous 2 blog entries to catch up on what I did so far.
Now that I have a consistent and easy way to get to a point in pokemon yellow where I can install my own code into any memory address I want, it was finally time to see if my original idea was even going to work, that idea is to dump games using the audio headphone output of a gameboy color.
Yep, I am not joking about that, sending the game through the beeps and boops the gameboy is able to produce was my idea all along, there is a reason I had "insane" in my title
Getting bytes into audio form can be done in many ways, but my particular idea was to make use of the volume register of the gameboy to send distinct pulses that I can record, detect and then evaluate on a PC to retrieve the ROM back into its original form. While there are up to 8 volumes (3 bits) per channel, I decided to only make use of 4 volumes (2 bits), because we are still talking about recording a gameboy here so I figured it'd be difficult enough to distinctly detect those 4 volumes.
So I started making some very basic code by literally writing the instruction bytes in a hex editor, at this point in time I did not even bother writing and compiling assembly code and I just went straight to the bytes I wanted.
The code started out like this:
All that does is restart the audio to clear any previous game state, then enable the square audio channel with a very high frequency so I get fast spikes and lastly wait for me to press start to begin sending whatever was inserted into the cartridge slot.
The sending started by first sending those 4 volumes on both left and right channel as a form of calibration:
This of course will make it much easier for my receiver on PC to convert the pulses back into bytes. The very first test send of those volumes I recorded looks like this:
Now after that comes the actual ROM data. With 2 bits per channel and 2 channels, 1 pulse carries 4 bits, which is half a byte, so 2 pulses make up the 8 bits of one byte. The good thing about that actually is that half a byte is also known as a nibble, and the gameboy specifically has an instruction to swap both nibbles of a byte around, making this code very fast and efficient to extract 4 bits and send them:
Once I have those 4 bits, I have to still convert them to a volume value I can set the volume register to, so for that I made something called a lookup table, basically that is a list of values that get picked depending on whatever bits are set on the input:
This lookup table takes those 4 bits as input and gives me a byte where I already set the volume for both left and right channels depending on what 4 bits were set, I then write that value into the volume register, enable the square channel I used at the time, let the cpu just do nothing for a while and then disabled the square channel again. That process then just was repeated for the other half of the byte, the next byte then got read, repeat.
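A Python sketch of such a lookup table is shown below. It assumes the usual layout of the master volume register (left volume in bits 4-6, right volume in bits 0-2); which 4 of the 8 volume steps are used, which 2 bits go to which side and which nibble is sent first are all assumptions of this example, not taken from the dumper.

```python
# Hypothetical pick of 4 out of the 8 possible volume steps.
LEVELS = (1, 3, 5, 7)

# Index is the 4-bit nibble: upper 2 bits choose the left level,
# lower 2 bits choose the right level.
VOLUME_TABLE = [(LEVELS[n >> 2] << 4) | LEVELS[n & 3] for n in range(16)]


def pulses_for_byte(byte):
    """Return the two volume register values used to send one byte."""
    high = (byte >> 4) & 0x0F    # what a nibble swap plus mask extracts
    low = byte & 0x0F
    return VOLUME_TABLE[high], VOLUME_TABLE[low]


if __name__ == "__main__":
    print([hex(v) for v in pulses_for_byte(0xE4)])
```

With the table precomputed, sending a nibble costs one table read and one register write, with no bit juggling inside the timing-critical loop.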
Do that process for the whole cart and you got yourself a full game dump!
With this very first test version I never made any full dump though because it would have been very, very slow, in fact I calculated something around 249 bytes a second only, so it would've taken over an hour to actually send over the 1mb of pokemon yellow, so I immediately reduced the delay in my pulses down to something a little better, giving me a speed of about 0.9kb/s.
The very first dump of my pokemon yellow cart with that speed took about 19 minutes and its hash indeed matched the hash of a known valid ROM, so this was already pretty great! Now I did also focus on making this usable with very generic base recording settings of 44100hz 16 bit, so let's have a quick look at how that volume test pulse from the last image looks now:
So this leads me more into my current design from where I went away from the original square channel I used over to the wave channel, which purely on accident I discovered to be incredibly stable at outputting a smooth volume, which led me to speed this up one last time to about 2.2kb/s, meaning now 1mb actually only takes 7 minutes and 50 seconds, which I think is a pretty excellent time, with that even the biggest cartridge size, 8mb, only takes a little over an hour, just as much as my very first prototype send would've taken for 1mb!
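The transfer times follow directly from size divided by speed. A quick Python check, assuming 1kb means 1024 bytes and using the rounded speeds from the text (so the results only roughly match the measured times):

```python
MB = 1024 * 1024


def dump_seconds(rom_bytes, bytes_per_second):
    """Time needed to send a ROM at a given transfer speed."""
    return rom_bytes / bytes_per_second


if __name__ == "__main__":
    speeds = (
        ("first prototype", 249),
        ("faster square channel", 0.9 * 1024),
        ("wave channel", 2.2 * 1024),
    )
    for label, rate in speeds:
        minutes, seconds = divmod(round(dump_seconds(MB, rate)), 60)
        print(f"{label}: {minutes} min {seconds} s for 1mb")
```

At the wave channel speed an 8mb cart comes out at a little over 62 minutes, about the same as the first prototype needed for a single megabyte.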
Let's also have a quick look at the volume pulses now:
At this point I also re-did my whole set of emulator inputs for installation because I felt the original method to get to code execution was rather ugly and took too long, the previous video I showed took 22 minutes and 32 seconds to finally show a textbox with text I wrote on screen, so now knowing a bit more about pokemon yellow I cut out quite a few things that were not needed, this led me to make this new setup:
Which, spoiler, only takes 15 minutes and 25 seconds to show a textbox with text I wrote! So just by better planning I was able to save over 7 minutes, which I felt was pretty good, and that's where my setup is now.
With this new setup and everything I felt like it was finally time to also add some form of graphics to my dumper, so far it has been showing nothing on screen and only waited for you to press start and then started sending whatever it read back from where the cart is supposed to be, no matter if it was there or not. Because there was still all of the games font in VRAM, I finally wrote some routines that clear the screen and then display some of my own text using the leftover font, which right now looks like this:
If everything is good it will then ask you to press A to start sending the cart, at which point you can record it on a PC of course. If one of the bytes does not match, it will print an error message instead and ask you to press A to return to the insert cartridge screen again so you can make sure the cart is inserted/clean to be read. Once you do accept it and its sending it also displays that as a message of course:
Now for the receiver side, I really don't feel like going into too much detail on that because it was a pain to write it up into a good working state. Right now it takes a simple .wav file as input, which specifically has to be normalized in audacity before exporting because of my hardcoded values for what volume level counts as "silence" and what counts as "peak".
It basically takes a set amount of samples where it thinks should be silence, a peak and then silence again, compares the samples against some hardcoded values that I just decided should be big/small enough to count as silence or a peak, and from there it writes that read byte into a new .gbc file until the recorded .wav file is done being read. Also in the end it prints out crc32, md5 and sha1 hashes of the file it just wrote, all that output looks like this:
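To give a rough idea of the decoding step, here is a simplified Python sketch. It is not the actual receiver: instead of hardcoded thresholds it matches each measured peak to the nearest of 4 calibration levels, and the bit order (left channel as upper 2 bits, high nibble sent first) is an assumption of this example.

```python
def classify(peak, calibration):
    """Return the index (0-3) of the calibration level nearest to peak."""
    return min(range(len(calibration)),
               key=lambda i: abs(calibration[i] - peak))


def decode_byte(pulses, cal_left, cal_right):
    """Turn two (left_peak, right_peak) pulses back into one byte."""
    value = 0
    for left_peak, right_peak in pulses:
        nibble = (classify(left_peak, cal_left) << 2) \
                 | classify(right_peak, cal_right)
        value = (value << 4) | nibble
    return value


if __name__ == "__main__":
    cal = [4000, 12000, 20000, 28000]      # made-up calibration peaks
    print(hex(decode_byte([(27000, 19500), (11000, 4500)], cal, cal)))
```

Nearest-level matching tolerates recordings that are a bit louder or quieter overall, as long as the 4 levels stay clearly apart.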
Also of course if you want to read that long receiver code for yourself:
Oh I should mention, this entire project, sender, receiver and installer is on my github, so you can go explore everything of course:
So there you have it, my current state of this crazy project. As of right now, to use it you need a gameboy player with some form of modded gamecube with an sd gecko to use gameboy interface to install the dumper onto the german version of pokemon yellow. In the future I would of course like to also support at least the english version, it all just so happens to be for the german one because well, I am from germany so that's the cart I have laying around here. Also it most likely can be done for many other pokemon games as well, as not just generation 1 games but also generation 2 games have exploits to install code with.
The sender I would also like to extend further to include save dumping, that should also be very easily doable, it is not in there right now because my current installer can only install up to 636 bytes of code into the pokemon save and well, my current dumper code just so happens to also be 636 bytes, yep, I already had to fight to even get everything crammed in. That said I did look a bit more at the save today and saw that I can safely extend that space up to 1122 bytes, I just have to update my installer for that and then I should be able to get to that feature as well.
Anyways that is it for now, hope this was interesting to at least some, thanks for reading.
Dumping GameBoy Games the Insane Way - Part 2Sep 22, 2018
I suppose I should continue what I started now right on the point where I made some pretty big changes, make sure to read my last blog entry for the introduction of what I've started.
Last time, I left off on making a set of inputs to beat pokemon yellow in a very short time with some heavy glitches, but I did not go into any detail as to what I was actually doing, so let me do that now.
To quickly go back to the basics, the idea is to glitch out the game to gain access to a very glitched shop. This allows you to "buy" a glitch item which can be used to essentially break free of the item menu list size and just scroll down further into other parts of the game's memory. Last time, from there I just modified the exit location used when you leave the shop to not lead back into the town but instead lead into the hall of fame, which is the end of the game. That is all simple enough really, but I wanted to do much more.
At this point I was pretty close to running my own code, as this broken item menu list gave me a huge chunk of memory to modify, and one particular bit was the one I ended up attacking - the current map script.
This particular bit of memory points to some code that gets executed every other frame as part of whatever map you are currently in, to control all the events that happen in it. Now my idea here was to modify this to instead point to some bit of code that I can control freely - all the extra memory I can control from the item menu to be exact. To get some code into that memory though I needed to do a bit more setup, and the easiest thing I saw was to modify the rival name at the beginning of the game. You see, that name is stored right below the normal item list, so also part of what we can modify!
The thing with the item menu is that all you really do is modify 2 bytes at a time, the first byte is the item id, and the second the item quantity. All you can do with that is swap 2 byte stacks around, or subtract from the 2nd byte, if the item id allows it to be tossed. This is often not the case and, as we will learn later, led to me having to come up with some unplanned workarounds.
Because the easily controllable area - the rival name - is pretty short, not a whole lot of room for code, I decided to keep it very simple. My idea was to just read out whatever buttons are currently held on the "controller" (my inputs in an emulator), then write that byte into parts of the item menu I can control and open the start menu again so I can move that newly created bit out of the way to create another bit.
The rival name is also pretty limited on its own, you cannot create any byte below 0x7F, and also from 0x80 and up you cannot choose every single byte either, so I had to get creative. The bit of code I came up with is this:
That really is how vague my notes are, I know it does not really say anything so let me explain as to what I planned here.
The first byte, F0, is easily created by just taking a bit of empty memory, 00 00, and in the item menu "throwing" parts of it away, so this allows you to basically make anything from 00 01 up to 00 FF, so making 00 F0 was easy enough.
F5 is a valid character byte you can use in the rival name, it is "♀", so that's byte number 2.
For EA I then just took the next highest possible byte I can use for the rival name, that's "♂" which has a value of EF.
Because I now had the "item" F5 EF I can just throw some from it away in the item menu to get F5 EA.
Next up is storage, and as you can see I was not quite sure at first what I wanted to do for this, but I decided to go with a huge chunk of memory very low in the item menu that was empty, in this case address D3E6. The nice thing here is how addresses are stored, you see, D3 is not a character you can use for the rival name, but addresses are stored with the lower byte first and the higher second, so it's stored as E6 D3. E6 just so happens to be a valid character for the rival name, "?", so I just used 2 of them to get E6 E6 and threw some of the second byte away to get E6 D3.
C3 is again not a valid character, so to set it up I first used a simple " " in the rival name, which is 7F. As a processor instruction that byte basically translates into "do nothing", so it made for good filler. My next character then was again, you guessed it, "?" for E6, so I could throw some of that away to get 7F C3.
Now 94 is also a valid character, it's just a "U", so I used that. It was followed by an "end" byte, 50, so my plan was to then throw some of that away to get 94 02. Spoiler, this ended up not being possible because 94 just so happens to be one of those items I mentioned you cannot throw away.
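To make the byte order concrete, here is a tiny Python sketch (my own illustration, not part of the original notes) of how a 16-bit address ends up in memory with the low byte first. The helper name `to_little_endian` is made up for this example.

```python
def to_little_endian(address: int) -> bytes:
    """Return a 16-bit address the way it sits in memory: low byte first."""
    return address.to_bytes(2, "little")


# The storage address from the text, D3E6, is stored as E6 D3.
storage = to_little_endian(0xD3E6)
print(storage.hex(" ").upper())
```

The same rule is why the jump target shows up as 94 02 in the last two bytes of the routine: that is simply address 0294 with its bytes swapped.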
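The "throwing away" steps above are just byte subtraction on the quantity byte. As a rough sketch, assuming the quantity is a single byte that wraps around modulo 256 (which is what making 00 F0 out of 00 00 suggests), the number of items to toss for each step can be worked out like this. `toss_count` is a hypothetical helper, not something from the game.

```python
def toss_count(current: int, target: int) -> int:
    """Units to toss so the quantity byte goes from `current` to `target`.

    Assumption: the quantity is one byte and wraps around modulo 256.
    """
    return (current - target) % 256


# (item id, starting quantity, wanted quantity) for each pair described above.
plan = [
    (0x00, 0x00, 0xF0),  # empty memory 00 00 -> 00 F0
    (0xF5, 0xEF, 0xEA),  # F5 EF -> F5 EA
    (0xE6, 0xE6, 0xD3),  # E6 E6 -> E6 D3
    (0x7F, 0xE6, 0xC3),  # 7F E6 -> 7F C3
    (0x94, 0x50, 0x02),  # 94 50 -> 94 02 (the one that turned out untossable)
]

for item_id, start, wanted in plan:
    print(f"{item_id:02X} {start:02X} -> {item_id:02X} {wanted:02X}: "
          f"toss {toss_count(start, wanted)}")
```

This only models the arithmetic, it says nothing about which item ids the game actually lets you toss.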
Anyways, with this name I set out to start glitching:
From this point, the game just goes on up to the point where I can finally glitch the item menu, I throw everything away and then realize, oh no, I cannot throw away the "U". This led to the questionable solution of literally selling RAM to the shop keeper. You see, one of the "items" I found deep in the item menu could be sold for 9350, and the money count is within reach of the item menu to modify, so I just put a bit of free memory, 00 00, into that spot and threw some of it away to get 00 52, and then sold part of game RAM so it adds up to 94 02, the instruction I originally planned to have.
So now I have some code ready, but how do I execute it? Well, in memory there were some bytes, I think it was F6 FF or something like that, and I ended up throwing enough away to get F6 D3, or in other words address D3F6, part of memory I could modify. I moved my now built bit of code into that memory location, and then lastly replaced the current map script with that new pointer to D3F6.
So great, now I can unpause and whatever buttons were held get written to D3E6, why is that so helpful? Well I can now just always move an empty bit of memory, 00 00, into place, throw away whatever I need to get anything from 00 01 to 00 FF, and when I unpause it writes the held buttons over the first 00, essentially allowing me to make any 2 bytes I want! You see, there are 8 total inputs, making up 8 bits, or in other words, 1 byte.
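The numbers here work out if the money bytes are read as packed decimal digits (BCD), so 00 52 means 52 and adding 9350 gives 9402, which is 94 02 as bytes. Here is a small Python sketch of that arithmetic. The BCD storage is my assumption based on those numbers, and I only look at the two bytes mentioned in the text.

```python
def bcd_to_int(data: bytes) -> int:
    """Read packed BCD bytes (two decimal digits per byte) as a number."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


def int_to_bcd(value: int, length: int) -> bytes:
    """Write a number back as `length` packed BCD bytes."""
    out = bytearray(length)
    for i in range(length - 1, -1, -1):
        value, pair = divmod(value, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    return bytes(out)


before = bytes([0x00, 0x52])                      # the prepared "money" bytes
after = int_to_bcd(bcd_to_int(before) + 9350, 2)  # sell the 9350 "item"
print(after.hex(" ").upper())
```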
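To illustrate how 8 buttons make up one byte, here is a short Python sketch that converts between a byte and the set of buttons to hold. The bit assigned to each button is an assumption for illustration only, the text does not say which button maps to which bit.

```python
# Assumed bit layout, one bit per button (hypothetical, for illustration).
BUTTON_BITS = {
    "A": 0x01, "B": 0x02, "SELECT": 0x04, "START": 0x08,
    "RIGHT": 0x10, "LEFT": 0x20, "UP": 0x40, "DOWN": 0x80,
}


def buttons_for_byte(value: int) -> list:
    """Which buttons have to be held so the joypad byte equals `value`."""
    return [name for name, bit in BUTTON_BITS.items() if value & bit]


def byte_for_buttons(names) -> int:
    """The byte that results from holding the given buttons."""
    value = 0
    for name in names:
        value |= BUTTON_BITS[name]
    return value


print(buttons_for_byte(0xEA))
```

Since every one of the 256 combinations is a distinct byte, any value can be produced this way as long as the input script is allowed to hold arbitrary button combinations.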
With this power I now was able to very slowly get more code into the game, and I made up this:
Also after that I end up with some code opening the start menu again, but not in quite the same way as the first code did it, because I noticed that way actually crashes the game after a certain amount of pauses because of a memory corruption, so I have to add 6 to the stack before jumping to the start menu function, resulting in the last few bytes after EA D1 D3 being E8 06 C3 FD D3, basically add 6 to stack and jump to D3FD which is just the jump to the start menu from the first bit of code again that was still left over.
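For anyone who wants to read those bytes as instructions, here is a minimal Python sketch of a disassembler that only knows the five Game Boy CPU opcodes that show up in this post. The first byte sequence is reconstructed from the byte by byte description above, the second is the tail quoted in the previous paragraph.

```python
# opcode -> (text template, number of operand bytes)
OPCODES = {
    0x7F: ("LD A,A", 0),             # copies A to itself, effectively a no-op
    0xC3: ("JP ${:04X}", 2),         # jump to 16-bit address
    0xE8: ("ADD SP,{}", 1),          # add to stack pointer (small positive value here)
    0xEA: ("LD (${:04X}),A", 2),     # store A at 16-bit address
    0xF0: ("LDH A,($FF{:02X})", 1),  # load A from high memory FF00+n
}


def disassemble(code: bytes) -> list:
    """Decode a byte string using only the opcodes in the table above."""
    lines = []
    i = 0
    while i < len(code):
        template, size = OPCODES[code[i]]
        # Operands are stored low byte first.
        operand = int.from_bytes(code[i + 1:i + 1 + size], "little")
        lines.append(template.format(operand))
        i += 1 + size
    return lines


first_routine = bytes.fromhex("F0 F5 EA E6 D3 7F C3 94 02")
new_tail = bytes.fromhex("E8 06 C3 FD D3")

for line in disassemble(first_routine) + disassemble(new_tail):
    print(line)
```

Any other opcode will raise a KeyError, this is only meant for following along with the bytes mentioned here.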
So now, after all that work that takes minutes and minutes to execute because the item menu moves so slowly, I can write whatever values I want into whatever bit of memory I want. All that was left to do was generate a jump to this newly saved code at D3D2, by generating a D2 D3 "item" with the first bit of code and moving it over the map script again. Before moving it over though, I also generated another item that THEN can jump to the big code I just wrote, in this case 20 D6, which jumps to D620. That means I have D620 up to D6AF to write whatever I want into, which then gets executed.
With all that work, what did I end up writing into there? Well, I really did not have any full plan at this point, so all I did was open up a text box with some text I wrote and playing some background music track, just to have a visual confirmation that hey, all this crazy setup actually worked. All that ended up becoming this youtube video:
...and that's all for part 2! Yes, I know, I STILL have to get to actually dumping gameboy games, but that will be done in the next part, because I am still not fully done with everything for a proper release. Thanks for reading.
Dumping GameBoy Games the Insane Way - Part 1Sep 18, 2018
For quite a while now I wanted to look into a method of dumping gameboy carts without needing any special device for it, just a plain old gameboy. I pretty much had that idea since I wrote my gameboy advance link cable dumper:
That project takes the gamecube link cable and a gameboy advance/gameboy advance sp and allows you to dump carts directly to a gamecube or wii sd card.
The issue with that is, it does not work with gameboy carts. Why? Because those use a different processor and operating voltage. When you plug a gameboy cart into a gameboy advance it actually presses down on a physical switch inside the slot on the left, which switches the CPU core from the ARM based gba one to the gameboy CPU and also changes the operating voltage from 3.3 volts to 5 volts. In that gameboy mode, you cannot use the gamecube link cable anymore.
So, what can you do now? Well my idea was to use a game exploit to execute my own code on a gameboy and then go from there. I don't really know how many exploits there are exactly but I did know that pokemon was very, very glitchy and I just so happen to have a german copy of pokemon yellow.
Now, any of those glitches that allow code execution require quite a bit of setup and are pretty time consuming, but this particular video gave me hope that maybe, just maybe this idea may work.
Back when that video came out though I really did not have any idea on how to perform any of those crazy glitches myself so I just left it at that for now, an idea.
This year, gameboy interface, a homebrew software for the gamecube gameboy player, got an update that allows playback of any set of inputs you give it via a text file, and there was even a TAS of pokemon yellow that was made in an emulator and then verified on console using this method:
and the person who made it, TiKevin83, even included his script to convert inputs from emulator over to gameboy interface! This led me to grab the latest version of BizHawk, the emulator that script was made for, and started making some test inputs in its integrated tool TAStudio, which basically lets you set every input pressed per frame.
So I just made a quick test up to fighting your rival in the beginning and then exported it and tested it on my actual cart:
If you want to see it you have to put this below into your browser and remove the space from it, gbatemp seems to try and auto convert this to a broken stream page every time:
And that most certainly worked just fine!
So now that I saw that yes, this may just be possible now, I had to get to a point where I can execute some code from, so I looked at the current pokemon speedruns as those use lots of these glitches in a very short time so it should be ideal for my plan. That led me to this particular route:
Now I started implementing this route slowly in an emulator, and some points I did not really understand or they were a bit different in the current speedruns and those differences were not mentioned in that route, so I improvised a LOT, walking in strange ways to manipulate the memory just right in one particular house for example, moving the sprites in the required places.
Also in the end it said something about dropping a specific amount of a glitch item and for some reason I had to drop less, I assume that has something to do with me doing this on a german cart but honestly I don't quite know considering how messy this was. All that said though, it DID end up working and I was able to scroll past the normal item menu into game memory, and to demonstrate that I can manipulate memory I should not be able to manipulate, I warped to the end of the game! Of course this was all still done in an emulator so it was time to again let that script convert it to inputs for gameboy interface and give it a shot on console:
Again if you want to see that put it into your browser and remove the space:
Again everything worked out perfectly fine!
Now that I had a consistent method of getting to a point in the game where I can manipulate a small portion of RAM I had to think of methods to make use of all that space of course to actually get some code into the game.
And at this point I will cut this part off for now. The next part is basically still a work in progress on what I did from this point on, so consider this more of an introduction I guess.
NES Patching - Batman RotJ Sound TestSep 10, 2018
Well, after my last blog entry about the sound test in gimmick, I remembered that in the sound test of batman return of the joker I did notice one bit of music clearly missing, and while I wanted to look into why, I just didn't for some reason, until now. The sound test is pretty easily accessible by just pressing a+start on the title screen, leading you to this screen:
Also I had a look on tcrf again and saw this little bit:
"Track 6 in the NSF file is not present in the sound test. The fact it loops suggests it might have been intended for a stage."
So I was right in thinking that something was missing here.
I tried looking in a hex editor to see if I could find a string like "SOUND TEST" in it, but did not find anything, so yet again it was time to fire up the emulator and check out what's going on!
This was about as simple as it gets, right in the beginning of RAM it was very obvious what byte is used to represent the currently selected bit of music:
All that does is increase that position, and if it finds out that the position in address 0x11 is larger or equal to 0xD (N), it will loop back down to 0 (A). That bit of code above also includes the same type of checks for the sound effects position in address 0x12 not going higher than Z, as well as for going below A: with the BGM it will go back up to M, and with the sound effects it will go back up to Z. Next up I had to figure out how high those values really should go, because I had a feeling there are more sound effects than up to the letter "Z" and maybe even more BGM too. That was just a quick check of what reads from the music position at 0x11 when I press "A" to start playing that bit of music.
Ah yes, there is of course a list of actual music values involved at address 9C85, so it takes the position from 0x11, then loads whatever the corresponding music value is from that list, and stores that value into 0x41. Oh and for the sound effects that bit of code was again right above this, where it gets the position from 0x12, loads the sound effect value from a list at address 9C92 and stores that value into 0x42. Now it was time to take a look at those lists to see exactly what's going on.
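Written out as a small Python sketch, the selection logic described above behaves roughly like this. This is my own model of the behaviour, not a translation of the actual 6502 code, and the names are made up.

```python
BGM_COUNT = 0x0D  # positions 0..0xC, shown as A..M
SFX_COUNT = 26    # positions 0..25, shown as A..Z


def step(position: int, delta: int, count: int) -> int:
    """Move the selection by +1 or -1 and wrap around at both ends."""
    position += delta
    if position >= count:
        return 0          # past the last entry: back to A
    if position < 0:
        return count - 1  # below A: jump to the last entry
    return position


def letter(position: int) -> str:
    """Letter shown on screen for a given position."""
    return chr(ord("A") + position)


print(letter(step(0x0C, +1, BGM_COUNT)))  # going up from M
print(letter(step(0, -1, BGM_COUNT)))     # going down from A
```

Making more entries selectable then comes down to raising those two count values.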
Here is that list for the BGM:
If you know your hexadecimal system you may notice that 0x06 and 0x0E are missing from that list. I had a quick check if those are indeed music tracks by just manually writing 0x06 and 0x0E into address 0x41, as the game would do to start playing music, and indeed, 2 music tracks you don't hear in the sound test started to play. So this would be the first set of patches: I replaced the comparison of larger or equal to 0xD (N) with larger or equal to 0xF (P), and made it go back up to O instead of M when going below A. Now the nice thing about how music works here is that the values are not just a random list but simply go up by 1 every time, so once nothing is skipped the only difference between the selected position and the actual music value is that the music value is 1 higher than the position (position 0=music value 1 etc). So I replaced that load from the list with code that just increases the position from 0x11 by 1 and moves it over to register A, where it normally loads the music value, plus as a technicality a "nop" instruction. That one does literally nothing, I just had to add it because the original instruction to load the value from the list took 3 bytes but the increase of position and the move to register A only take 2 bytes, so that "nop" just acts as a "filler" byte.
Great, now all the music is accessible! Of course I could not stop there, I had to take a look at the sound effect list as well, it's right behind the music list, I marked it here:
I noticed that right at the end there is that jump from 0x38 to 0x3B, and if you know your hexadecimal numbers that means at least 0x39 and 0x3A are missing, so I wrote those into address 0x42 and indeed, those are valid sound effects. I then just to see went up higher, so 0x3C, 0x3D... and all the way up to 0x48 there were valid sound effects, meaning that in total even though this is a sound test, a whole 15 sound effects were not selectable from it! Again since the sound effects were all in order and not all over the place, that list again was not needed, I replaced the sound effect load from a list to simply add 0x20 to the sound effect position from 0x12 (so position 0x00=sound effect value 0x20 etc) .
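With both lists replaced by simple arithmetic, the mapping and the resulting counts can be checked with a few lines of Python. This is just a sketch of the numbers given above, the function names are made up.

```python
def music_value(position: int) -> int:
    """Patched BGM mapping: position 0 -> value 1, position 1 -> value 2, ..."""
    return position + 1


def sfx_value(position: int) -> int:
    """Patched sound effect mapping: position 0 -> value 0x20, and so on."""
    return position + 0x20


ORIGINAL_SFX_ENTRIES = 26  # A..Z in the unpatched sound test
LAST_SFX_VALUE = 0x48      # highest valid sound effect found by testing

total_sfx = LAST_SFX_VALUE - sfx_value(0) + 1
print("selectable sound effects after patch:", total_sfx)
print("previously unreachable:", total_sfx - ORIGINAL_SFX_ENTRIES)
print("BGM values:", [hex(music_value(p)) for p in range(0x0F)])
```

So the patched sound effect selection needs 41 entries, 15 more than the alphabet can label.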
So to make all those sound effects selectable it was just a simple job of patching the effect comparisons to not stop at Z but go much higher. Now you may notice an issue with that, after Z there are not exactly any more letters to go up to, so while yes, I now could listen to all sound effects, the selection doesn't look all that great...
So address 0x11 and 0x12 don't just get read for increasing/decreasing and choosing a piece of BGM/sound effect to play, but also from a 3rd bit of code:
With all that, we now replaced the A-Z system with a system that can go from 01-41, so now of course it's time to give this a shot on real hardware to make sure what I just wrote up all works!
And it is now working as it should, very good!
As always, I of course pushed that patch to my github.
If you are for some reason interested in recordings of the PAL version from my hardware using this patched file, here you go:
I know, these patches are all pretty useless but still, I feel like just sharing the process this goes through so I have a reference for later and to maybe help others understand how things work in a bit more detail. Thanks for reading.
NES Patching - Gimmick! Sound TestSep 8, 2018
Recently I wondered if the nes game gimmick! has a sound test to listen to its excellent music so I did a quick search and yes, it actually does as is described on tcrf:
Just hold down select and start on the title, simple enough.
"It contains every track in the game, with the exception of Evidence of My Life (later named in a separate OST) and an alternate version of Cadbury with slightly different instruments."
In this blog entry, I will write down what I had to do to make a patch like this.
While those 2 tracks aren't exactly long or special, it still seems weird to me that they would not be accessible in the game's own sound test, so it was time to fire up a hex editor and see where that text is located, maybe it was just deactivated or something. Well, after a quick search we find this:
Now that we know how it copies text over, we have to figure out how it decides which song to play once we press A, this is simple enough, I looked at the RAM when pressing A and saw that address F5 changes to 80, so scrolling down a bit further down that code that set up the text I found this little bit:
Maybe you already noticed it, but that very first picture contains fewer song names than are visible in the hex editor. That's because the list can be scrolled down further when you press down, and that part is handled by the code right below the one for starting and stopping the music:
With all that information, we now know enough to think about editing this sound test with 2 extra songs, we will have to add 2 bytes with the missing song number to that list at 9DD9, then we need to add 2 bytes to the scroll list at 9DEB and of course add the new text and addresses to the big text list at 9DFD.
There is of course one problem, all those lists are right next to each other already, with no room in between, so we can't really add anything into that as it is right now... That means we have to move things around a little to make some more space for all that extra data! Thankfully, right at the end of the memory area used for this sound test, there is actually a little bit of leftover space:
...which actually is perfect. Remember how earlier I said there was some wasted code with copying the text in 2 sections by having 2 end bytes (FF FF) in there? Well, why don't we just change those 2 end bytes (FF FF) into a spacer instead (FF), edit that first copy address to now be where the song number list was previously, 9DD9, and then jump over the code that does that second copy:
"The round cursor color depends on bits 1 and 2 (a total of four different colors) of some music data, which don't frequently change and remain at zero most of the time with the exception of two music tracks: Strange Memories of Death and Paradigm. This color change only occurs in the Japanese version."
With that description, it was very simple to find the exact code bit this was talking about by just looking at the RAM and the code:
After all those edits I of course made a patch for it and put it up on my github, as always:
Now there was just one last thing left for me, and that is give this a shot on real hardware and see if it all loads up as I would expect and work as intended:
Success! Everything shows up exactly as it should, plays back audio just fine and the cursor now is actually active for once changing color frequently.
If you are for some reason interested in recordings of the PAL version from my hardware using this patched file, here you go:
That's it for now, as always, interesting how something so simple can get so complicated. Thanks for reading.
NES Patching - Mega Man 1 and 2 PAL MusicSep 3, 2018
Today I felt like writing a bit about some of the other random stuff I do. If you did not know, a lot of older console titles were very lazily released on PAL consoles, which led to them running slower in both audio and video. I don't really mind the slower gameplay itself, but the music part of it is really annoying to me if it was not adjusted, as is the case with megaman 1 and 2. Because I live in Europe and do own a PAL NES, I wanted to see if I could somehow patch my way around at least the music.
This is a full breakdown of what went into making these PAL megaman music patches on my github:
To demonstrate how it is unpatched, have a listen to flash mans stage in megaman 2 as it is supposed to sound from ntsc:
Got that beat still going in your head? Well, let me quickly destroy that with the PAL version of it:
I think now you understand the issue a bit better.
So, I had a quick google around to see if anyone had done some work on it before and while I did not find any existing patches, I did find an incredibly useful disassembly of both megaman 1 and the audio driver of megaman 2:
Having a read through this, I saw that each song has a speed value attached to it:
This value gets used to multiply each note's length to calculate how long it takes between notes, so I thought hey, why don't I change that 6 to a 5, maybe that's enough of a speedup to get closer to ntsc speed, and well, have a listen for yourself:
WOW, for such a simple patch that did exactly what I wanted it to! So I went ahead and applied this method to every song in the game and thought at the time it all sounded correct (spoiler: not every song was correct). Before calling that patch done I did also edit the intro and credits length because they did not sync up with the audio anymore.
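Why going from 6 to 5 works so well can be seen with a bit of arithmetic. This Python sketch assumes the music driver advances once per video frame and uses the nominal frame rates of 60 per second for ntsc and 50 per second for PAL (the real rates are slightly off from those round numbers).

```python
from fractions import Fraction

NTSC_FPS = 60  # nominal frames per second, assumed
PAL_FPS = 50   # nominal frames per second, assumed


def seconds_per_length_unit(speed: int, fps: int) -> Fraction:
    """Time one unit of note length lasts for a given speed value."""
    return Fraction(speed, fps)


ntsc = seconds_per_length_unit(6, NTSC_FPS)
pal_unpatched = seconds_per_length_unit(6, PAL_FPS)
pal_patched = seconds_per_length_unit(5, PAL_FPS)

print("ntsc, speed 6:", float(ntsc))
print("PAL, speed 6: ", float(pal_unpatched))
print("PAL, speed 5: ", float(pal_patched))
```

Under those assumptions a speed of 5 on PAL lasts exactly as long as a speed of 6 on ntsc, while the unpatched PAL version runs each note 20% longer.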
This process basically was just me watching the game RAM to see where a value counted down until another event happened, setting a breakpoint on that RAM value to then reduce the initial value that was written into it by the game so it would take less time to count down, thus speeding up whatever was happening. I did that to many small parts and in case you dont really understand what I just wrote, let me give one direct example of this with a bit of slowed down video, watch how the screen transitions:
You see on the bottom right how those screen transitions happen when both the value I marked and the value next to it hit 0? Those values then get reloaded with a new number, $49 in the spot I marked and $01 in the spot next to it by this bit of code:
That starts counting down again and those numbers are exactly what I edited to take less time to count down, let me give you a very drastic example of very small numbers to see it very clear:
In this one I just forced the value next to the one I marked (framecnt high) to load with 0 to skip a lot of counting. Of course in my patch I strategically chose values that let the intro and credits be perfectly synced with the music without anything being skipped on screen, this was just a quick video demonstration of the technique I used.
Right, so then I went ahead and released that as v1 of my pal music patch, thinking it was all fine...
and then recently I finally looked at megaman 1 because that uses pretty much the same exact music driver. So in that game as well, there are the exact speed values too that you can edit:
So I again subtracted 1 from that, it sounded great, I went on to do the same for the rest but I noticed that when you select a boss, things start... messing up. Let me demonstrate what you would expect to hear:
...and what I actually heard when I modified the speed value of the boss selected music:
You heard how that ending note just got stuck? Well I did again mark the position in RAM that messed up here in that video. That one took me a bit to figure out, basically what happened with me reducing the speed value is that the note length suddenly underflowed, meaning it became a very, very large number (FF as you can see) instead of going to 0 as it is supposed to for the next note to load. You see, the speed value gets multiplied with the current note length to determine how long it should wait to load the next note. In this particular case the note had a length of 2 and my patched speed value was 5, meaning the calculated wait time was 10. The issue is, every time the music driver processes a note, it subtracts 4 from that wait time, and 10 is not exactly divisible by 4, meaning that it subtracted too much, resulting in the wait time suddenly becoming a very large number instead of 0:
Of course I went back to megaman 2 to see if it happens there too and yes, it happens on the boss select as well as quick man's stage. There it does not make a note get stuck like that but just silences the channel instead, because it actually underflows on a pause command rather than a play note command, and that's how I did not even realize something went wrong there.
After thinking a lot about how to tackle this issue in the shortest way possible, I came up with a machine code patch to replace the current logic of subtracting 4 to instead subtract 2, see if its 0, if it is load the next note and subtract 2 from that, and if its not 0, subtract 2 from the current note again. That way I never miss a length subtraction and everything should stay in sync properly.
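To see the difference between the two approaches, here is a simplified Python simulation of the countdown. It is only a model of the logic described above, not the actual machine code: it assumes an 8-bit counter that wraps around, even wait times, and in the patched version it also loads the next note if the second subtraction of a tick lands on 0.

```python
def original_ticks(wait: int, max_ticks: int = 64):
    """Original logic: subtract 4 per tick, next note loads only on exactly 0.

    Returns the tick on which 0 is reached, or None if it never is.
    """
    for tick in range(1, max_ticks + 1):
        wait = (wait - 4) & 0xFF  # 8-bit counter, wraps below 0
        if wait == 0:
            return tick
    return None


def patched_note_ends(waits, max_ticks: int = 1000) -> list:
    """Patched logic: two subtractions of 2 per tick, checking for 0 each time.

    Returns the tick on which each note in `waits` finished.
    """
    queue = list(waits)
    counter = queue.pop(0)
    ends = []
    for tick in range(1, max_ticks + 1):
        for _ in range(2):
            counter = (counter - 2) & 0xFF
            if counter == 0:
                ends.append(tick)
                if not queue:
                    return ends
                counter = queue.pop(0)  # leftover subtraction hits the new note
    return ends


print(original_ticks(2 * 6))           # speed 6, length 2
print(original_ticks(2 * 5))           # speed 5, length 2
print(patched_note_ends([2 * 5] * 2))  # two notes with the patched logic
```

With a wait time of 12 the original logic reaches 0 after 3 ticks, with 10 it skips past 0 and keeps missing it, and the patched version carries the leftover 2 over into the next note so the overall timing stays intact.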
Now to make an edit like this, I would need to add extra code to the existing one by injecting it into some free space, so I looked at both megaman 1 and 2 to determine the best spot, and at the highest possible spot for code to be added there luckily was some free space in both games:
It may not be a whole lot, but it was worth a try, so now for the entire code bit I wanted to edit after thinking about it:
So I decided to replace the check for it being 0, subtracting and checking if its 0 again with this bit of code:
What a first patch! Now that is just patch number one of course, this is the "subfunc" function:
Yep, I used every last byte that was available to me, that was indeed a close one!
Now with this patch in place, have a listen to the boss selected music now:
Now that's more like it! With this patch in both megaman 1 and 2, it was pretty much done, so I now finally finished up editing all the songs in megaman 1 and adjusted its ending to be in sync with the music using the exact same technique as I did before.
The only thing that I now noticed was off was icemans stage in megaman 1, the volume of some notes was cutting off way too early, have a listen to this:
Those notes should just stay held but they fall off pretty much immediately, this one took me a bit to figure out but basically this particular music command:
Sets the length of the volume curve ($86) and the strength ($20) of which it should fall down. Now only in this particular song it seems to fall off in such a way that it gets reset right when the length hits 0, you see when it hits that 0 length in the song itself it actually immediately reloads the next note, meaning the volume never gets decreased because it gets reset at the same time.
But now that we have a different play speed, this does not occur because the volume only gets reset if that length is 0, and the note reloads at a different point now, meaning it does not reload the volume and it never recovers the loss in volume. Since this is the only song to have this issue from what I've heard, I decided to just modify the length of the volume curve in this song alone in such a way that it plays back as close to the intended audio as possible. And while I did not get it 100% perfect for every note, it is good enough to me, have a listen to that part again that before cut off after modifying the volume curve length:
Now it does not cut off early anymore! So while probably not the prettiest solution it did work out well enough for something that was never intended to be played at PAL speed I think.
So there we have it, now you know how much work patching just a few bytes can be. If you want to give these patches for PAL megaman 1 and 2 a try, again they are up on my github of course, in case you did not see the link in the beginning of this blog entry.
Even if you have no interest in PAL things I hope this was still an interesting read to you about the process of patching a couple of bytes in something as simple as NES titles, thanks for reading.
GameCube Save Exploits - Ghost Recon 2Aug 30, 2018
Yep, I felt like writing about yet another save exploit, this time about the latest one I released:
Why you wonder? Well, it took me quite a while to find a viable entry and I can actually write about every step properly and not just in vague terms as I sometimes had to before.
In case you don't know, these gamecube save exploits load a homebrew executable of your choice from memory card so you can have homebrew without any modchips or something like that. All you need is a game which has an exploit and a way to install a hacked save, like a hacked wii, and that is it.
First off, why did I decide to look at ghost recon 2? A little over two years ago, I looked at the original splinter cell and found a good exploit in that, and for some reason I decided not to look at the other splinter cell titles afterwards. Then a couple days ago I finally did, leading to me finding a very simple entry in pandora tomorrow, that one is so simple in fact that it's not even worth writing about.
I then looked at the other splinter cell titles and while I did get them to crash, I did not immediately see any useful point from where I can do something, so I moved on to other ubisoft titles, and in ghost recon 1 I was able to make a proof of concept exploit, but it was really unstable because that exploit involves corrupting the allocated game memory to then hopefully overwrite parts of code, but that game memory moves around constantly, so I was not sure if I can make it consistent, leading me to move on.
Now I then fired up ghost recon 2, made a save, looked at it in a hex editor and it seemed very simple. Usually the first thing I have to do when I want to edit a save is reverse engineer the game's save hash function so I can fix up the hash afterwards, but here I did not see any obvious hash. So I made a second save where I slightly changed the name of the save file, looked at it again and there was nothing, no verification hash!
So in dolphin's debug memory tab I then went right to that point and indeed, there is our invalid value:
So, setting a memory breakpoint at 813da424, it led me to this particular bit of code:
So well, it led me to the function that gets called right when it loads the save the first time, and the copied name is actually a fixed length:
So now knowing how the formula works I first set a breakpoint in dolphin right at 802B5E9C to see what r3 actually gets loaded with before our controlled value gets added onto it, leading to this:
This indeed shows that r4 is a value we chose and r3 is some address in gamecube ram, great! Now, to see where I wanted to redirect that to I dumped the entire gamecube ram in dolphin and in a hex editor looked for a good spot that we have control over, I tried many different things but in the end chose to go with the save name left over on stack by some previous function:
Now I changed that name to an address to point to a string in memory we choose to be big enough to change the code position that function returns to, the value I found best for that is right at 80DE2768 because it has a large chunk of the save stored there that we can just freely edit:
So I just change the save "name" to that memory position:
And fill it with a string large enough to overflow up to the point where we can edit where the code jumps to:
You may notice that that string is packed with much more than just some code address to jump to. That's because I put the actual loader code much further down into the save, which ends up at a very high memory position that I don't trust to be consistent, so I instead went for a custom bit of machine code that basically searches for that loader code and then jumps to it.
To make all this happen, a few conditions have to be met. First, I need some bit of code that actually jumps to what is essentially code on stack, since we overflowed the stack that's where our code is right now. For that I chose 802BFE24:
Now here is that bit of relevant code:
I know that location because I set a breakpoint right at that location in dolphin and just looked at the stack memory. Now what does that ICBlockInvalidate do? Well, you see, because strcpy just copied our code over as a string, it is in the processor data cache. Code however is in the processor instruction cache, so that function basically writes the data cache to main ram and prepares the instruction cache to read from it, that's why it was so important to choose this exact function to jump to. Now this bit only makes it possible to execute 0x28 bytes of code which is not quite enough for a search, so we will extend that in the code we actually place there. That stw instruction also overwrites part of our code with garbage, so we will have to work around that too.
Let's quickly talk about the massive limitations this code has to have to work at all.
You see, strcpy sees a "string" to end when it finds a byte with a value of 0x00, so in order for our overflow string to copy code over to the stack, that code has to have absolutely no 0x00 byte or it would get cut off short. This turns out to be not a very easy task, I mean just look at the bytes of the code above I just showed you:
Even if you can't read machine code like this you can probably see how weird it looks just compared to everything else you've seen so far. This is because I had to look at each instruction carefully and make it so it contains no 0x00 bytes when compiled, which wasn't easy but hey, I did it in the end:
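To make that constraint a bit more concrete, here is a minimal Python sketch of the kind of check you could run over a handful of encoded instructions. It is not part of the actual exploit: the helper name `first_zero_byte` is made up for this example, and the two instructions are just illustrations of an encoding that breaks the string and one that does not.

```python
import struct

def first_zero_byte(words):
    """Return the byte offset of the first 0x00 in a list of big-endian
    32-bit instruction words, or None if strcpy would copy all of them."""
    blob = b"".join(struct.pack(">I", w) for w in words)
    idx = blob.find(b"\x00")
    return None if idx < 0 else idx

# "li r3, 0" encodes as 0x38600000, so strcpy would stop after two bytes.
print(first_zero_byte([0x38600000]))

# "xor r3, r3, r3" also clears r3 but encodes as 0x7C631A78, no zero bytes.
print(first_zero_byte([0x7C631A78]))
```

A check like this only tells you where the string would get cut off, finding a replacement instruction that does the same job without a 0x00 byte is still manual work.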
From there, we jump over that corrupted instruction I talked about earlier:
And then this entire rest of the code:
Basically searches for a "magic word instruction" that I chose, "mfmsr r4", which in bytes is 7C8000A6. As you can see in that code above, it checks every 4 bytes if it finds that exact set of bytes, and when it does, it finally jumps to it. If you wonder where it jumps to, it is of course the common loader which I wrote a previous blog entry about. That common loader then finally loads a homebrew executable from memory card and executes it, which I do of course have a short clip of:
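The search itself is simple enough to sketch in a few lines of Python. This is only a model of the idea, assuming memory is a plain byte buffer and the scan starts on a 4-byte boundary; the real thing is a few hand-picked PowerPC instructions, and `find_magic` is a name I made up for this sketch.

```python
import struct

MAGIC = 0x7C8000A6  # "mfmsr r4", the marker word at the start of the loader

def find_magic(memory, start=0):
    """Walk the buffer one 32-bit word at a time and return the offset
    of the first word equal to MAGIC, or None if it is not present."""
    for off in range(start, len(memory) - 3, 4):
        (word,) = struct.unpack_from(">I", memory, off)
        if word == MAGIC:
            return off
    return None

# Toy memory: 0x40 bytes of padding, the marker word, then more padding.
ram = bytes(0x40) + struct.pack(">I", MAGIC) + bytes(0x10)
print(hex(find_magic(ram)))  # 0x40
```

Because the scan only looks at aligned words, the marker has to sit on a 4-byte boundary, which instructions always do anyways.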
After all that I of course had to go through the steps to find the memory locations again for the PAL version and also for the different languages you can boot the game in, which went pretty smooth now that I knew exactly what was going on.
Man this entry got longer than I initially expected, just like all the other entries I wrote so far, I never expect them to get this long, so if you made it this far, thank you for reading.
Nintendont Bugs - Mario Party 4 Freezing
Aug 26, 2018
Having spent a long time over the years working on the wii homebrew nintendont, a program that allows you to play gamecube games in wii mode, I've encountered some weird bugs over time. This blog entry is about the latest one I fixed up, mario party 4 freezing.
To give a quick rundown of what exactly was freezing: after booting the game, going into the extra room (last option in the main menu) and selecting the characters you want to play, it is supposed to come up with a game list, but instead it just seemingly froze up on this screen:
A year ago, I already tried to look at this once and tried simple things such as compiling nintendont with different versions of the compiler which did sometimes seem to actually make a difference in the issue but it seemed to do so for no simple reason. Also, I noticed when you reset the game at any point after it was booted, the issue would also be completely gone at which point there now at least was a simple workaround so I left the issue there, waiting to be looked at in detail at a later point.
Now, I finally returned to this issue with the simple idea of maybe there being a memory difference somewhere, something in RAM that may overwrite parts of the game perhaps, so I went into the internal game loader and changed the memory clearing from being most of the RAM around it:
to now instead clear all of the RAM around it:
which sadly did not make a difference, but I still left it in my latest commit simply because it probably is a good idea anyways to make sure the memory is as clear as possible in case some game does have issues with it.
Next up, I did also test to just move the memory position of nintendont up a bit which also made no difference at all, which I was expecting as it should be cleared by the game loader anyways. At this point I wanted to make sure my game loader was not at fault so I replaced it with the game loader of the game itself that gets executed when you reset the game, since we know that reset method worked I wanted to know if it was related to that loader, but nope, even when using the games own loader, the issue was still there on first boot. Though one positive thing that came from this test was that I noticed a lot of memory sets in my own game loader were not needed:
So in my latest commit I now replaced them with ones that are much smaller and work just fine as well:
At this point, I tried to fully clear out all of the main memory and just tested that by moving nintendonts memory position, but I still was not fully convinced it actually cleared out everything so I ended up also writing the entire main memory into a file right after the game got loaded, before it jumped to the game start point. After looking at a hex editor it indeed was all cleared out and only the game and game loader were present in memory, exactly as it should be.
Now I did remember that there was that oddity about switching compiler versions which I now again tried out and while I did not expect any difference, switching back to a compiler version from 2014 suddenly gave me this after I selected characters in the extra room:
While I still had no idea how, I proceeded to now change more things about the compiler, namely the code optimization flag:
You see how this reads -O3? Well that is basically telling the compiler hey, make this code run as fast as possible. So just to see if anything happened, I changed this out for -O2, which is just slightly slower, let it all compile with the old 2014 compiler, tried out mario party 4 again and:
So since the regular RAM seems to be unrelated for sure now, I had this absolutely crazy idea, what if some processor register got set and the game never initializes that register anywhere and then uses it in the extra room with whatever value it had when the game first booted up? To test this, I replaced my normal jump to the game loader:
With some code that first clears out most processor registers and then jumps to the game loader:
And then updated to the latest compiler again, set it to -O3 as it was originally, compiled everything just fine, fired up mario party again and guess what, IT WORKS FINE NOW. So after all my tests with memory, moving code around, verifying memory, it seems to really be related to processor registers. Now of course, I wanted to know which ones, so in the code above I started removing some of the init lines until I had issues again, the line that again broke everything was this one:
which has its definition right below:
This did seem pretty crazy to me because these registers are basically what the processor uses for nearly every single processor instruction for fast variables, so somehow through everything that's still ahead, game loading, going through the main menu and so on, one register variable stays uninitialized.
So now it was just a matter of me removing line per line of that to find out exactly what caused issues. After a while I finally found the magic line:
I did also print out what the value of r19 was on crash, and it was -20, so just to verify I'm not going crazy I went back to the old 2014 compiler and -O2 which worked perfectly, and just added a line that loaded -20 into r19 and indeed, the game now crashed.
Of course I did the same test to see what caused the arrows with -O3 and the 2014 compiler by again removing line per line of that init register function and those arrows are related to this:
Again printing out r18 revealed that it was also -20 when the arrows happened, so I went back to the latest compiler version with -O3 set and added a line to load r18 with -20 and sure enough, now the arrows were back.
So now in my latest commit, I have all the init code in place, resetting everything, as well as the code to clear out all the memory when the game loads to hopefully not run into a crazy issue like this again.
In conclusion, this bug was related to a simple processor register never being initialized by the game. When that register then gets used very late into its execution and contains an unexpected value, the game suddenly gets very weird due to this being undefined behavior.
This bug was really difficult to catch because it involved something you would never expect to be at fault but now with all this extra protection in place I hope an issue like this does not happen again.
That's it for now, thanks for reading about this crazy bug hunt.
GameCube Save Exploits - F-Zero GX
Aug 21, 2018
While I previously explained the steps on how to go from already having code execution up to running homebrew files, I did not yet write down how to actually find an exploit and then use it to get to code execution, so I decided to do that today with f-zero gx. While I did find the exploit myself and made use of it, I was not actually the first to discover it, you see back at the speedrunning event SGDQ2018, there was actually a demonstration for it, see this video at 11:30
Also at the end of the demonstration my name was actually listed in the credits which was really cool to see. While I was curious as to why and what they all did, instead of immediately asking them directly about it, I wanted to find out how the exploit works for myself, just to see if I could reproduce it as well now that I've seen it demonstrated.
Now having the knowledge that it has something to do with replays, I did a quick search to see if there was any public info on replay save files already and sure enough, there was this really detailed breakdown of it:
and also an actual graphical interface for editing the replay save files directly:
While reading through that documentation file I did notice that it also contains data for a custom machine, basically in f-zero gx you can create your own machine to race in. So I decided to head into dolphin, make my own custom machine and do a quick race on the first track of the game. That gave me a good base replay file with probably all of the important data in it to mess around with, which hopefully would lead me to break the game in some way.
So now I just had to export that save from dolphin and open it in that graphical interface:
So, now I had a lot of numbers in front of me which I could just mess around with, make them really big and see if any of them would cause some weird things to happen to the game. This all in all took me quite a few hours because really it was a LOT to go through, but then finally, after changing the "Body ID" way at the bottom from its initial value of 09 00 00 00 to a large number like this:
r1 and r3 got me very excited. You see, strcpy has the job of taking a string from one memory position (r4) and copying it over to another memory position (r3). In this case r3 clearly is based on the stack pointer (r1). The stack is used for various purposes like this, but it is also used to store critical values such as where to jump to next in code, and strcpy has no limit as to how much it can copy. So when you gain control of strcpy and give it a very, very long string you created, you can break the intended maximum length that the stack has prepared for and just overwrite things like the positions it jumps to next in code!
In normal operation, everything works as intended because of course the game developer chose the allocated size on the stack to be big enough for all the strings that can be copied, but once you can break that limit on a console like a gamecube or wii, you can pretty much do what you want!
And now the most important thing of this breakpoint is this value:
The LR (link register) essentially holds the address of the code that is responsible for calling this strcpy, so now all we really have to do is look at that code and see what happened. I just gave it the name of copy_machine_id_string, don't ask me how I come up with these names, I just do.
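If the stack part of this is hard to picture, here is a tiny Python model of it. Everything about the layout is an assumption made up for the sketch (a 16-byte name buffer sitting directly in front of a saved return address, and placeholder addresses), the real frame in f-zero gx looks different, but the mechanism is the same.

```python
ORIGINAL_LR = 0x80001234   # placeholder for the saved return address
BUF_SIZE = 16              # hypothetical size the developer reserved

def strcpy(memory, dst, src):
    """Copy bytes up to and including the first 0x00, with no length limit."""
    i = 0
    while True:
        byte = src[i]
        memory[dst + i] = byte
        i += 1
        if byte == 0:
            return

def saved_lr_after(name):
    """Copy `name` into the buffer and report the saved return address."""
    stack = bytearray(32)
    stack[BUF_SIZE:BUF_SIZE + 4] = ORIGINAL_LR.to_bytes(4, "big")
    strcpy(stack, 0, name + b"\x00")
    return int.from_bytes(stack[BUF_SIZE:BUF_SIZE + 4], "big")

# A name that fits leaves the return address alone.
print(hex(saved_lr_after(b"BLUE FALCON")))
# 16 filler bytes plus 4 more bytes land exactly on the return address.
print(hex(saved_lr_after(b"A" * 16 + (0x81234568).to_bytes(4, "big"))))
```

The second call is the whole trick: the copy never checks the buffer size, so whatever comes after the filler ends up being the address the function returns to.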
Now I wanted to know what function called copy_machine_id_string, well this in dolphin was again just a simple case of setting a breakpoint and it led us to this bit of code I called get_machine_id_name:
That array is rather short compared to the possible values you can set "Body ID" to, which is why I marked at 80309514 that the id given can cause an overflow. Also here you can see the body id getting moved from r3 to r29, you'll see shortly that this is indeed the body id.
After that array got copied over, the code then goes on to move the body id from r29 to r3 in 80309540, the line below it actually moves the language used which is in r5 over to r0 (I used the US release here so this is actually 0), then it multiplies the body id (r3) by 0x18, it then loads up that freshly copied array address from r1, the stack, into a separate register, r28, adds the language (r0) on top of the multiplied body id (r3) and stores its value in r27. Lastly, it basically just adds r27 and r28 together and loads whatever is in that combined address into r3 to finally hand over to copy_machine_id_string, the function responsible for calling the strcpy that caused the invalid read. What a long bit of code to explain!
Well, if you remember that dolphin breakpoint from earlier, we can actually look at r27, r28 and r29 now to confirm what I just explained is correct:
So this just confirms that indeed, r27 is just r29 (the body id 0xFF) multiplied by 0x18:
With all this, we now know that we can essentially control a load from anywhere between 801B67B0 (r28) and 801B7F98 (which is just 0x801B67B0+0x17E8, the biggest body id we can set), well if we quickly scroll through that area of memory in dolphin, we suddenly find a familiar bit of memory:
If you don't recognize it anymore, scroll back up to the part where we first made it crash with changing the body id in the save, seem familiar now? There's also a reason I specifically marked out 801B7670 here as you'll find out later.
What luck, it seems like on the stack is still some leftover bit from the actual replay file being loaded and we can actually reach it by just setting the body id to some value of choice! Now at this point, I finally went ahead and wrote my own little tool to get ready to inject some custom code into a replay file which is located right here:
This bit of code includes r28 for all the regions, here you can see it for the US version we've been looking at so far, 801B67B0:
followed by a machine id that will give us a very nice position we can modify, which I'll explain later:
and also of course that multiplication of the machine id by 0x18 we talked about:
Now, how exactly did I come to choose that particular machine id? Well, let us have a quick look at the save file in the graphical interface again, but this time scrolled up slightly:
So now if we take r28 again, 801B67B0, take the machine id I chose in my tool, 0x9C, multiply that by 0x18 and add it onto 801B67B0, we get 801B7650, which you may notice is just barely within the space of pixel data which ends at 801B7670, it's as if I planned for that to happen.
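Since there are a few hex numbers flying around here, this is the same arithmetic as a small Python sketch. The constants are the ones from this entry for the US version. The function name is made up, and treating the language as a plain byte offset that gets added on top is my simplification for the sketch (it is 0 for the US release anyways).

```python
R28_US = 0x801B67B0      # address of the copied array on the stack (US version)
ENTRY_SIZE = 0x18        # the body id gets multiplied by this
PIXEL_DATA_END = 0x801B7670

def name_pointer_address(body_id, language=0, base=R28_US):
    """Address the game reads the machine name pointer from."""
    return base + body_id * ENTRY_SIZE + language

# The biggest body id, 0xFF, gives the upper end of what we can reach.
print(hex(name_pointer_address(0xFF)))   # 0x801b7f98

# The machine id chosen in the tool lands just inside the pixel data.
chosen = name_pointer_address(0x9C)
print(hex(chosen))                       # 0x801b7650
print(chosen + ENTRY_SIZE <= PIXEL_DATA_END)
```

The last line checks that a full 0x18 byte entry starting at 801B7650 still ends before 801B7670, which is why this particular id works out so nicely.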
Right, now what exactly do we even write into that position? Well, remember that our goal is to overflow a string to jump to our own code, so we first write in an address for our string into the bit of pixel data we just set up for:
You may notice that those are 6 writes total, remember how I earlier talked about language also being a thing added to that read address? Well, this just makes sure we cover all possible languages that can get added to the address. We wrote in the address emblem_arr_start into this position, which as you saw earlier is the start of the pixel data, so now next up we make up a string that is 0x4C bytes long which as I explained earlier makes it possible for us to modify where the code jumps to next:
and now of course we copy in the code position we want to jump to right behind that string:
which I chose to be the pixel data+0x50 bytes.
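Put together, the layout of that prepared data can be sketched like this in Python. This is not the code from my tool, just a model of the layout described above: the function names are made up, the filler byte is arbitrary, and the value passed in for emblem_arr_start is a placeholder, not the real address.

```python
import struct

def build_overflow_string(emblem_arr_start, filler=b"A"):
    """0x4C bytes of filler followed by the address to jump to,
    which is the pixel data start + 0x50."""
    jump_target = emblem_arr_start + 0x50
    return filler * 0x4C + struct.pack(">I", jump_target)

def build_pointer_entry(emblem_arr_start, count=6):
    """The same string address written 6 times, once per possible language."""
    return struct.pack(">I", emblem_arr_start) * count

start = 0x801B7111  # placeholder value, only for this example
overflow = build_overflow_string(start)
print(hex(len(overflow)))                  # 0x50
print(overflow[0x4C:].hex())               # 801b7161
print(hex(len(build_pointer_entry(start))))  # 0x18
```

Note how the string plus the jump address come out at exactly 0x50 bytes, so whatever gets placed at pixel data+0x50 starts right behind them, and the 6 pointer writes fill exactly one 0x18 byte entry.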
All I now have to do is copy the common loader I wrote about in the previous blog entry into the pixel data+0x50:
and the rest of that tool code now just writes our prepared pixel data and our body id of choice over into a replay save file, makes sure its checksum is valid so the game can read it, gives it a nice name and banner and that is it!
Oh and what I mean by nice name and banner is just how it shows up in whatever memory card editor you then use to import the save, I like to always edit things like that to stand out a bit.
Right, of course if you want to see it in action here you go:
That is pretty much everything covered. Make sure to also read my previous blog entry if you didn't already to see how that common loader works to get a full picture of the exploit, thanks for reading.
GameCube Save Exploits - The Common Loader
Aug 20, 2018
In my last entry I went into detail on how animal crossing gets to the point of total control from where we can do anything we want, now I will go over the common loader used in a very similar form in all my gamecube save exploits. How this common loader gets into memory of course differs from game to game, in the future I may write up the details for other games besides animal crossing as well.
The goal of this common loader is to load an executable off the memory card the hacked save is on and then, of course, execute it. This loader also has to be relatively tiny since it has to fit into that hacked save file so we are limited on size. Thanks to the twilight hack for wii being open source, my very first gamecube exploit was a port of that one, giving me a great start on how to make this happen.
Because of the size limitation, I can't just compile a standard executable with all the important functions built in as is normally done on gamecube and wii because that would result in a massive bloated file, instead I have to make use of all the functions of the game that are left in memory when I gain total control. So this means I had to come up with a good structure of standard gamecube software development kit functions to call and then I of course had to locate them inside the game's memory.
Luckily, I know my way around a lot of the standard sdk functions thanks to nintendont yet again, because nintendont actually patches a lot of functions to update them from gamecube mode to wii mode.
The title of choice I always look at for reference of the sdk function code is actually the megaman anniversary collection because it actually came with a leftover executable on disc that includes all the function names! Normally on any other game, all those names get wiped out on release so having a title that has all of that open to read is essential to easily finding any of those functions in all other games.
So now let me go over the very basic layout I decided on for the common loader:
-turn off audio first, else you would just hear one "beep" the entire time it loads the file because nothing updates the audio anymore, we have full code control after all
-finish up whatever was being drawn on screen, so the next application can start up a new picture
-mount the memory card
-open our executable of choice
-temporarily load the executable in unprocessed form into a separate memory location
-close our executable of choice
-unmount the memory card
-put our executable parser into memory and jump to it
While that basic layout is followed by each exploit, some of my later ones have some additional functions for either added stability or because they just work differently. I won't go into every single small difference simply because I don't remember all of them but I will go over the ones that I do remember.
This first call I will describe is one of those extra functions.
Oftentimes, games don't just run on one thread, but end up using multiple threads to do different things. So to make sure we don't suddenly crash the system by doing something another thread does not like, we have to turn off anything else still running with the function OSDisableScheduler:
Let us have a quick look at how that machine code looks in megaman anniversary collection:
This is about the function to turn off audio. Turning off audio is very simple and is done in an identical way in all the exploits I've released so far, we simply call the function __OSStopAudioSystem:
Let me show you a small piece of its machine code from megaman anniversary collection:
This method is what I use to this day to find any of the functions I need in games, though sometimes I have to go through closely related functions because they contain nice patterns to search for. Now to quickly go back to the previous function for a second, OSDisableScheduler, it's actually very closely related to __OSStopAudioSystem because both are called by a function called OSResetSystem, here is a picture of that again from megaman anniversary collection:
Next up, we have to finish up the screen data we may have interrupted so the next executable can draw new things. All we do for this is say to the GPU that hey, we are done drawing, always using a very similar method. In the earliest exploits that did not stop threads, a function by the name GXDrawDone was called, it basically sets the needed GPU registers and then lets the other threads execute until the GPU gives the signal everything is finished up:
Of course, this function cannot be used in games that have threads disabled at this point, so for those we instead use GXSetDrawDone which has the slight downside of not waiting until the GPU processed that signal, but I don't think that caused any issues so far:
A special case in this is animal crossing which actually does not have GXSetDrawDone available, so for that one I basically re-implemented its code in C which is not really the nicest way but it's at least functional, so I'm OK with that:
The amount of sdk functions available differs from game to game, functions that the game did not use got stripped out of the final release so sometimes I have to do workarounds like that.
Before mounting the memory card, we have to do one slight modification to the game id set in memory. Every game has its own game id and that is always located right at the beginning of RAM (0x80000000). Normally for memory card access, this game id determines which files you can open, create, modify and so on. For executables on memory card, we always load a file called "boot.dol" with the fixed game id "DOLX00" so that's what we write into the beginning of RAM:
In the special case of f-zero gx, this game id location was actually deliberately changed from the beginning of RAM to somewhere else so for that we use CARDSetDiskID to set it back to where it should be:
Depending on the game we then have to re-mount the memory card, for this we call CARDMountAsync and then call CARDGetResultCode until that result is no longer -1 which tells us the mount operation finished:
In case of splinter cell I actually found that internally the game already has a function that does exactly that so all I do in that game is call that internal game function:
Finally with the memory card ready, it's now time to open the "boot.dol" executable on it, usually done with CARDOpen:
Though in the special case of pokemon colosseum/pokemon xd, there is no CARDOpen, instead that title uses CARDFastOpen which does not take a name as input but instead only an entry number from the list of files on the memory card, meaning for that title we first have to go through all the entries on the memory card using CARDGetStatus and when we found the file we are looking for finally opening it:
A little more dirty but at least it works out.
Now with the executable ready to be read, we simply read it either using CARDRead in titles that still have threads enabled or CARDReadAsync and then calling CARDGetResultCode like we did earlier to mount the memory card in titles that have threads disabled, then putting the read data block into auxiliary ram, and repeating that until the whole file is read into auxiliary ram:
What exactly is auxiliary ram? On gamecube, you have your main 24MB of fast ram, that's still present on wii too. Additionally, you also have 16MB of slower ram available called auxiliary ram, used for various purposes by games. We first load the executable into this separate ram so we don't accidentally overwrite the code we still use from main ram, we will deal with the parsing of the executable later on.
At this point we are pretty much done, now all we have to do is close the file we just read and unmount the memory card again using CARDClose and CARDUnmount:
This is the last bit of special code right here, specific to f-zero gx, that I developed just today. I noticed very early on that for some reason the f-zero gx exploit was unable to load all executables I threw at it, some would just get stuck. This turned out to be a pretty difficult and long hunt for a tiny bug in the compressor those executables use, dollz. To save space when putting an executable on memory card, you can use compressors like that, and dollz happens to be one of the older ones. My personal compressor, dolxz, worked just fine with f-zero gx and other titles so I had to analyze what dollz did. The annoying thing with dollz is that there is no source code, so I had to go through its machine code line by line, try it out and see what made it crash. This was a pretty slow process but eventually I found the little bit of code that crashed it:
After really going into detail, I found that instruction marked in yellow to be wrong. Basically it checks if the length left after its subtraction is greater than or equal to 0, whereas instead it should just check if it is greater than 0. Why? Well, it gets called to save the entirety of the fast ram, going from 0x80000000 to 0x817FFFFF, but because it does the compare wrong it ends up with one extra cache save of 0x81800000.
This causes no harm in other titles because the address space that's technically valid normally goes all the way up to 0x8FFFFFFF, so even if not that much ram is installed, the processor sees it as valid space, even if it can't do anything with it of course. Now for whatever reason, f-zero gx does NOT follow that standard address space, instead it has one address space going from 0x80000000 to 0x80FFFFFF and a second address space from 0x81000000 to 0x817FFFFF, meaning to the processor that save of 0x81800000 is actually invalid, thus leading it to crash!
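To show how one wrong compare leads to exactly one extra address, here is a small Python model of such a loop. The shape of the loop is my assumption (touch a cache line, step 0x20 bytes forward, subtract from the remaining length, compare), the real dollz code is PowerPC machine code, and the function names are made up for this sketch.

```python
LINE = 0x20  # cache line size of the gamecube processor

def last_saved_address(start, length, buggy):
    """Return the last address the cache save loop touches."""
    addr, remaining = start, length
    while True:
        last = addr                 # the cache save instruction would hit this
        addr += LINE
        remaining -= LINE
        keep_going = remaining >= 0 if buggy else remaining > 0
        if not keep_going:
            return last

def is_valid(addr, ranges):
    """True if the address falls into one of the valid address spaces."""
    return any(lo <= addr <= hi for lo, hi in ranges)

USUAL = [(0x80000000, 0x8FFFFFFF)]
FZERO = [(0x80000000, 0x80FFFFFF), (0x81000000, 0x817FFFFF)]

bad = last_saved_address(0x80000000, 0x01800000, buggy=True)
good = last_saved_address(0x80000000, 0x01800000, buggy=False)
print(hex(bad), is_valid(bad, USUAL), is_valid(bad, FZERO))
print(hex(good), is_valid(good, FZERO))
```

With the wrong compare the loop runs one more time when the remaining length hits exactly 0, so it touches 0x81800000, which is fine in the usual address space but outside of both f-zero gx address spaces.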
Once I figured that out, I also found the game actually has a second option for address spaces in case more valid ram is available, in that case it makes the first address space go from 0x80000000 to 0x81FFFFFF and the second one from 0x82000000 to 0x82FFFFFF. So as a workaround for this crazy crash, I now just call the function that sets up this bigger address space which, while technically not being usable ram, at least makes the processor think it's valid space, thus fixing this crash:
That sure was a long detour to write down, now we get to finally boot into the special parser for the executable we just got into the slower ram, we copy that parser into memory that is not used by the executables we parse (0x80001800) and jump to it:
We are done with the game functions now and are now in a memory location that is unused by any of the executables we load, so we can now safely parse the executable from slow ram into its final position in the main ram, no need to worry about anything being overwritten!
Parsing standard .dol executables (which is what we loaded with the name boot.dol) is very simple, first we load up the header from slow ram into a variable we can actually use:
And now we just have to go through it which is a very simple process, the header consists of a very simple list of the location of the code it wants to copy, the location it wants to copy that code to and the length of that code, so all we do is go through the entries of that list and copy the code from the slow ram into the desired location in the fast ram:
Followed by pretty much the exact same list but instead of code this is about any other data the executable uses, this could technically be handled differently but we just use the exact same bit of code we used before but this time for the list of data:
To finish off we now just have to jump to the location marked in the .dol header as the starting point for the just loaded executable, finally bringing you to your homebrew application of choice:
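For reference, the whole parsing step can be modeled in a few lines of Python. This is a sketch and not the loader's actual C code: it uses the commonly documented .dol header layout (7 code entries followed by 11 data entries in each of the three lists, starting point at 0xE0), handles code and data in one loop since both lists work the same way, and stands in plain byte buffers for slow ram and fast ram.

```python
import struct

def parse_dol_header(header):
    """Return (sections, entry); each section is (file_offset, address, size)."""
    offsets = struct.unpack_from(">18I", header, 0x00)    # where the bytes are in the file
    addresses = struct.unpack_from(">18I", header, 0x48)  # where they want to be in ram
    sizes = struct.unpack_from(">18I", header, 0x90)      # how many bytes to copy
    (entry,) = struct.unpack_from(">I", header, 0xE0)     # starting point of the executable
    sections = [(o, a, s) for o, a, s in zip(offsets, addresses, sizes) if s]
    return sections, entry

def load_dol(dol, ram, ram_base=0x80000000):
    """Copy every used section from the raw file into `ram`, return the entry."""
    sections, entry = parse_dol_header(dol[:0x100])
    for offset, address, size in sections:
        dst = address - ram_base
        ram[dst:dst + size] = dol[offset:offset + size]
    return entry
```

Here `dol` plays the role of the unprocessed file sitting in slow ram and `ram` the fast ram, the returned entry is the address the real loader jumps to at the very end.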
Now if you've also read my previous blog entry about animal crossing you hopefully have a full picture of how much is going on in such a simple looking gamecube save exploit, hopefully this was interesting to you, thanks for reading.