Nintendont Bugs - Mario Party 4 Freezing

Having spent a long time over the years working on the Wii homebrew Nintendont, a program that lets you play GameCube games in Wii mode, I've encountered some weird bugs. This blog entry is about the latest one I fixed: Mario Party 4 freezing.
To give a quick rundown of what exactly was freezing: after booting the game, going into the extra room (the last option in the main menu) and selecting the characters you want to play, the game is supposed to come up with a game list. Instead, it seemingly froze on this screen:
TV2018082610094400.jpg
The game did not fully crash though: the background music continued to play as normal and the controller reset combinations still worked, so it was still reading the controller just fine. It just did not render any more frames and got stuck on this one frame.
A year ago, I already looked at this once and tried simple things such as compiling Nintendont with different versions of the compiler. That did sometimes seem to make a difference, but for no obvious reason. I also noticed that when you reset the game at any point after it was booted, the issue would be completely gone. Since that at least gave a simple workaround, I left the issue there, waiting to be looked at in detail at a later point.

Now I finally returned to this issue with the simple idea that there might be a memory difference somewhere, perhaps something in RAM overwriting parts of the game. So I went into the internal game loader and changed the memory clearing from covering most of the RAM around it:
https://github.com/FIX94/Nintendont...8ae6cefc277ccfb8abe/multidol/global.c#L28-L31
to now instead clear all of the RAM around it:
https://github.com/FIX94/Nintendont/blob/master/multidol/global.c#L29-L33
That sadly did not make a difference, but I still left it in my latest commit, simply because it is probably a good idea anyway to make sure the memory is as clear as possible in case some game does have issues with it.
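As a rough sketch of the idea of clearing everything around the loader (not the actual code in multidol/global.c), here is a small C function. The buffer stands in for main RAM, and `clear_around` and its offsets are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero every byte of `mem` except the half-open range [keep_start, keep_end).
 * `mem` stands in for main RAM and the kept range for the loader itself;
 * both are assumptions for illustration, not real Nintendont addresses. */
void clear_around(unsigned char *mem, size_t size,
                  size_t keep_start, size_t keep_end)
{
    assert(keep_start <= keep_end && keep_end <= size);
    memset(mem, 0, keep_start);                  /* everything below the kept range */
    memset(mem + keep_end, 0, size - keep_end);  /* everything above the kept range */
}
```

Calling `clear_around(ram, ram_size, loader_start, loader_end)` leaves only the loader's own bytes untouched, so nothing stale from before the game was loaded can survive in the rest of the buffer.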
Next, I tested moving the memory position of Nintendont up a bit, which also made no difference at all. I was expecting that, since the memory should be cleared by the game loader anyway. At this point I wanted to make sure my game loader was not at fault, so I replaced it with the game's own loader, the one that gets executed when you reset the game. Since we know the reset method worked, I wanted to know if the issue was related to that loader. But nope: even when using the game's own loader, the issue was still there on first boot. One positive thing that came from this test was that I noticed a lot of the memory sets in my own game loader were not needed:
https://github.com/FIX94/Nintendont...8ae6cefc277ccfb8abe/multidol/global.c#L33-L58
So in my latest commit I replaced them with ones that are much smaller and work just as well:
https://github.com/FIX94/Nintendont/blob/master/multidol/global.c#L34-L40

At this point, I had tried to fully clear out all of the main memory and tested that by moving Nintendont's memory position, but I still was not fully convinced it actually cleared out everything. So I ended up also writing the entire main memory into a file right after the game got loaded, before it jumped to the game's start point. Looking at the dump in a hex editor, it indeed was all cleared out and only the game and game loader were present in memory, exactly as it should be.
Then I remembered that oddity about switching compiler versions, so I tried it again. While I did not expect any difference, switching back to a compiler version from 2014 suddenly gave me this after I selected characters in the extra room:
TV2018082610415400.jpg
So there was no freeze this time, but normally there is supposed to be text instead of those arrows where you select a game to play. This is getting weird... So I again dumped the main memory, compared it against my earlier dump from the run that just straight up froze and... no differences. This means that somehow, just the way the game selector is compiled has an effect deep in Mario Party 4's code, and it has nothing to do with the main memory.
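Comparing two memory dumps like this boils down to a byte-by-byte check. A minimal sketch in C, assuming both dumps have already been read into buffers of the same size (`first_difference` is a made-up helper name, not something from Nintendont):

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte that differs between two dumps of
 * `size` bytes each, or `size` if the dumps are identical. */
size_t first_difference(const unsigned char *a, const unsigned char *b,
                        size_t size)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < size; i++) {
        if (a[i] != b[i])
            return i;   /* first mismatching offset */
    }
    return size;        /* no difference found */
}
```

A return value equal to `size` corresponds to the "no differences" result above; any smaller value would point straight at the first byte worth looking at in a hex editor.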
While I still had no idea how, I proceeded to change more things about the compilation, namely the code optimization flag:
https://github.com/FIX94/Nintendont/blob/master/loader/Makefile#L31
You see how this reads -O3? That basically tells the compiler: hey, make this code run as fast as possible. So just to see if anything happened, I changed this to -O2, which optimizes slightly less aggressively, let it all compile with the old 2014 compiler, tried out Mario Party 4 again and:
TV2018082610320500.jpg
The behavior changed yet again! Now it displayed as it should. Just to be 1000% sure, I dumped the main memory yet again, and there still was no visible difference at all.
So since the regular RAM now seemed to be unrelated for sure, I had this absolutely crazy idea: what if some processor register got set before boot, the game never initializes that register anywhere, and then it uses it in the extra room with whatever value it had when the game first booted up? To test this, I replaced my normal jump to the game loader:
https://github.com/FIX94/Nintendont...c277ccfb8abe/loader/source/main.c#L1594-L1598
with some code that first clears out most processor registers and then jumps to the game loader:
https://github.com/FIX94/Nintendont/blob/master/loader/source/jmp813.S#L10-L18
Then I updated to the latest compiler again, set it to -O3 as it was originally, compiled everything, fired up Mario Party 4 again and guess what: IT WORKS FINE NOW. So after all my tests with memory, moving code around and verifying memory, it really seems to be related to processor registers. Of course I wanted to know which ones, so in the code above I started removing some of the init lines until I had issues again. The line that broke everything again was this one:
https://github.com/FIX94/Nintendont/blob/master/loader/source/jmp813.S#L11
which has its definition right below:
https://github.com/FIX94/Nintendont/blob/master/loader/source/jmp813.S#L20-L50
This seemed pretty crazy to me, because these are the registers the processor uses as fast variables in nearly every single instruction. So somehow, through everything that's still ahead (game loading, going through the main menu and so on), one register stays uninitialized.
So now it was just a matter of removing that init code line by line to find out exactly what caused the issue. After a while I finally found the magic line:
https://github.com/FIX94/Nintendont/blob/master/loader/source/jmp813.S#L37
I also printed out what the value of r19 was when the freeze happened, and it was -20. Just to verify I'm not going crazy, I went back to the old 2014 compiler and -O2, which worked perfectly, added a line that loaded -20 into r19, and indeed, the game now froze.
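To make the mechanism a bit more concrete, here is a tiny simulation in C. It is only a model: `cpu_state`, `load_gpr`, `clear_gprs` and `game_runs_ok` are made-up names, and the "game" is assumed to misbehave exactly when r19 holds -20, which is the only failing value I actually observed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32

/* Simulated general-purpose registers; a stand-in for the real CPU state. */
typedef struct {
    int32_t r[NUM_GPRS];
} cpu_state;

/* Model of "added a line that loaded a value into a register". */
void load_gpr(cpu_state *cpu, int reg, int32_t value)
{
    assert(reg >= 0 && reg < NUM_GPRS);
    cpu->r[reg] = value;
}

/* Model of the fix: zero all registers before jumping to the game loader. */
void clear_gprs(cpu_state *cpu)
{
    memset(cpu->r, 0, sizeof cpu->r);
}

/* Hypothetical game routine that reads r19 without ever writing it first.
 * Assumption: it only misbehaves when r19 happens to contain -20. */
int game_runs_ok(const cpu_state *cpu)
{
    return cpu->r[19] != -20;
}
```

In this model, calling `load_gpr(&cpu, 19, -20)` before `game_runs_ok` reproduces the failure, while calling `clear_gprs` first makes the result independent of whatever the loader left behind.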
Of course I did the same test to see what caused the arrows with -O3 and the 2014 compiler, again removing that register init function line by line. Those arrows are related to this:
https://github.com/FIX94/Nintendont/blob/master/loader/source/jmp813.S#L36
Printing out r18 revealed that it was also -20 when the arrows happened, so I went back to the latest compiler version with -O3 set, added a line to load r18 with -20 and sure enough, the arrows were back.
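The line-by-line removal I did by hand can be thought of as a simple one-at-a-time search. A sketch in C, where everything is hypothetical: `find_culprit` is a made-up helper, and `example_game` is a stand-in predicate that simply encodes the r19 result from above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GPRS 32

/* Callback type: returns nonzero if the (simulated) game runs correctly
 * with the given register contents. Hypothetical, for illustration. */
typedef int (*runs_ok_fn)(const int32_t regs[NUM_GPRS]);

/* Clear all registers except one, which keeps the `stale` value, and return
 * the index of the first register whose stale value breaks the run,
 * or -1 if no single register does. */
int find_culprit(runs_ok_fn runs_ok, int32_t stale)
{
    assert(runs_ok != NULL);
    for (int skip = 0; skip < NUM_GPRS; skip++) {
        int32_t regs[NUM_GPRS] = {0};  /* all init lines in place ...     */
        regs[skip] = stale;            /* ... except the one being tested */
        if (!runs_ok(regs))
            return skip;
    }
    return -1;
}

/* Stand-in for the real game: assumed to fail only when r19 holds -20. */
int example_game(const int32_t regs[NUM_GPRS])
{
    return regs[19] != -20;
}
```

With this stand-in, `find_culprit(example_game, -20)` returns 19. On real hardware each "call" is of course a rebuild and a full boot into the extra room, which is why this took a while.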
So now in my latest commit, I have all the init code in place, resetting everything, as well as the code to clear out all the memory when the game loads, to hopefully not run into a crazy issue like this again.
In conclusion, this bug came down to a simple processor register never being initialized by the game. When that register is then used very late into execution and contains an unexpected value, the game suddenly gets very weird, because it is relying on undefined behavior.
This bug was really difficult to catch because it involved something you would never expect to be at fault, but with all this extra protection in place I hope an issue like this does not happen again.

That's it for now, thanks for reading about this crazy bug hunt.

Comments

Uhhh now I am porting gbaemu4ds to ToolchainGenericDS. Certainly the compiler flags cause a lot of issues.

https://stackoverflow.com/questions/11546075/is-optimisation-level-o3-dangerous-in-g

The people there say it is much safer now (for normal Unix environments), but in my experience it leads to a LOT of bugs:

- it sometimes merges sequential accesses to registers that must be read or written one operation at a time (such as an IPC receive vector) into a single operation:

Normal use:
ldr r0,=IPC_VECTOR
ldr r1,[r0,#0] @ msg0
ldr r2,[r0,#0] @ msg1
ldr r3,[r0,#0] @ msg2

Compiler/-O3 flags:
push {r0}
ldr r0,=IPC_VECTOR
ldmia r0, {r1,r2,r3}
pop {r0}

or some linkers are set up to discard the bss section, or the linker fails to see any use for the allocated tempBuffer, and then you use:

code.cpp
char tempBuffer[0x200] = {0};


Normal use:
tempBuffer:
.skip 0x200
...rest of code from same/next sections

Compiler/-O3 flags:
tempBuffer:
.byte 0xff (if linker is told to fill such section with 0xff)
...rest of code from same/next sections

--

or the usual reads/writes to memory that does not support 32-bit (4-byte) accesses, only plain 16-bit reads/writes:

uint16 tempBuffer[0x8];
uint16 destBuffer[0x8];
tempBuffer[0] = 0xc070;
tempBuffer[7] = 0xc878;

destBuffer[0] = tempBuffer[0];
destBuffer[7] = tempBuffer[7];


...

Normal use:
ldr r0,=tempBuffer
ldr r1,=destBuffer

ldr r3,=0xc070
strh r3,[r0,#(2*0)]
ldr r3,=0xc878
strh r3,[r0,#(2*7)]

ldrh r2,[r0,#(2*0)]
strh r2,[r1,#(2*0)]

ldrh r2,[r0,#(2*7)]
strh r2,[r1,#(2*7)]

...
Compiler/-O3 flags:
ldr r0,=tempBuffer
ldr r1,=destBuffer

ldr r3,=0xc070
str r3,[r0,#(4*0)]
ldr r3,=0xc878
str r3,[r0,#(4*7)]

ldr r2,[r0,#(4*0)]
mov r2, r2,LSR#16
str r2,[r1,#0]

ldr r2,[r0,#(4*7)]
mov r2, r2,LSR#16
str r2,[r1,#(4*7)]
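For C code, one common guard against both the merged reads and the widened accesses described above is `volatile`. A minimal sketch, assuming the register or buffer is reached through a pointer; `read_three` and `copy_halfwords` are made-up names. The C standard requires every volatile access to actually be performed, so the compiler may not merge or drop them. That each access uses exactly the declared 16-bit width is implementation-defined, although GCC does this in practice for plain (non-bitfield) volatile objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the same register three times. Because `reg` points to a volatile
 * object, the compiler has to emit three separate loads. */
void read_three(const volatile uint32_t *reg, uint32_t out[3])
{
    assert(reg != NULL && out != NULL);
    out[0] = *reg;  /* msg0 */
    out[1] = *reg;  /* msg1 */
    out[2] = *reg;  /* msg2 */
}

/* Copy `count` halfwords one volatile 16-bit access at a time, for memory
 * that is assumed to support only 16-bit reads and writes. */
void copy_halfwords(volatile uint16_t *dst, const volatile uint16_t *src,
                    size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```

On real hardware `reg` would point at the memory-mapped register rather than at an ordinary variable; whether this covers the cases above depends on how the original code reaches that memory.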
 
As a programmer and computer technician I know all the pains of tracking down an issue that doesn't seem to have an easily identifiable cause. I remember my midterm test when I was learning how to fix computers, where the Windows installer would fail to install this one particular file. None of the usual troubleshooting methods worked. As a last-ditch attempt I swapped the RAM module. Bingo, Windows installed just fine.
 
This felt like reading a Dolphin progress report blog post, I loved it.
Surely you would be an amazing addition to their team if you happen to work on it someday :P
 
Forgot to say: the virtualized environment expects the context of all the registers to be preserved regardless of what the host code does (entering an exception, changing stacks, etc.). If you, for example, virtualize the IRQ handler (jump to the game's IRQ handler, but with the memory protection bits set to guest mode), what gives the best compatibility for such an environment is to use:
-mapcs-frame

Now I remember a post I did back in 2015 in my profile
"-mapcs-frame can save your program flow if you don't use any kind of frame save/restore context! :P" which verily fixes some games breaking up when jumping between host/guest mode. If a register expected by the game is destroyed and not restored, well you know what happens.

edit:
https://www.quora.com/What-is-the-purpose-of-mapcs-frame-ARM-gcc-option

pretty much this

edit2:

Processor ABIs define the behaviour when entering/leaving a function call. Possibly you were effectively debugging two different ABIs, and that would explain why setting -20 (in an ABI that did not rely on that trick) made the game crash, while in the other ABI the context was restored manually (-20 added by hand), and that made the game work.
 
Could this fix cause issues with other games? Or could it actually prevent some unknown bug we could encounter on some random game?
 

Blog entry information

Author
FIX94
Views
584
Comments
14