Back.
Thanks to the help of yellows8, Normmatt, smea, and plutooo, I finally managed to get it working. Basically the first issue was getting the cache invalidation to work properly every time a new block of code was pushed to the recompilation caches. Once that was done I got stuck on another issue, which I initially thought was some weird cache problem but which ended up being a stack overflow (thanks to yellows8 for finding that one, I probably would never have guessed it would be a stack overflow...). Long story short, I relocated the stack to somewhere with more space and it's alive and breathing.
There is one slight downside, and it's the cache invalidation. Currently, due to the nature of how gpsp does its stuff, the blocks are generated pretty much JIT, so after each block another block is generated. The problem is that every time I write to the recompilation cache, I have to invalidate the entire instruction cache (which is a lot). With games that are rather large, this is a bit of an issue: invalidating the cache takes a bit of time, and since large games often branch and go around quite a bit, this can equate to a metric crapton of lag.
The other issue now is rendering. It's becoming a bottleneck in some situations because I'm using the CPU to flip the frame buffer the right way up. So I'll probably be implementing GPU rendering soon as well in order to minimize the lag generated here.
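To give an idea of why the CPU flip costs anything, here's a rough sketch of that kind of per-pixel copy. This is not the actual gpsp code: blit_rotated and col_len are made-up names, and the destination layout (columns stored one after another, each running bottom-to-top) is an assumption about how the screen buffer is arranged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a row-major source image (src_w x src_h)
 * into a destination stored column by column, where each column holds
 * col_len pixels and runs bottom-to-top. The layout is an assumption. */
static void blit_rotated(const uint16_t *src, size_t src_w, size_t src_h,
                         uint16_t *dst, size_t col_len)
{
    assert(src != NULL && dst != NULL);
    assert(src_h <= col_len);

    for (size_t y = 0; y < src_h; y++) {
        for (size_t x = 0; x < src_w; x++) {
            /* Source row y ends up (col_len - 1 - y) pixels up column x. */
            dst[x * col_len + (col_len - 1 - y)] = src[y * src_w + x];
        }
    }
}
```

For a 240x160 GBA frame that inner line runs 38,400 times per frame, and the writes jump around in memory instead of going sequentially, which is the sort of work that would be nicer to hand to the GPU.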
So basically at this point, some games run extremely well, some run absolutely horribly (worse than the interpreter), and a lot of optimizations are needed both on my end and maybe on ninjhax's end to allow the invalidation of specific blocks of the cache vs the entire thing. The good thing though is that we have the dynrec going fairly well at this point.
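To put rough numbers on "specific blocks vs the entire thing", here's a small sketch that works out which cache lines a newly written block actually touches. The line size and cache size are assumed values for illustration, not something I measured, and lines_covering / line_count are made-up helper names; nothing here calls a real invalidation routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u            /* assumed line size in bytes */
#define ICACHE_SIZE     (16u * 1024u)  /* assumed instruction cache size */

typedef struct {
    uintptr_t start;  /* first byte of the first line touched */
    uintptr_t end;    /* one past the last byte of the last line touched */
} line_range;

/* Round [addr, addr + len) outwards to cache-line boundaries. */
static line_range lines_covering(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)(CACHE_LINE_SIZE - 1);
    line_range r;

    assert(len > 0);
    r.start = addr & ~mask;
    r.end   = (addr + len + mask) & ~mask;
    return r;
}

/* Number of cache lines inside the range. */
static size_t line_count(line_range r)
{
    return (size_t)((r.end - r.start) / CACHE_LINE_SIZE);
}
```

With these assumed sizes, a 48-byte block written at 0x1010 only touches 2 lines, while throwing out the entire instruction cache drops all 512 (ICACHE_SIZE / CACHE_LINE_SIZE), most of which held code that was still perfectly valid.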
EDIT: Updated .3dsx is in the main post for those who want to try it. Don't get your hopes up too high for speed though, it's a big WIP at the moment even with the dynrec in place. Also, exiting with the X button isn't in place yet.
EDIT 2: Holy crap, just fixed a thing and now Fire Red is running much faster than the interpreter (still not 100%, probably closer to 60% or 70%, it's hard to tell without sound). Apparently I was invalidating the cache even when I didn't write anything new to the translation cache. Huge performance increases, although video is still lagging it down.
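The idea behind that fix, as a minimal sketch: keep a dirty flag that is only set when something is actually written to the translation cache, and only invalidate when the flag is set. This is not the real gpsp code. All the names are made up, and platform_invalidate_icache is a stand-in that just counts calls so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRANSLATION_CACHE_SIZE 4096

static uint8_t  translation_cache[TRANSLATION_CACHE_SIZE];
static size_t   translation_ptr = 0;  /* next free byte in the cache */
static int      cache_dirty = 0;      /* set when new code was written */
static unsigned flush_count = 0;      /* how many invalidations happened */

/* Hypothetical stand-in for the real (expensive) invalidation call. */
static void platform_invalidate_icache(void)
{
    flush_count++;
}

/* Append a freshly recompiled block; returns 1 on success, 0 if empty or full. */
static int emit_block(const uint8_t *code, size_t len)
{
    assert(code != NULL);
    if (len == 0 || len > TRANSLATION_CACHE_SIZE - translation_ptr)
        return 0;
    memcpy(translation_cache + translation_ptr, code, len);
    translation_ptr += len;
    cache_dirty = 1;  /* the instruction cache may now be stale */
    return 1;
}

/* Call before jumping into translated code. */
static void sync_before_execute(void)
{
    if (!cache_dirty)
        return;  /* nothing new was written, so skip the invalidation */
    platform_invalidate_icache();
    cache_dirty = 0;
}
```

Calling sync_before_execute() repeatedly without a new emit_block() in between costs nothing, which is the case I was previously paying for on every pass.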