Well, that's the thing: I'm not using anything from the standard toolchain. I'm doing everything from scratch with my own assembler/compiler, with the ARM7 as the main processor and the ARM9 as a slave that responds to FIFO IRQs.
For example, I included your DLDI binary near the base of main RAM and, as expected, it gets patched in on real hardware: the readSectors/writeSectors function pointers get filled in just as they normally would. So I just read that address, put the parameters in the correct registers and do a BX call, which switches the processor to ARM state and reads or writes a sector on disk.
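A minimal host-side sketch of what that BX call has to reproduce, assuming the usual DLDI shape where readSectors/writeSectors take (sector, sector count, buffer) and return a success flag. The `driver_slots` struct, the fake disk and every other name here are hypothetical; the real DLDI header stores the pointers at offsets fixed by the DLDI specification. Under the AAPCS, calling through the pointer puts the sector in r0, the count in r1 and the buffer address in r2, and the result comes back in r0. On the ARM7 (ARMv4T, which has no BLX), compilers typically emit `mov lr, pc` followed by `bx rN` for such a call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define FAKE_SECTORS 4u

/* Hypothetical pointer types modelled on the usual DLDI shape:
   (sector, sector count, buffer) -> success flag. */
typedef bool (*read_sectors_fn)(uint32_t sector, uint32_t count, void *buffer);
typedef bool (*write_sectors_fn)(uint32_t sector, uint32_t count, const void *buffer);

/* Hypothetical stand-in for the patched driver's pointer slots. */
typedef struct {
    read_sectors_fn read_sectors;
    write_sectors_fn write_sectors;
} driver_slots;

/* Host-side fake "disk" so the sketch runs off-hardware. */
static uint8_t fake_disk[FAKE_SECTORS * SECTOR_SIZE];

static bool fake_read(uint32_t sector, uint32_t count, void *buffer) {
    if (sector > FAKE_SECTORS || count > FAKE_SECTORS - sector) return false;
    memcpy(buffer, fake_disk + sector * SECTOR_SIZE, count * SECTOR_SIZE);
    return true;
}

static bool fake_write(uint32_t sector, uint32_t count, const void *buffer) {
    if (sector > FAKE_SECTORS || count > FAKE_SECTORS - sector) return false;
    memcpy(fake_disk + sector * SECTOR_SIZE, buffer, count * SECTOR_SIZE);
    return true;
}

/* Calling through the pointer is what a hand-written BX sequence must mirror:
   sector -> r0, count -> r1, buffer -> r2, return address in LR,
   bool result back in r0. */
static bool read_via_slots(const driver_slots *slots, uint32_t sector,
                           uint32_t count, void *buffer) {
    assert(slots->read_sectors != NULL);
    return slots->read_sectors(sector, count, buffer);
}
```

Usage: fill a `driver_slots` with the two pointers, write a sector through `write_sectors`, then read it back with `read_via_slots`; a request past the end of the fake disk returns false.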
Also, my stacks are really shallow: the data stack is at most 32 bytes (8 words) deep and the return stack is a bit larger at 256 bytes, but they grow toward each other, so what's really happening could very well be some bad stack corruption.
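If the two stacks really do share one region and grow toward each other, a cheap debugging aid is to make every push check the other stack's pointer and report a collision instead of silently overwriting. This is a sketch, assuming the data stack grows up from the bottom and the return stack grows down from the top of one shared array of 32-bit cells; the layout, `REGION_WORDS` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared region: 8 data words + 64 return words (288 bytes). */
#define REGION_WORDS 72u

typedef struct {
    uint32_t cells[REGION_WORDS];
    size_t dsp; /* next free data-stack cell (grows upward) */
    size_t rsp; /* top return-stack cell (grows downward); REGION_WORDS when empty */
} dual_stack;

static void ds_init(dual_stack *s) {
    s->dsp = 0;
    s->rsp = REGION_WORDS;
}

/* Each push refuses to cross into the other stack instead of overwriting it. */
static bool data_push(dual_stack *s, uint32_t v) {
    assert(s->dsp <= s->rsp); /* invariant: the stacks never overlap */
    if (s->dsp >= s->rsp) return false; /* collision */
    s->cells[s->dsp++] = v;
    return true;
}

static bool data_pop(dual_stack *s, uint32_t *v) {
    if (s->dsp == 0) return false; /* underflow */
    *v = s->cells[--s->dsp];
    return true;
}

static bool ret_push(dual_stack *s, uint32_t v) {
    assert(s->dsp <= s->rsp); /* invariant: the stacks never overlap */
    if (s->rsp <= s->dsp) return false; /* collision */
    s->cells[--s->rsp] = v;
    return true;
}

static bool ret_pop(dual_stack *s, uint32_t *v) {
    if (s->rsp == REGION_WORDS) return false; /* underflow */
    *v = s->cells[s->rsp++];
    return true;
}
```

On real hardware the `return false` paths are where you would trap (or signal the other CPU over the FIFO) so the corruption is caught at the first bad push rather than much later.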
So I guess the design is too different from the standard and something gets clobbered along the way, but I haven't implemented better debugging tools yet to see exactly what's happening. I had similar problems with the PC version of the same compiler (the one I'm using to build the DS version), with OpenGL corrupting my stack, but I never thought it would be an issue on bare metal.
Well, I can save you a few years of your life (5 so far, in my case): maybe you'd want to use TGDS homebrew? It's an alternative to devkitARM, and the code is licensed under GPL v2 (or later). That way you don't waste time on little nuisances and can get straight to work on what you want to achieve.
You're asking the right guy, I tell you. The ARM ABI is a standard, and following it ensures your code behaves correctly alongside other ARM code (such as the DLDI driver). Make sure you read:
The Status register - ARMwiki (heyrick.eu)
PSR and conditional execution (heyrick.eu)
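To make the ABI point concrete: under the AAPCS a called function (such as the DLDI readSectors) may freely overwrite r0-r3, r12, LR and the condition flags, but must hand back r4-r11 and SP unchanged (r9's role can be platform-defined). The hypothetical checker below assumes you capture register snapshots before and after a call with a small assembly shim of your own, and flags callee-saved registers that changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of r0-r15, filled by your own assembly shim. */
typedef struct {
    uint32_t r[16];
} reg_snapshot;

/* AAPCS: r4-r11 and r13 (SP) must survive a call; r0-r3, r12 and r14 (LR)
   are scratch from the caller's point of view. */
static bool aapcs_callee_saved(unsigned reg) {
    assert(reg < 16);
    return (reg >= 4 && reg <= 11) || reg == 13;
}

/* Bitmask of callee-saved registers that changed; 0 means ABI respected. */
static uint32_t aapcs_violations(const reg_snapshot *before,
                                 const reg_snapshot *after) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (aapcs_callee_saved(i) && before->r[i] != after->r[i]) {
            mask |= 1u << i;
        }
    }
    return mask;
}
```

The check cuts the other way too: if your compiler keeps its own data or return stack pointer in r0-r3 or r12, a fully ABI-compliant callee is allowed to destroy it, so those values need to be saved (or kept in r4-r11) around the BX call.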
You also get a better debugger in TGDS homebrew, courtesy of yours truly: TGDS-gdbstub!
Coto88 / toolchaingenericds-gdbstub-example — Bitbucket
If you add that code to any TGDS homebrew, you can remotely debug memory through a GDB stub! I use:
the Affinic GDB debugger, a Win32 frontend UI that wraps the GNU GDB debugger. You can inspect EWRAM, ITCM and DTCM contents in real time over Wi-Fi, between the DS and a PC.
I also hooked it up as an exception handler. So if you access, say, an invalid DTCM area, the memory protection unit raises a data abort, which starts a GDB debug session! That way you can inspect what was on the stack at the moment the exception took place.
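When a data abort handler runs, the core is in Abort mode and SPSR_abt holds the CPSR from the moment of the fault: its mode field says which banked r13 (and so which stack) was live, and its T bit says whether the faulting code was Thumb. This is a small decoder sketch using the ARMv4T/ARMv5 mode encodings; the function names are hypothetical and are not part of TGDS-gdbstub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mode field M[4:0] of a CPSR/SPSR value on ARMv4T/ARMv5 cores. */
static const char *psr_mode_name(uint32_t psr) {
    switch (psr & 0x1Fu) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x17: return "abt";
    case 0x1B: return "und";
    case 0x1F: return "sys"; /* shares r13 with usr */
    default:   return "invalid";
    }
}

static bool psr_mode_is(uint32_t psr, const char *name) {
    assert(name != NULL);
    return strcmp(psr_mode_name(psr), name) == 0;
}

static bool psr_thumb(uint32_t psr)        { return (psr >> 5) & 1u; } /* T bit */
static bool psr_fiq_disabled(uint32_t psr) { return (psr >> 6) & 1u; } /* F bit */
static bool psr_irq_disabled(uint32_t psr) { return (psr >> 7) & 1u; } /* I bit */
```

For example, an SPSR_abt of 0xD3 means the fault happened in Supervisor mode, in ARM state, with IRQs and FIQs disabled, so the svc stack is the one to inspect.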
Also, are you really using 32 bytes as the stack (the processor stack)? 32 bytes is room for only eight 32-bit registers! That will crash instantly. Stacks on ARM are full descending: the stack pointer (SP, r13) points at the last word pushed, and each push subtracts 4 bytes per register from SP, then stores the registers in ascending order (lowest-numbered register at the lowest address), starting at the new SP and ending just below the original SP.
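A word-addressed model of full-descending push and pop (STMFD/LDMFD, i.e. PUSH/POP). It assumes a hypothetical 4 KB window just below 0x02400000 so the addresses line up with the layout in this post; `cpu_model`, `push_fd` and `pop_fd` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 KB window ending at 0x02400000. */
#define MEM_BASE 0x023FF000u
#define MEM_WORDS 1024u

typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t sp; /* r13: address of the last word pushed (full descending) */
} cpu_model;

static uint32_t *word_at(cpu_model *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr >= MEM_BASE && addr < MEM_BASE + MEM_WORDS * 4);
    return &c->mem[(addr - MEM_BASE) / 4];
}

/* STMFD sp!, {regs}: SP drops by 4*n, then the list (given in ascending
   register order) is stored upward from the new SP. */
static void push_fd(cpu_model *c, const uint32_t *regs, unsigned n) {
    c->sp -= 4u * n;
    for (unsigned i = 0; i < n; i++) *word_at(c, c->sp + 4u * i) = regs[i];
}

/* LDMFD sp!, {regs}: read back in ascending order, then SP rises by 4*n. */
static void pop_fd(cpu_model *c, uint32_t *regs, unsigned n) {
    for (unsigned i = 0; i < n; i++) regs[i] = *word_at(c, c->sp + 4u * i);
    c->sp += 4u * n;
}
```

Pushing {r4, r5, lr} from SP = 0x02400000 leaves SP at 0x023FFFF4 with r4 stored there and lr at 0x023FFFFC; the matching pop puts SP back at 0x02400000.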
For example, with 512 bytes per mode:
0x02400000 <-- top EWRAM, mirrored.
irq_stack_mode: 0x02400000 - 512 bytes; //irq mode now has 512 bytes available
svc_stack_mode: irq_stack_mode - 512 bytes; //svc / supervisor now has 512 bytes available
usrsys_stack_mode: svc_stack_mode - 512 bytes; //User / System mode now has 512 bytes available
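The same layout arithmetic as a sketch, using the values from this post (top at 0x02400000, 512 bytes per mode); the struct and function names are hypothetical. Because the stacks are full descending, each mode's usable region ends just below its initial SP. On hardware, each value would be loaded into r13 after switching the CPSR into the corresponding mode.

```c
#include <assert.h>
#include <stdint.h>

#define EWRAM_TOP 0x02400000u  /* value from the post */
#define MODE_STACK_BYTES 512u  /* per-mode size from the post */

typedef struct {
    uint32_t irq_sp;
    uint32_t svc_sp;
    uint32_t usrsys_sp;
} stack_layout;

/* Each initial SP sits `size` bytes below the previous one, so the irq
   stack occupies [svc_sp, irq_sp) and the svc stack [usrsys_sp, svc_sp). */
static stack_layout make_layout(uint32_t top, uint32_t size) {
    assert(size % 8 == 0);  /* keep every SP 8-byte aligned */
    assert(top >= 3 * size);
    stack_layout l;
    l.irq_sp = top - size;
    l.svc_sp = l.irq_sp - size;
    l.usrsys_sp = l.svc_sp - size;
    return l;
}
```

With the post's numbers this gives irq = 0x023FFE00, svc = 0x023FFC00 and usr/sys = 0x023FFA00.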
Each exception mode has its own banked r13 (User and System share one), so whenever the ARM core enters a function in any mode, the registers that function uses must be saved, and pushing them onto the stack SUBTRACTS 4 bytes per saved register from the r13 stack pointer.
When returning from that function, you restore them from the stack at r13: popping them ADDS the same amount back, so r13 ends up exactly where it was right before the function was called through a branch instruction.
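A sketch that replays a hypothetical trace of function entries (+n registers pushed) and returns (-n registers popped) while tracking r13 and its deepest point. It shows that only the nesting depth matters, because every return adds back what the entry subtracted, and that a 32-byte stack cannot hold even one `push {r4-r11, lr}` frame (9 registers, 36 bytes). The trace format and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_OK 0
#define TRACE_OVERFLOW 1
#define TRACE_UNDERFLOW 2

typedef struct {
    uint32_t sp;     /* current r13 */
    uint32_t limit;  /* lowest address the stack may grow down to */
    uint32_t lowest; /* deepest SP reached so far (high-water mark) */
} stack_tracker;

/* events[i] > 0: function entry pushing that many registers;
   events[i] < 0: return popping that many registers. */
static int run_trace(stack_tracker *t, const int *events, unsigned count) {
    assert(t->limit <= t->sp);
    const uint32_t start = t->sp;
    for (unsigned i = 0; i < count; i++) {
        if (events[i] > 0) {
            uint32_t bytes = 4u * (uint32_t)events[i];
            if (t->sp - t->limit < bytes) return TRACE_OVERFLOW;
            t->sp -= bytes; /* entry: SUBTRACT from r13 */
            if (t->sp < t->lowest) t->lowest = t->sp;
        } else {
            uint32_t bytes = 4u * (uint32_t)(-events[i]);
            if (start - t->sp < bytes) return TRACE_UNDERFLOW;
            t->sp += bytes; /* return: ADD it back */
        }
    }
    return TRACE_OK;
}
```

Two nested `push {r4-r11, lr}` frames plus a leaf saving two registers reach 80 bytes deep and still return r13 to its starting value; the same first frame on a 32-byte stack overflows immediately.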
Also, it uses Clang for C/C++ code, which in my experience generates much better code than GCC. You can reverse engineer TGDS binaries and it's like reading a book.