Coto
you don't see this often on the Internet. So yeah, thanks for that.

edit: as for the massive bloated file, don't even mention it. In ToolchainGenericDS the C code alone takes about 320K, since it includes the fs drivers, printf, and the overall hardware layer required for DS homebrew to work. Adding C++ code increases that to about 750K (the sum of both the ARM7 and ARM9 binaries). Discarding sections randomly will just cause undefined behaviour: for example, you allocate a char buffer that lives in BSS, the BSS section gets discarded or moved to another part of DS memory, and then writing to that buffer causes buffer overflows/exceptions.

Also, the magic of relocatable code is what allows exploits and self-modifying code to work. We've seen this in several pieces of code that work from a pool of opcodes and recalculate the required offsets, so they still run even when the memory they were originally mapped to is not available.
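To make the "recalculating the offsets" part concrete, here is a rough sketch in C. It is not how ToolchainGenericDS or any particular exploit does it. The `toy_blob` format, the `toy_relocate` helper and the idea of a separate fixup list are all made up for illustration: a pool of 32-bit words linked for one base address, plus a list of which words hold absolute addresses and must be patched when the pool is copied somewhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy format: a pool of 32-bit words linked to run at
 * link_base. fixups lists the word indices that hold absolute addresses. */
typedef struct {
    uint32_t        link_base;   /* address the pool was linked for   */
    const uint32_t *words;       /* opcodes and literal-pool entries  */
    size_t          word_count;
    const size_t   *fixups;      /* indices of words needing a rebase */
    size_t          fixup_count;
} toy_blob;

/* Copy the pool into dst and rebase every absolute address so the pool
 * behaves as if it had been linked for load_base instead of link_base. */
static void toy_relocate(const toy_blob *blob, uint32_t *dst, uint32_t load_base)
{
    /* Unsigned subtraction wraps, so this also works when moving downwards. */
    uint32_t delta = load_base - blob->link_base;

    memcpy(dst, blob->words, blob->word_count * sizeof(uint32_t));

    for (size_t i = 0; i < blob->fixup_count; i++) {
        size_t idx = blob->fixups[i];
        assert(idx < blob->word_count);  /* a bad fixup would corrupt memory */
        dst[idx] += delta;               /* only address words are touched   */
    }
}
```

With a pool linked for 0x02000000 whose word 1 holds the address 0x02000008, calling `toy_relocate(&blob, out, 0x06000000u)` leaves the opcode words alone and turns that entry into 0x06000008. Those addresses are only illustrative. Real relocatable code gets the equivalent fixup information from the linker's relocation entries, or avoids absolute addresses altogether by using PC-relative loads and branches.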
Aug 21, 2018