I managed to get all the speed back... and a little more besides!
And I found out why it's crashing on the DS: I'm out of memory.
This is an emulator designed for the PC, and it has huge look-up tables... so it doesn't take long for the DS's 4MB to get swallowed up.
On the DSi there's no problem as it's got 4x the memory (16MB vs 4MB).
I'll optimize the memory and get it back to running...
Have you delved into streaming from DLDI yet? A lot of NDS projects do that. From what I've found, it's best to let the SD filesystem read DLDI sectors in chunks. FatFS allows reading several sectors per call (I imagine libfat does the same), which cuts the number of DLDI calls by at least half and gives a small speedup (also because you'd be skipping the POSIX calls).
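To make the call-count argument concrete, here is a minimal host-side sketch in C. It does not use the real DLDI or FatFS APIs: `mock_disk`, `mock_read_sectors` and `read_in_chunks` are hypothetical names made up for illustration, and the 512-byte sector size is an assumption. It only counts how often a pretend driver gets entered for the same amount of data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512u   /* assumed sector size */
#define DISK_SECTORS 64u    /* size of the pretend card, in sectors */

/* Hypothetical in-memory "card" standing in for a DLDI-backed SD card. */
static uint8_t  mock_disk[DISK_SECTORS * SECTOR_SIZE];
static unsigned driver_calls;   /* how many times the mock driver was entered */

/* Hypothetical driver entry point: read `count` sectors starting at `first`.
   Returns 1 on success, 0 if the request is empty or out of range. */
static int mock_read_sectors(uint32_t first, uint32_t count, void *buf)
{
    if (count == 0 || first >= DISK_SECTORS || count > DISK_SECTORS - first)
        return 0;
    driver_calls++;
    memcpy(buf, &mock_disk[first * SECTOR_SIZE], count * SECTOR_SIZE);
    return 1;
}

/* Read `total` sectors into `dst`, asking the driver for at most `chunk`
   sectors per call. Returns the number of driver calls made, or 0 if any
   call failed. */
static unsigned read_in_chunks(uint32_t first, uint32_t total,
                               uint32_t chunk, uint8_t *dst)
{
    assert(chunk > 0);
    driver_calls = 0;
    while (total > 0) {
        uint32_t n = total < chunk ? total : chunk;
        if (!mock_read_sectors(first, n, dst))
            return 0;
        first += n;
        dst   += n * SECTOR_SIZE;
        total -= n;
    }
    return driver_calls;
}
```

With this sketch, `read_in_chunks(4, 16, 1, buf)` enters the mock driver 16 times, while `read_in_chunks(4, 16, 8, buf)` enters it twice and fills the buffer with the same bytes. How much that saves on real hardware depends on the DLDI driver and the card, so treat it as an illustration of the idea rather than a benchmark.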
This is a VS2012 project implementing petitFS for embedded/bare-metal ARM7 (NTR); it also runs on Windows for further debugging.
Coto88 / petitfs-tgds — Bitbucket
And this is a VS2012 project implementing FatFS for TGDS ARM9 (NTR/TWL), which features the complete set of POSIX calls you'd use in Unix/Linux/NDS homebrew (TGDS projects and libnds).
Coto88 / toolchaingenericds-helper — Bitbucket
This way you can debug your own NTR/TWL filesystem tasks on the PC before doing it manually through DS homebrew, which lacks debugging tools and takes a lot of iterations (maybe months or years, like I used to do). Once you've got something working there, it can be moved easily into NDS homebrew!
Last edited by Coto,