Thanks @Coto for your message. It's really instructive.
New changes made. Please check the emulator and post your thoughts.
Another trick I remember from the 90's (the last good decade in systems engineering R&D by far) applies to slower systems like the Nintendo DS. If you look at the hardware design, Nintendo thought very carefully about how to maximize profits while retaining high performance in very specific pieces of hardware.
Nintendo DS programming notes:
Cons:
1) Yes, the 67 MHz ARM9 is slow for realtime decoding
2) EWRAM is shared between the ARM9 and the ARM7, which is an experimental design
Pros:
Data TCM and Instruction TCM can be exploited to hold precomputed LUTs, such as tables of color coefficients, lights, or samples of any kind. The barrel shifter can then (through an ALU opcode) work with these tables... from TCM memory!
This lets you keep up very decent speeds when decoding a frame of some sort (video/audio).
This step is usually an optimization applied on top of a very slow but working software implementation.
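To make the LUT idea concrete, here is a minimal sketch in C, not code from any actual emulator or decoder. It assumes an 8-bit grayscale source and the DS 15-bit color layout (5 bits per channel). DTCM_DATA is a hypothetical placement macro: on a real DS toolchain it would expand to whatever section attribute puts the table in Data TCM, and here it expands to nothing so the same code also builds on a PC. The names luma_to_rgb555, build_luma_lut and convert_luma_row are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical placement macro (assumption): on a DS toolchain this would be
   a section attribute that places the table in Data TCM. It is left empty
   here so the sketch stays portable. */
#ifndef DTCM_DATA
#define DTCM_DATA
#endif

/* 256 entries x 2 bytes = 512 bytes, small compared to the 16 KB of DTCM. */
static DTCM_DATA uint16_t luma_to_rgb555[256];

/* Build the table once, outside the per-frame loop. */
void build_luma_lut(void)
{
    for (unsigned y = 0; y < 256; y++) {
        uint16_t c = (uint16_t)(y >> 3);   /* 8-bit value -> 5-bit channel */
        /* Same 5-bit value in the red, green and blue fields. */
        luma_to_rgb555[y] = (uint16_t)(c | (c << 5) | (c << 10));
    }
}

/* Per-pixel work becomes one table lookup instead of recomputing the
   shifts and ORs for every pixel. */
void convert_luma_row(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = luma_to_rgb555[src[i]];
}
```

Call build_luma_lut() once at startup, then convert_luma_row() for each row of the frame. The same pattern should carry over to other per-sample math (color coefficients, lighting, audio samples): first get the slow version working, then move the repeated arithmetic into a table.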