So, I've been thinking about the direct sound latency problem, and I think there are basically two solutions.
The first option is to delay the other audio channels so that they match the latency of the direct sound channel. This could be done with the sound capture feature of the DS. It would at least synchronize the sound, but it doesn't reduce the latency itself.
Edit: On second thought, this is less straightforward than I expected, since I cannot play the captured audio together with the direct sound channels without the direct sound also being fed into the capture. It would therefore require manual mixing, which is not very nice.
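To make the "delay and mix manually" idea a bit more concrete, here is a rough sketch in plain C. It is not the actual implementation and doesn't touch the sound capture hardware at all; `delay_line_t`, `delay_process`, `mix_clamped` and `DELAY_CAPACITY` are hypothetical names, and I'm assuming signed 16-bit samples purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DELAY_CAPACITY 2048 /* hypothetical upper bound on the delay, in samples */

typedef struct {
    int16_t  buf[DELAY_CAPACITY];
    uint32_t pos;   /* slot that is read first and then overwritten */
    uint32_t delay; /* delay in samples, 1..DELAY_CAPACITY */
} delay_line_t;

static void delay_init(delay_line_t *d, uint32_t delay)
{
    assert(delay >= 1 && delay <= DELAY_CAPACITY);
    memset(d->buf, 0, sizeof d->buf); /* start with silence */
    d->pos = 0;
    d->delay = delay;
}

/* Push one new sample and get back the sample from `delay` samples ago. */
static int16_t delay_process(delay_line_t *d, int16_t in)
{
    int16_t out = d->buf[d->pos];
    d->buf[d->pos] = in;
    d->pos = (d->pos + 1) % d->delay; /* ring of exactly `delay` slots */
    return out;
}

/* Manual mix of two samples, clamped to the 16-bit range. */
static int16_t mix_clamped(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

The idea would be to run the other channels through `delay_process` with a delay equal to the direct sound latency and then combine the result with the direct sound sample using `mix_clamped`. Doing that per sample in software is exactly the part that is not very nice.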
The second option is to patch games so that their sound buffers are in main memory, where the ARM7 can access them. This would remove the need to fetch the samples via the ARM9 and would make it much easier to create reliable low-latency audio output. The downside is that this option will most likely never work for all games in general.
A solution could be a mix of both, where supported games use option 2 and the others use option 1.
I also realized that the latest version contains a small bug: it doesn't fill in the gaps when audio blocks are dropped, which causes more desynchronization if that happens often (it shouldn't, though).
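For what it's worth, the gap filling could look roughly like the sketch below. This is only an illustration under my own assumptions, not the real code: I'm assuming each block carries a sequence number, that samples are signed 8-bit, and that a dropped block is padded by holding the last sample. `gap_filler_t`, `write_block` and `BLOCK_SAMPLES` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 16 /* hypothetical block size, in samples */

typedef struct {
    uint32_t next_seq;    /* sequence number of the block expected next */
    int8_t   last_sample; /* last sample written, used as filler */
} gap_filler_t;

/*
 * Writes block `seq` to `out`, preceded by one block of filler for every
 * block that was dropped before it. Assumes seq >= g->next_seq.
 * Returns the number of samples written, or 0 if `out` is too small
 * (in which case nothing is written and the state is left unchanged).
 */
static size_t write_block(gap_filler_t *g, uint32_t seq,
                          const int8_t *block, int8_t *out, size_t out_cap)
{
    assert(g && block && out);

    uint32_t missing = seq - g->next_seq;
    size_t filler = (size_t)missing * BLOCK_SAMPLES;
    size_t needed = filler + BLOCK_SAMPLES;
    if (needed > out_cap)
        return 0;

    /* Hold the last sample for the duration of the dropped blocks. */
    memset(out, g->last_sample, filler);
    memcpy(out + filler, block, BLOCK_SAMPLES);

    g->last_sample = block[BLOCK_SAMPLES - 1];
    g->next_seq = seq + 1;
    return needed;
}
```

Because the filler takes up exactly as much time as the dropped blocks would have, the output position stays aligned with the source and a drop no longer adds to the desynchronization.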
If the direct sound channel uses a rate of 15 kHz, the latency is about 4 frames at the moment. I think I might be able to reduce that a bit more, but it would still be a noticeable difference.
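To put that number in perspective, here is a small conversion sketch. It assumes GBA frame timing (280896 CPU cycles per frame at 16777216 Hz, so about 59.73 frames per second) and a sample rate of exactly 15000 Hz; `frames_to_us` and `frames_to_samples` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define GBA_CPU_HZ           16777216u /* GBA CPU clock */
#define GBA_CYCLES_PER_FRAME 280896u   /* CPU cycles per video frame */

/* Duration of `frames` frames in microseconds (rounded down). */
static uint32_t frames_to_us(uint32_t frames)
{
    return (uint32_t)((uint64_t)frames * GBA_CYCLES_PER_FRAME * 1000000u
                      / GBA_CPU_HZ);
}

/* Number of samples played during `frames` frames (rounded down). */
static uint32_t frames_to_samples(uint32_t frames, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return (uint32_t)((uint64_t)frames * GBA_CYCLES_PER_FRAME * rate_hz
                      / GBA_CPU_HZ);
}
```

Under those assumptions, 4 frames come out at about 67 ms, or roughly 1000 samples at 15 kHz.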