can't tell how shit will go on the Switch, but as far as desktop is concerned, melonDS will remain a serious option, y'know
if we get DraStic as an option for lower-end hardware, or just for people who want speed in general, that's good
but melonDS is the only one implementing melonism with class
more seriously, tho:
https://drastic-ds.com/viewtopic.php?f=5&t=4391&p=18970&hilit=hacks#p18962
DraStic relies on several game-specific hacks. this approach is good for lower-end hardware where the extra accuracy needed would be too costly. but, as history has shown multiple times, any emulation solution that isn't accurate will keep running into issues. which is why melonDS's approach is to avoid game-specific hacks at all costs. the extra accuracy needs higher-end hardware, but also means that you're less likely to run into issues when playing obscure games.
as an example, several of the games listed in that post run in melonDS without requiring any game-specific hack.
some others don't. what I want to know is how far we can get, without requiring absurd hardware of course. (I sure as hell hope we don't have to emulate the ARM9 caches)
for all I know, CPU timing is still vastly wrong in melonDS.
one common problem is that of badly-programmed games that have a race condition or whatever. in an environment like PC, those would quickly be caught and fixed, because timing there is nondeterministic. but, on the DS, games run on the bare metal, and all consoles have the same timing characteristics. so, basically, such bugs tend to go unnoticed, because the game works out of sheer luck. but when you run it on an emulator with different timings, it explodes.
we had the same issue with GX timings and Spellbound for example. the game sends GX display lists via DMA, but here's the catch:
* it transfers a certain number of words, then sets up an IRQ handler to continue the transfer once the GX FIFO is less than half full, and so on until everything is transferred
* after setting up the first transfer, it immediately sends a SWAP_BUFFERS command, without waiting for the transfer to be actually finished
so the way this works is that every time a DMA finishes, the GX FIFO must already be less than half full, or nearly. otherwise, the delay before the next transfer lets the game send its SWAP_BUFFERS command in the middle of the display list, which, as you can guess, doesn't go well.
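
to make the failure mode concrete, here's a rough toy model of that pattern. it is not melonDS code and not real hardware timing: the function name, the FIFO size constant, the one-word-per-cycle DMA, the zero-cost IRQ handler and all the cycle counts are assumptions picked for illustration. it only shows how a GX that drains too slowly lets the main code slip its SWAP_BUFFERS in before the display list is fully queued.

```python
FIFO_SIZE = 256              # toy FIFO capacity in entries (assumption)
HALF_FULL = FIFO_SIZE // 2   # the IRQ fires once the FIFO holds fewer entries than this


def words_queued_before_swap(list_words, chunk, gx_cycles_per_cmd, cpu_delay):
    """Simulate one frame, return how many display-list words were
    queued when the main code wrote SWAP_BUFFERS.

    list_words        -- total size of the display list, in words
    chunk             -- words transferred per DMA
    gx_cycles_per_cmd -- cycles the (toy) GX needs to consume one FIFO entry
    cpu_delay         -- main-code cycles between starting the first DMA
                         and writing SWAP_BUFFERS
    """
    fifo = 0                             # entries currently in the FIFO
    sent = 0                             # display-list words queued so far
    dma_left = min(chunk, list_words)    # first transfer, set up by the main code
    main_cycles = 0                      # cycles the main code actually got to run
    gx_timer = 0

    while True:
        # the GX consumes one entry every gx_cycles_per_cmd cycles
        if fifo > 0:
            gx_timer += 1
            if gx_timer >= gx_cycles_per_cmd:
                gx_timer = 0
                fifo -= 1

        if dma_left > 0:
            # DMA running: the CPU is stalled, one word goes in per cycle if there is room
            if fifo < FIFO_SIZE:
                fifo += 1
                sent += 1
                dma_left -= 1
        elif sent < list_words and fifo < HALF_FULL:
            # FIFO less than half full: the IRQ handler starts the next chunk
            dma_left = min(chunk, list_words - sent)
        else:
            # neither DMA nor IRQ: the main code makes progress
            main_cycles += 1
            if main_cycles >= cpu_delay:
                return sent              # SWAP_BUFFERS is written here


if __name__ == "__main__":
    print(words_queued_before_swap(400, 100, gx_cycles_per_cmd=1, cpu_delay=50))
    print(words_queued_before_swap(400, 100, gx_cycles_per_cmd=8, cpu_delay=50))
```

with these made-up numbers, the fast GX keeps the FIFO nearly empty, so each chunk chains straight into the next and all 400 words are queued before SWAP_BUFFERS. with the slow GX, the FIFO is still more than half full after the second chunk, the main code gets its 50 cycles, and SWAP_BUFFERS lands partway through the list.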
while working out GX timings, we found out a lot of fun things. did you know that the GX is capable of parallel execution, to some extent? furthermore, the timings differ depending on whether polygons pass or fail culling, clipping, etc. melonDS attempts to emulate all that, but it's still not perfect.
you can see from code comments that DeSmuME ran into similar issues in the past, and they took the easy route of making their GX commands artificially fast. measurements show that NO$GBA adopted a similar approach.
and yet, the GX is lame compared to a full-fledged CPU (hi, ARM9).