Tech speak begins: 
@DiscostewSM - To solve the priority problem:
On the main (or sub) screen:
1. Draw backdrop with depth 0 (depth test disabled)
- this effectively 'clears' the screen.
2. Draw sprites with the appropriate depth values (3, 6, 9, 12), but draw them based on the sprite priority register $2102/3 (depth testing is disabled)
- this should effectively draw the sprite layer correctly but without the additional sprite layer texture
3. Draw backgrounds 0 to 3 with the appropriate depth values depending on the BG modes so that they will be positioned correctly (depth testing is enabled)
This eliminates the need to have a separate sprite layer texture. Another reason I didn't have the sprite layer texture is that I couldn't figure out how to extract the pixels that belonged to a certain depth. Furthermore, the benefit of the method I used is that it eliminates the need for deferred drawing of tiles, or for extracting and redrawing the separate sprite layers later. So some games actually got a modest performance improvement out of it.
But there's probably some flaw somewhere in the priorities that isn't totally right yet...
Just for completeness, the problem with this method was the color math. But I managed to find some tricks using the alpha channel to make it all work together. The idea is that the sub-screen textures always have 1.0 as the alpha (actually it doesn't matter, because I do not use the sub-screen's alpha for alpha blending). The pixels in the main screen textures, however, can have an alpha of 0.5 (for half color math), 1.0 (full color math), or 0.0039 (about 1/255, the smallest non-zero alpha, for disabling color math). By calling GPU_SetAlphaBlending with the correct formula (using the main screen's alpha as the destination alpha), and by doing a single pass of drawing the alpha-blended sub-screen to the main screen, we can reproduce most of the SNES color math logic rather efficiently. I'm surprised at how well it actually works on real 3DS hardware.
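To make the alpha trick a bit more concrete, here is a software model of one blend formula that would fit the description. The post does not spell out the actual blend factors, so everything here is an assumption: the formula `out = main + sub * main.a`, the idea that half-math pixels store their color pre-halved so a single formula covers all three cases, and the names `Pixel`, `ColorMath`, `encode_main` and `blend_sub_onto_main`. It does not call GPU_SetAlphaBlending; it only mimics the arithmetic per pixel, and only for additive color math.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Pixel;

typedef enum { MATH_OFF, MATH_HALF, MATH_FULL } ColorMath;

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Encode a main-screen pixel using the alpha values from the post. */
static Pixel encode_main(float r, float g, float b, ColorMath mode)
{
    Pixel p = { r, g, b, 1.0f };
    switch (mode) {
    case MATH_FULL:
        p.a = 1.0f;
        break;
    case MATH_HALF:
        p.a = 0.5f;
        /* ASSUMPTION: color is stored pre-halved so that
         * main/2 + sub/2 comes out of the same blend formula. */
        p.r *= 0.5f; p.g *= 0.5f; p.b *= 0.5f;
        break;
    case MATH_OFF:
        p.a = 1.0f / 255.0f;  /* smallest non-zero 8-bit alpha */
        break;
    }
    return p;
}

/* ASSUMED blend: out = dst * 1 + src * dst.a, where dst is the main
 * screen and src is the sub-screen. The sub-screen's alpha is unused. */
static Pixel blend_sub_onto_main(Pixel main_px, Pixel sub_px)
{
    assert(main_px.a >= 0.0f && main_px.a <= 1.0f);
    Pixel out = main_px;
    out.r = clamp01(main_px.r + sub_px.r * main_px.a);
    out.g = clamp01(main_px.g + sub_px.g * main_px.a);
    out.b = clamp01(main_px.b + sub_px.b * main_px.a);
    return out;
}
```

Under these assumptions, full math gives main + sub, half math gives (main + sub) / 2, and a "disabled" pixel picks up at most 1/255 of the sub-screen color, which is at most one step of an 8-bit channel.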
--------------------- MERGED ---------------------------
Frame drops:
Well, if you read my post, I said that the games lagged only when there are lots of sprites on screen; I didn't mention the special effects. This new update did fix the laggy transparency and parallax screens, which, again, is just too damn awesome! But I am more worried about the (mini) FPS drops with lots of sprites on screen. All the games I mentioned in my previous post run 100% perfectly in BlargSnes, with no FPS drops, but in your emulator they still lag a bit (different rendering engine, I presume). I am optimistic that it can be fixed somehow.
@A Fireman - You mentioned BlargSnes runs 100% perfectly. You sure know how to give me a good challenge.
Sprites are indeed rendered differently in Snes9x. In fact, Snes9x renders sprites line-by-line (8x1 pixels), but I think BlargSnes (iirc) renders them in big 8x8 pixel tiles and that's probably why it's smoother. Let's see if I can do anything there... I've got some ideas.
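As a rough back-of-the-envelope sketch of why that difference could matter, the snippet below counts how many primitives one sprite turns into under each scheme. It is only a counting model under simplifying assumptions (no clipping, no per-line effects, sprite sizes that are multiples of 8), and the function names are made up; neither emulator is claimed to be structured this way.

```c
#include <assert.h>

/* Primitives when a sprite is emitted as 8x1 strips, one scanline at a time. */
static int strips_per_sprite(int width_px, int height_px)
{
    assert(width_px > 0 && width_px % 8 == 0 && height_px > 0);
    return (width_px / 8) * height_px;
}

/* Primitives when the same sprite is emitted as whole 8x8 tiles. */
static int tiles_per_sprite(int width_px, int height_px)
{
    assert(width_px > 0 && width_px % 8 == 0);
    assert(height_px > 0 && height_px % 8 == 0);
    return (width_px / 8) * (height_px / 8);
}
```

With these two functions, a 16x16 sprite comes out as 32 strips versus 4 tiles, so the per-line approach has to set up eight times as many primitives for the same pixels.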
Last edited by bubble2k16,