Homebrew SNES9x for Old 3DS

DiscostewSM · Mar 17, 2018

bubble2k16 said:
Oops... sorry I have been way too busy with work. I will be interested in knowing what you have discovered, although I wouldn't know if I have much time to implement. I'm still fixing up some stuff with the PicoDrive port.

Ok, I'll just mention it here. Using luminance textures, reflective LUTs, and some configuring of the texture combiners, we've found a way to mimic paletted/indexed textures. Credit goes to StapleButter and those he collaborated with for getting it working initially. I now realize I touched on this a few years ago in this thread, but nothing was concrete back then, and various life-changing events in the meantime took me out of the loop.

Now, from what StapleButter explained to me, this does require a bit more GPU time than regular textures because of how it's set up, but I figure one way to offset that is to stop rendering from textures stored in FCRAM. Current methods call for roughly 2MB as a texture cache, which for updating purposes only makes sense in FCRAM, but by my calculations we'd need only about 272KB of VRAM to hold the texture caches of each form (128KB for 2-bit as L4, 64KB for 4-bit as L4, 64KB for 8-bit and 11-bit direct color as L8, and 16KB for 8-bit Mode7 as L8). Each of these caches in FCRAM can technically be updated as PPU writes happen; then, when examining potential rendering spots, we'd only need to upload the relevant sections from the FCRAM caches to the VRAM caches rather than the entire caches, since backgrounds, at least, are limited to a range of 1024 tiles. Because the location of each tile is static in the overall scheme of things, we can effectively create a mapping of the SNES VRAM in the form of a vbo, updating and uploading it just like the texture data. I believe this can be done for both backgrounds and sprites.
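To double-check the 272KB figure, here is a small sketch of the arithmetic. It assumes 64KB of SNES VRAM, 8x8 tiles, 4 bits per texel for L4 and 8 bits per texel for L8, and the 256-tile Mode 7 limit; the function names are just for illustration.

```c
#define SNES_VRAM_BYTES (64u * 1024u)
#define TEXELS_PER_TILE (8u * 8u)
#define MODE7_TILES     256u   /* Mode 7 has a fixed set of 256 8-bit tiles */

/* How many distinct tiles fit in SNES VRAM at a given bit depth. */
static unsigned snes_tile_count(unsigned snes_bpp)
{
    unsigned bytes_per_tile = TEXELS_PER_TILE * snes_bpp / 8u;
    return SNES_VRAM_BYTES / bytes_per_tile;
}

/* Bytes needed on the 3DS side to hold `tiles` tiles in a texture format
   with `gpu_bits_per_texel` bits per texel (4 for L4, 8 for L8). */
static unsigned cache_bytes(unsigned tiles, unsigned gpu_bits_per_texel)
{
    return tiles * TEXELS_PER_TILE * gpu_bits_per_texel / 8u;
}

/* Sum of the four VRAM caches described above. */
static unsigned total_cache_bytes(void)
{
    return cache_bytes(snes_tile_count(2), 4)   /* 2-bit tiles as L4: 128KB */
         + cache_bytes(snes_tile_count(4), 4)   /* 4-bit tiles as L4:  64KB */
         + cache_bytes(snes_tile_count(8), 8)   /* 8-bit / direct color as L8: 64KB */
         + cache_bytes(MODE7_TILES, 8);         /* Mode 7 tiles as L8: 16KB */
}
```

The 2-bit cache is the largest even though its source data is the smallest, because 4096 tiles each get promoted to 4 bits per texel.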

For the Mode7 layer(s), only 1MB of VRAM in total would be needed, and this would cover all the various forms. The previous idea we talked about long ago with the current design revolved around rendering the 4 corners as 512x512 sections, because the 3DS doesn't like rendering to a full 1024x1024 buffer. But because each texel here is 1 byte, we could cheat by combining every 4 L8 texels into one RGBA8 texel (for both source and destination), giving us an effective 512x512 texel buffer to work with and rendering each 8x8 tile as a 4x4 tile. The Z-order (Morton) texel layout allows this: the 64 bytes of an 8x8 L8 tile line up byte-for-byte with a 4x4 block of RGBA8 texels. Because the source tiles will be in VRAM rather than FCRAM, rendering to this bigger buffer will be faster too.
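Here's a quick sketch that checks the Z-order claim, assuming the usual 3DS tiling where texels inside an 8x8 tile are Morton-ordered with x in the lower bit (x0 y0 x1 y1 x2 y2). It compares the byte offset of every texel in an 8x8 L8 tile with the byte offset the same texel gets when the tile is read as a 4x4 block of 4-byte RGBA8 texels. Which RGBA channel each byte ends up in doesn't matter for a straight copy, so that's left out.

```c
#include <stdbool.h>

/* Interleave the low 3 bits of x and y: bit order x0 y0 x1 y1 x2 y2. */
static unsigned morton3(unsigned x, unsigned y)
{
    unsigned m = 0;
    for (unsigned i = 0; i < 3; i++) {
        m |= ((x >> i) & 1u) << (2 * i);
        m |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return m;
}

/* Byte offset of texel (x, y) in an 8x8 L8 tile (1 byte per texel). */
static unsigned l8_tile_offset(unsigned x, unsigned y)
{
    return morton3(x, y);
}

/* Byte offset of the same L8 texel when the tile is viewed as a 4x4 block
   of RGBA8 texels: RGBA8 texel (x/2, y/2), byte (x&1) | (y&1)<<1 in it. */
static unsigned rgba8_block_offset(unsigned x, unsigned y)
{
    unsigned texel = morton3(x >> 1, y >> 1);      /* 0..15 inside the 4x4 block */
    unsigned byte  = (x & 1u) | ((y & 1u) << 1);   /* which of the 4 bytes */
    return texel * 4u + byte;
}

/* True if both views put every texel at the same byte offset. */
static bool layouts_match(void)
{
    for (unsigned y = 0; y < 8; y++)
        for (unsigned x = 0; x < 8; x++)
            if (l8_tile_offset(x, y) != rgba8_block_offset(x, y))
                return false;
    return true;
}
```

Since a 4x4 block is one contiguous quadrant of an RGBA8 8x8 tile (16 texels, 64 bytes), a Mode 7 tile could in principle be copied as a single 64-byte chunk.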

There is an issue, though. Unless I'm missing something, there's no real way to render a slew of tiles with different palette indices in one go. For 8-bit it's no problem, but 2/4-bit tiles can use up to 8 different palette sets, and individual tiles can't each address a different portion of the reflective LUTs. So what would have to be done is render the tiles that use palette set 0 in the first pass, palette set 1 in the second pass, and so on up to palette set 7 (assuming all 8 sets are used). For each set, the palette in the reflective LUTs would have to be changed. This is where having that mapping of SNES VRAM as a vbo would come in handy, because we wouldn't have to build a new list each time: the shaders and uniforms can trim out the vbo entries that don't match the currently designated palette set. I'll give you the test program I've been working on (let me clean it up first), and maybe you'll have some insight I missed that might remedy this issue.
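A rough CPU-side sketch of the per-palette passes, using the standard SNES BG tilemap entry layout (vhopppcc cccccccc: bits 0-9 tile number, 10-12 palette, 13 priority, 14/15 flips). In the actual idea the "does this entry match the current set" test would live in the shader/uniforms; here `tiles_in_pass` just counts what a pass would draw, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Palette set (0-7) of a SNES BG tilemap entry: bits 10-12. */
static unsigned entry_palette(uint16_t e)
{
    return (e >> 10) & 0x7u;
}

/* Stand-in for the shader-side trim: how many entries pass `pass` draws. */
static size_t tiles_in_pass(const uint16_t *map, size_t n, unsigned pass)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (entry_palette(map[i]) == pass)
            count++;
    return count;
}

/* Bitmask of palette sets actually referenced by the map. */
static unsigned used_palette_mask(const uint16_t *map, size_t n)
{
    unsigned mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= 1u << entry_palette(map[i]);
    return mask;
}

/* Passes needed: one LUT upload plus one draw per palette set in use,
   so unused sets can be skipped entirely. */
static unsigned passes_needed(const uint16_t *map, size_t n)
{
    unsigned mask = used_palette_mask(map, n), passes = 0;
    for (unsigned p = 0; p < 8; p++)
        if (mask & (1u << p))
            passes++;
    return passes;
}
```

Skipping unused sets doesn't solve the problem, but on a typical screen it may keep the number of LUT swaps well below 8.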

Another issue is offset-per-tile. The idea of a vbo is great for regular backgrounds, but offset-per-tile throws a wrench into it. My understanding of this feature is that at any pixel position that's meant to read a specific tile, the offset adjusts the source X and/or Y so that a possibly different tile is read, which leads me to think it affects a column of tiles rather than a single one. So the thought was to have a second mapping of SNES VRAM that maps in columns instead of rows.
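A small sketch of what that second, column-major mapping could look like, assuming a single 32x32 tilemap (larger maps and where the per-column offsets come from are left out). In column-major order every tile of a column is contiguous, so a column with its own vertical offset becomes one draw range, or two if it wraps past the bottom of the map. All names here are hypothetical.

```c
#include <stddef.h>

#define MAP_W 32u  /* tiles per tilemap row */
#define MAP_H 32u  /* tiles per tilemap column */

/* The ordinary row-major slot: one row of tiles after another. */
static size_t row_major_index(unsigned tx, unsigned ty)
{
    return (size_t)ty * MAP_W + tx;
}

/* The second mapping: one column of tiles after another. */
static size_t col_major_index(unsigned tx, unsigned ty)
{
    return (size_t)tx * MAP_H + ty;
}

/* Vertex ranges for `rows` tiles of column tx starting at tile row ty0
   (already including that column's offset). The map wraps vertically, so
   this yields 1 or 2 contiguous ranges. Assumes rows <= MAP_H. */
static int column_ranges(unsigned tx, unsigned ty0, unsigned rows,
                         size_t first[2], size_t count[2])
{
    ty0 %= MAP_H;
    first[0] = col_major_index(tx, ty0);
    if (ty0 + rows <= MAP_H) {
        count[0] = rows;
        return 1;
    }
    count[0] = MAP_H - ty0;               /* down to the bottom of the map */
    first[1] = col_major_index(tx, 0);    /* then wrap back to the top */
    count[1] = rows - count[0];
    return 2;
}
```

With 224 visible lines, a column needs 29 tile rows (28 plus one partial), so each column would cost at most two draw calls.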

The last issue (off the top of my head) is the combination of offset-per-tile and scanline manipulation. Even with that second vbo mapping for offset-per-tile, it's going to take a lot of commands to pull this off if background layers change position per scanline. This may require building a vbo each frame, which is technically what's being done right now for everything.

bubble2k16 · Mar 18, 2018

DiscostewSM said:
Ok, I'll just mention it here. Through the use of luminous textures, reflective LUTs, and some configuring of texture combiners, we've been able to utilize a way to mimic paletted/indexed textures. ...

Any benchmarks on the performance? How about in-frame palette changes?

Actually, I am very interested in this, not for emulating the SNES but for other 3D consoles that also use 4/8-bit paletted textures.

I was thinking about moving to emulating another platform, and I think this will be an excellent method to emulate those paletted textures. The performance gain over manually rebuilding each texture is probably much greater there than on the SNES.

Deleted User · Mar 18, 2018

bubble2k16 said:
Any benchmarks on the performance? How about in-frame palette changes?

Actually I am very interested in this, but not because of emulating the SNES but other 3D consoles that also uses 4 / 8-bit paletted textures. I was thinking about moving to emulating another platform, and I think this will be an excellent method to emulate those paletted textures. The performance gain over manually rebuilding each texture is probably much more than the SNES platform.

And what platform would that be?

MattKimura · Mar 18, 2018

Strange, I'm able to use Snes9x normally again like I used to. Save states take a second and SRAM writes don't disturb gameplay. I haven't done anything to my SD card; all I did was delete the snes9x config from the root and the one located in the 3DS folder.
This emulator is flawless: it can go fullscreen and boost the audio. I had a blast playing the Zelda ALttP randomizer!

uyjulian · Mar 18, 2018

MattKimura said:
Strange, I'm able to use Snes9x normally again like I used to. Save states take a second and SRAM writes don't disturb gameplay. I haven't done anything to my SD card. All I did was delete the snes9x config from the root, and the one located in the 3DS folder.
This emulator is flawless, able to be fullscreen and boost the audio. I had a blast playing zelda alttp randomizer!

It appears that that created space near the start of the SD card.

DiscostewSM · Mar 19, 2018

bubble2k16 said:
Any benchmarks on the performance? How about in-frame palette changes?

Actually I am very interested in this, but not because of emulating the SNES but other 3D consoles that also uses 4 / 8-bit paletted textures. I was thinking about moving to emulating another platform, and I think this will be an excellent method to emulate those paletted textures. The performance gain over manually rebuilding each texture is probably much more than the SNES platform.

Haven't done any benchmarking. I can't even tell you how it handles mid-frame stuff, because it's not incorporated into the emu at all; it's all theory based on what I understand of the 3DS. Changing the palette would, I believe, require flushing what's currently being rendered so as not to cause corruption, but I don't recall how significant a delay that would cause overall.

For other emus you're thinking of, a few notes. Linear filtering does not work properly, because rather than blending adjacent fragment colors, it blends adjacent indices of the texture and then uses the result to look up the fragment color, which can be totally off. Using L4 for 16 colors requires that you write LUT entries starting at 0x00 and stepping by 0x11 (0x00, 0x11, 0x22, ..., 0xEE, 0xFF). On top of that, you have to XOR each 8-bit texel with 0x80, because entry 0x80 in the LUT is always black (the normal length there equals 0), which makes it the only reasonable spot for the transparent index. For 4-bit, you XOR each byte (which holds 2 texels) with 0x88. This could be a problem if the platform you're thinking of can use indexed textures without culling out index 0 and uses a color other than black for it.
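A minimal sketch of the texel transforms described above: XOR 0x80 for 8-bit texels, XOR 0x88 for a byte of two 4-bit texels, and the n * 0x11 LUT slot an L4 value is read back as. Which nibble holds the first texel is an assumption here, and the function names are hypothetical.

```c
#include <stdint.h>

/* 8-bit indexed texel: flip bit 7 so index 0 lands on LUT entry 0x80. */
static uint8_t l8_store(uint8_t index)
{
    return (uint8_t)(index ^ 0x80u);
}

/* Two 4-bit indices packed into one L4 byte, then XORed with 0x88
   (bit 3 of each nibble). Assumption: first texel in the low nibble. */
static uint8_t l4_pack(uint8_t first, uint8_t second)
{
    uint8_t byte = (uint8_t)((first & 0x0Fu) | ((second & 0x0Fu) << 4));
    return (uint8_t)(byte ^ 0x88u);
}

/* An L4 value n is read back as the 8-bit value n * 0x11, so the palette
   color for a stored nibble n belongs in LUT slot n * 0x11. */
static unsigned l4_lut_slot(uint8_t stored_nibble)
{
    return (stored_nibble & 0x0Fu) * 0x11u;
}
```

The XOR is its own inverse, so the same operation converts back if the cache ever needs to be read on the CPU.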

Had a lot going on lately, but I'll still try to get you that test program I made. If I understood Snes9x's design, then I'd have tried implementing this myself by now. But for my idea, various changes to the design would have to happen, since my idea consists of rendering the backgrounds and sprite layer separately, then mixing those layers together afterwards.

bubble2k16 · Mar 19, 2018

DiscostewSM said:
For other emus you're thinking of, a few notes. Linear filtering does not work properly ...

... But for my idea, various changes to the design would have to happen, since my idea consists of rendering the backgrounds and sprite layer separately, then mixing those layers together afterwards.

Linear filtering isn't much of a concern to me. Do you think implementing lighting or per-vertex colouring is possible with this? After all, I think the console I'm looking at requires stuff like shading.

Yes I can see how the underlying design for all the drawing will have to change, but I think I'm really quite done with Snes9x for 3DS, except for minor fixes here and there.

If you are keen on tinkering with Snes9x for 3DS to implement this, do let me know.

I could point the way.

MKKhanzo · Mar 20, 2018

Leafgreen26 said:
And what platform would that be?

GBA is my bet!

bubble2k16 · Mar 20, 2018

MattKimura said:
Strange, I'm able to use Snes9x normally again like I used to. Save states take a second and SRAM writes don't disturb gameplay. I haven't done anything to my SD card. All I did was delete the snes9x config from the root, and the one located in the 3DS folder.
This emulator is flawless, able to be fullscreen and boost the audio. I had a blast playing zelda alttp randomizer!

Glad you liked it.

Leafgreen26 said:
And what platform would that be?

Hmm... I haven't started work on anything yet... so I don't want to make promises yet.

Zeroexe90 · Mar 21, 2018

Hi! I have a userland issue: when I exit the emulator, the upper screen turns black and the lower one stays red, and both stay frozen until I turn off the 3DS.
I'm using an N3DS with steelhax on 11.6. It also happens to me in VirtuaNES.

Thanks for all the work

Deleted User · Mar 21, 2018

Zeroexe90 said:
Hi!, I have a userland issue ... when I exit the emulator, the upper screen turns black, the lower one stays red and they stay frozen until I turn off the 3ds.
I'm using N3DS with steelhax, 11.6, it also happens to me in virtuanes

Thanks for all the work

Follow this guide completely and you shouldn't get that error: https://3ds.hacks.guide/ CFW is way more reliable than hax.

Zeroexe90 · Mar 21, 2018

Leafgreen26 said:
Follow this guide completely and you shouldn't get that error: https://3ds.hacks.guide/ CFW is way more reliable than hax.

Thanks for the reply! Actually, I had another N3DS with boot9strap, but it was banned in the last ban wave, so I wanted a clean one to play online with no risks.

I posted to report the error I ran into, but then... is it because hax are less stable?

Deleted User · Mar 21, 2018

Zeroexe90 said:
Thanks for the reply! In fact, I had another n3ds with boot9strap, but it was banned in the last banwave, so I wanted to have a clean one to play online with no risks.

The post is to inform the error that happened to me, but then ... is it because the hax are more unstable?

Yes, but CFW also does a lot of things behind the scenes. Here is a quick read, since I don't want to misinform you: https://github.com/AuroraWright/Luma3DS/wiki

bubble2k16 · Mar 24, 2018

Never thought I'd be making such a major change to this emu anymore, but here it is:

v1.30 released:

Improved sound synchronization.
Added BlargSNES DSP Core (experimental) for performance. The original Snes9x DSP core (default) suffers from sound skipping in some games like Aladdin and Gradius 3. You can choose which DSP core to use from the Options menu. The BlargSNES DSP Core sounds similar to the Snes9x core for most games, but some sounds, like the howling wind in Final Fantasy 3, sound different.
Added support for Tengai Makyou Zero English Patch (for hopefully all future versions)

Download from:
https://github.com/bubble2k16/snes9x_3ds/releases/download/v1.30/snes9x_3ds-v1.30.zip

------------------------

Just some background. Originally, I only planned to fix the emu so that all versions (including future ones) of the Tengai Makyou Zero English patch would run. But after recompiling the emulator, the sound emulation turned out to be slower, causing the sound to stutter in some parts of a very small number of games. Notable ones are Gradius 3's start screen (the fanfare when I press the Start button) and Aladdin (the fanfare when I reach the end of a stage).

I found that quite unacceptable, so I decided to quickly port BlargSNES's DSP core into this emulator for performance. Earlier I thought it was rather impossible, but I was overthinking it then. The integration turned out to be less complicated than I thought. There were some bugs I had to fix, which I managed to get done in a day or two, and then I spent another day or two testing.

The emulator still starts up using the default Snes9x DSP core for sound emulation, but you can switch to BlargSNES's DSP core for all games or for specific ones.

Craftyawesome · Mar 24, 2018

Kind of out of the loop here: is this more accurate than the Snes9x 2005 Plus RetroArch core?

bubble2k16 · Mar 24, 2018

Craftyawesome said:
Kind of out of the loop, is this more accurate than the snes9x 2005 plus retroarch core?

Hmm... this is a little less accurate, due to various optimizations specific to the old 3DS.

DiscostewSM · Mar 24, 2018

Cool @bubble2k16

Regarding your mention of FF3's howling wind: this has to do with Pitch Modulation (PMON), which provides a "frequency sweep" to a channel by taking the amplitude of the previous channel and using it as a pitch modifier (not on channel 0, because there is no previous channel before it to take from). It doesn't sound correct with the BlargSNES DSP core because it isn't used in the "master" branch of BlargSNES; it was commented out. I know because I was the one trying to implement it long ago when I was active in BlargSNES's development. I recall it was used in the "veryhard" branch, but I don't know if I ever got it right, likely because that branch has to buffer the audio in segments rather than processing it sample by sample. It was designed to run on the 2nd core to free up the main core, but as a result, games that stream audio in, like Earthworm Jim 2, have audio issues. The "veryhard" branch also has other differences in the DSP core from the "master" branch besides this, like implementing the Noise Generator in assembly.
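For reference, a sketch of the pitch modulation step as I understand it from blargg's SPC_DSP emulator: the voice's 14-bit pitch is scaled by the previous voice's output sample, and voice 0 ignores its PMON bit. The function name and signature are hypothetical, and it assumes `>>` on a negative int is an arithmetic shift (true for GCC/Clang on ARM and x86).

```c
#include <stdint.h>

/* Effective pitch for `voice` (0-7), given its pitch register value, the
   PMON register, and the previous voice's 16-bit output sample. */
static int modulated_pitch(int voice, int base_pitch, uint8_t pmon, int prev_output)
{
    base_pitch &= 0x3FFF;                       /* pitch registers are 14-bit */
    if (voice == 0 || !(pmon & (1u << voice)))  /* voice 0 has no previous voice */
        return base_pitch;
    /* Previous output (scaled down by 32) acts as a signed multiplier:
       pitch * (1 + out / 32768), done in integer steps. */
    return base_pitch + (((prev_output >> 5) * base_pitch) >> 10);
}
```

Because the modulator is the previous voice's live output, this has to be evaluated per sample, which fits the trouble with the segment-buffered approach described above.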

naaiil · Mar 24, 2018

bubble2k16 said:
Never thought I'd be making such a major change to this emu anymore, but here it is:

v1.30 released: ...

I have an issue when playing Emerald Dragon (English patch v1.2). The game gets stuck, but the emulator itself is fine; I can still go to the menu, settings, etc.

How to reproduce:
New game - see text story - menu - config - text speed - game stuck

O2DS, using the latest Snes9x 1.30

uyjulian · Mar 24, 2018

I'm curious: has the 3DS DSP been reverse-engineered enough that sound rendering could happen on the DSP instead of the syscore?

shakkar23 · Mar 24, 2018

I can't load this up on my O3DS. Do I need CFW?
