Oops... sorry, I have been way too busy with work. I'd be interested in knowing what you have discovered, although I don't know if I'll have much time to implement it. I'm still fixing up some stuff with the PicoDrive port.
Ok, I'll just mention it here. Through the use of luminance textures, reflective LUTs, and some configuring of the texture combiners, we've found a way to mimic paletted/indexed textures. Credit goes to StapleButter and those he collaborated with for getting it working initially. I realize now that I did touch on this a few years ago in this thread, but nothing was concrete back then, and that was before the various life-changing events that took me out of the loop.
Now, from what StapleButter explained to me, this takes a little more GPU time than regular textures because of how it's set up, but I figure one way to offset that is to not render from textures stored in FCRAM. Current methods call for a roughly 2MB texture cache, which for updating purposes only makes sense in FCRAM, but by my calculations we'd need only about 272KB of VRAM to hold the texture caches of each form (128KB for 2-bit as L4, 64KB for 4-bit as L4, 64KB for 8-bit and 11-bit direct color as L8, and 16KB for 8-bit Mode7 as L8). Each of these can be kept in FCRAM and updated as PPU writes are done. Then, when working out which spots will potentially be rendered, we'd only need to upload those sections from our FCRAM caches to the VRAM caches, not the caches in their entirety, since the limit, at least for backgrounds, is a range of 1024 tiles. Because the location of each tile will be static in the overall scheme of things, we can effectively create a mapping of the SNES VRAM in the form of a vbo, updating and uploading it just like the texture data. I believe this can be done for both backgrounds and sprites.
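As a rough check of the 272KB figure, here is a minimal sketch of the arithmetic. It assumes the SNES's 64KB of VRAM, 8x8 tiles, and 256 Mode7 tiles (which matches the 16KB quoted above); the function names are made up for illustration and are not from any existing code.

```c
#include <assert.h>
#include <stddef.h>

enum { SNES_VRAM_BYTES = 64 * 1024, TILE_PIXELS = 8 * 8 };

/* Bytes needed to hold every tile that fits in SNES VRAM at `snes_bpp`
   bits per pixel, re-encoded as a texture with `tex_bpp` bits per texel
   (4 for L4, 8 for L8). Hypothetical helper, for illustration only. */
static size_t cache_bytes(unsigned snes_bpp, unsigned tex_bpp)
{
    assert(snes_bpp > 0 && tex_bpp > 0);
    size_t snes_tile_bytes = TILE_PIXELS * snes_bpp / 8;
    size_t tile_count = SNES_VRAM_BYTES / snes_tile_bytes;
    return tile_count * (TILE_PIXELS * tex_bpp / 8);
}

/* Assumption: Mode7 uses 256 tiles of 8x8 8-bit pixels, stored as L8. */
static size_t mode7_cache_bytes(void)
{
    return 256 * TILE_PIXELS;
}

static size_t total_cache_bytes(void)
{
    return cache_bytes(2, 4)      /* 2-bit tiles as L4: 128KB */
         + cache_bytes(4, 4)      /* 4-bit tiles as L4:  64KB */
         + cache_bytes(8, 8)      /* 8-bit tiles as L8:  64KB */
         + mode7_cache_bytes();   /* Mode7 tiles as L8:  16KB */
}
```

The 2-bit cache comes out largest because 2-bit tiles are the smallest in SNES VRAM (so the most of them fit), yet each one still expands to a full 4-bit L4 tile.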
For the Mode7 layer(s), only 1MB of total VRAM would be needed, and this would cover all the various forms. The idea we talked about long ago, with the current design, revolved around rendering the 4 corners in 512x512 sections because the 3DS doesn't like rendering to a full 1024x1024 buffer. But because each texel here is 1 byte, we could cheat by treating every 4 L8 texels as one RGBA8 texel (for both source and destination), giving us an effective 512x512 texel buffer to work with and rendering each 8x8 tile as a 4x4 tile. The Z-order layout allows this. Because the source tiles will be in VRAM rather than FCRAM, rendering to this bigger buffer will be faster too.
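To illustrate why the Z-order layout makes the 4-into-1 packing work, here is a small sketch. It assumes the usual Morton interleave inside an 8x8 tile (x bits in the even positions, y bits in the odd ones); the helper names are hypothetical. Under that assumption, each aligned run of 4 bytes in an L8 tile is exactly one 2x2 block of texels, so the 64-byte tile reads as a 4x4 grid of 4-byte texels that is itself in Z-order.

```c
#include <assert.h>

/* Morton (Z-order) index of texel (x, y) inside an 8x8 tile.
   Assumption: x bits land in even bit positions, y bits in odd ones. */
static unsigned morton(unsigned x, unsigned y)
{
    assert(x < 8 && y < 8);
    unsigned idx = 0;
    for (unsigned bit = 0; bit < 3; bit++) {
        idx |= ((x >> bit) & 1u) << (2 * bit);
        idx |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return idx;
}

/* Index of the 4-byte (RGBA8-sized) group that holds L8 texel (x, y). */
static unsigned packed_index(unsigned x, unsigned y)
{
    return morton(x, y) / 4;
}

/* Which of the 4 bytes in that group (0..3) holds L8 texel (x, y). */
static unsigned packed_lane(unsigned x, unsigned y)
{
    return morton(x, y) % 4;
}
```

With this layout, `packed_index(x, y)` equals `morton(x / 2, y / 2)`, and the lane is just the low bit of x plus twice the low bit of y. The size also works out: 1024x1024 one-byte texels and 512x512 four-byte texels are both 1MB.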
There is an issue though. Unless there's something I'm missing, there's no real way to render a slew of tiles with different palette indices in one go. For 8-bit it's no problem, but 2/4-bit tiles can access up to 8 different palette arrangements, and they can't access different portions of the reflective LUTs. So what would have to be done is rendering the set of tiles that use palette set 0 in the first go, palette set 1 in the second go, and so on up to palette set 7 (assuming all 8 sets are used). For each set, the palette in the reflective LUTs would have to be changed. This is where having that mapping of SNES VRAM as a vbo would come in handy, because it means we wouldn't have to form a new list each time. The shaders and uniforms can handle trimming out the entries in the vbo that don't match the currently designated palette set. I'll give you the test program that I've been working on (let me clean it up first), and maybe you'll have some insight I missed that could remedy this issue.
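The per-palette passes could be driven along these lines. This is only a sketch of the CPU side: it uses the SNES tilemap entry layout (tile number in bits 0-9, palette in bits 10-12), while `draw_pass_fn` and `render_by_palette` are hypothetical stand-ins for "load this palette set into the LUTs, set the uniform, and draw the same vbo". It also skips palette sets that no tile uses, which is an addition not described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SNES BG tilemap entry: bits 0-9 tile number, bits 10-12 palette set. */
static unsigned entry_tile(uint16_t e)    { return e & 0x03FFu; }
static unsigned entry_palette(uint16_t e) { return (e >> 10) & 0x7u; }

/* Bitmask with bit N set if any tilemap entry uses palette set N. */
static uint8_t used_palette_sets(const uint16_t *map, size_t count)
{
    uint8_t mask = 0;
    for (size_t i = 0; i < count; i++)
        mask |= (uint8_t)(1u << entry_palette(map[i]));
    return mask;
}

/* Hypothetical callback: upload palette `set` to the LUTs, tell the
   shader which set to keep, and issue the draw for the shared vbo. */
typedef void (*draw_pass_fn)(unsigned set, void *user);

/* Runs one pass per palette set that is actually in use and returns
   the number of passes issued. `draw` may be NULL for a dry run. */
static unsigned render_by_palette(const uint16_t *map, size_t count,
                                  draw_pass_fn draw, void *user)
{
    uint8_t mask = used_palette_sets(map, count);
    unsigned passes = 0;
    for (unsigned set = 0; set < 8; set++) {
        if (!(mask & (1u << set)))
            continue;            /* no tile uses this set, skip the pass */
        if (draw)
            draw(set, user);
        passes++;
    }
    return passes;
}
```

The vbo itself never changes between passes here; only the palette data and the "current set" value do, which is the point of keeping a static mapping.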
Another issue is offset-per-tile. The idea of a vbo is great for regular backgrounds, but offset-per-tile throws a wrench into it. My understanding of this feature is that at any pixel position that's meant to read a specific tile, the offset adjusts the source X and/or Y so that a possibly different tile is read instead, which leads me to think it affects a column of tiles rather than a single one. So the thought was to have a second mapping of SNES VRAM that maps in columns rather than rows.
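One way to picture the second mapping is as the same set of entries stored in column-major order, so that a whole column of tiles becomes one contiguous range in the vbo. The sketch below assumes a 32x32 tile map and made-up function names; the real layout would depend on how the vbo entries are actually defined.

```c
#include <assert.h>

/* Assumption: a 32x32 tile map, one vbo entry per tile slot. */
enum { MAP_W = 32, MAP_H = 32 };

/* Row-major slot: tiles in the same row are adjacent in the vbo. */
static unsigned row_major_slot(unsigned tx, unsigned ty)
{
    assert(tx < MAP_W && ty < MAP_H);
    return ty * MAP_W + tx;
}

/* Column-major slot: tiles in the same column are adjacent, so one
   column can be drawn as a single contiguous range of entries. */
static unsigned col_major_slot(unsigned tx, unsigned ty)
{
    assert(tx < MAP_W && ty < MAP_H);
    return tx * MAP_H + ty;
}
```

With the column-major version, column `tx` occupies slots `tx * 32` through `tx * 32 + 31`, so a per-column offset would only need a different draw range or uniform rather than a rebuilt list.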
The last issue (off the top of my head) is a combination of offset-per-tile and scanline manipulation. Even with that second vbo mapping for offset-per-tile, it's going to be a lot of commands to pull this off if background layers change positioning per scanline. This may require building a vbo each frame, which is technically what's being done right now for everything.
Last edited by DiscostewSM,