The copying is necessary because the sub engine can only use one of the 128K VRAM banks, bank C. I can't switch it out... the bank that is taking the snapshot needs to be allocated to the LCD memory area. The best I could do without copying was to have it display at 30 fps, with every other frame black. No good. So the snapshot is captured to bank D, and every vblank the contents of D are copied to C. I don't need to copy all of it, just 256x144x2 bytes (or possibly 160x144x2 noncontiguous bytes), but that's still a lot.
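To put numbers on the copy described above, here is a minimal sketch in plain C. It uses ordinary arrays in place of banks C and D, and `memcpy` in place of whatever DMA or CPU copy the real code uses. The function names are hypothetical, and placing the 160-pixel window at the left edge of each 256-pixel row is an assumption; the post doesn't say where it sits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    ROW_STRIDE_PX = 256, /* both buffers are 256 pixels wide */
    CAPTURE_ROWS  = 144, /* rows that actually need copying */
    BYTES_PER_PX  = 2    /* 16-bit pixels */
};

/* Bytes moved per frame when width_px pixels are copied from each row. */
size_t capture_copy_bytes(size_t width_px)
{
    return width_px * CAPTURE_ROWS * BYTES_PER_PX;
}

/* Copy the first width_px pixels of every row from src to dst.
 * With width_px < 256 the copied bytes are not contiguous, so this
 * takes one memcpy per row instead of one big block copy. */
void copy_capture_rows(uint16_t *dst, const uint16_t *src, size_t width_px)
{
    for (size_t y = 0; y < CAPTURE_ROWS; ++y) {
        memcpy(dst + y * ROW_STRIDE_PX,
               src + y * ROW_STRIDE_PX,
               width_px * BYTES_PER_PX);
    }
}
```

`capture_copy_bytes(256)` comes to 73,728 bytes and `capture_copy_bytes(160)` to 46,080 bytes per vblank, so the narrower copy saves a bit over a third of the traffic at the cost of 144 separate row copies.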
You really had me worried that I'd gone completely crazy... no, the fact is that you have to use bank C and a background on odd frames, and bank D and bitmap objects (sprites! really!!!) on even frames. What I'm not sure about now is whether rotscale double bitmap objects exist or not... I've never used them, but it should be theoretically possible (ATTR0_ROTSCALE_DOUBLE|ATTR0_BMP).
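For reference, this is roughly what that flag combination looks like as an attribute 0 value. It is only a sketch: the two macros are redefined locally so it compiles without libnds, following the OBJ attribute 0 bit layout (bits 8-9 for rot/scale and double size, bits 10-11 for OBJ mode), and both helper functions are hypothetical. It says nothing about whether the hardware actually renders such an object, which is the open question above.

```c
#include <stdint.h>

/* Local copies of the attribute 0 fields, named after the libnds macros. */
#define ATTR0_ROTSCALE_DOUBLE (3u << 8)  /* bit 8: rot/scale, bit 9: double size */
#define ATTR0_BMP             (3u << 10) /* bits 10-11 = 3: bitmap OBJ */

/* Attribute 0 for a double-size rot/scale bitmap object at screen row y
 * (the Y coordinate lives in bits 0-7). */
uint16_t bitmap_obj_attr0(uint8_t y)
{
    return (uint16_t)(ATTR0_ROTSCALE_DOUBLE | ATTR0_BMP | y);
}

/* Odd frames: bank C shown as a background.
 * Even frames: bank D shown through bitmap objects. */
int frame_uses_background(unsigned frame)
{
    return (frame & 1u) != 0;
}
```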
So you'll have to break up the captured image into sprites and resize them. Not easy. Also, you have to make sure you're using the proper "Bitmap OBJ Mapping" settings (DISPCNT.6,5,22).
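To give an idea of how many sprites that means, here is a small sketch that cuts the captured area into 64x64 pieces, 64x64 being the largest object size. The `TileOrigin` type and `tile_capture` function are hypothetical names for illustration, and it only computes where each piece starts; setting up the objects and their scaling is left out.

```c
#include <stddef.h>

enum { TILE_PX = 64 }; /* largest OBJ size is 64x64 */

typedef struct {
    int x; /* left edge of the piece inside the captured image */
    int y; /* top edge of the piece inside the captured image */
} TileOrigin;

/* Walk the image in 64x64 steps, row by row, storing up to max_out
 * origins in out. Returns the number of pieces needed, which may be
 * larger than max_out. Pieces on the bottom or right edge can extend
 * past the image. */
size_t tile_capture(int width_px, int height_px,
                    TileOrigin *out, size_t max_out)
{
    size_t n = 0;
    for (int y = 0; y < height_px; y += TILE_PX) {
        for (int x = 0; x < width_px; x += TILE_PX) {
            if (n < max_out) {
                out[n].x = x;
                out[n].y = y;
            }
            ++n;
        }
    }
    return n;
}
```

A 256x144 capture needs 12 pieces (4 across, 3 down) and a 160x144 one needs 9. In both cases the bottom row of pieces holds only 16 lines of real image, since 144 is not a multiple of 64.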
I believe it's easier to resize backgrounds and sprites directly on the main engine and forget about capture.