Hacking DMA timing testing

Deleted member 319809

I wrote a test plugin using the DMA functions provided by BassAceGold. For every CPU level used in CATSFC, it tests:
* whether DMA is faster than memcpy;
* whether DMAs with a smaller transfer unit are slower than DMAs with a bigger one;
* whether memory is copied correctly using DMA when __dcache_writeback_all is used.
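
For anyone who wants the shape of such a test without downloading the archive, here is a minimal sketch in plain C. It is not the code from the archive: the `copy_fn` type and all function names are made up for illustration, and memcpy stands in for the DMA call so the sketch also builds on a PC. On hardware you would wrap the SDK's DMA copy (plus the wait for completion) behind the same signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define RUNS 4  /* the post averages 4 copies per measurement */

/* Hypothetical signature shared by every copy routine under test. */
typedef void (*copy_fn)(void *dest, const void *src, size_t size);

/* Stand-in so the sketch runs anywhere; a DMA wrapper would go here on hardware. */
static void copy_with_memcpy(void *dest, const void *src, size_t size)
{
    memcpy(dest, src, size);
}

/* Average time of RUNS copies, in microseconds, measured with the C clock(). */
static double average_copy_us(copy_fn copy, void *dest, const void *src, size_t size)
{
    clock_t start = clock();
    for (int i = 0; i < RUNS; i++)
        copy(dest, src, size);
    clock_t end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / RUNS;
}

/* Transfer accuracy: fill src with a pattern, clear dest, copy, then compare.
 * Returns 1 when dest matches src byte for byte, 0 otherwise. */
static int copy_is_accurate(copy_fn copy, uint8_t *dest, uint8_t *src, size_t size)
{
    for (size_t i = 0; i < size; i++)
        src[i] = (uint8_t)(i * 31u + 7u);
    memset(dest, 0, size);
    /* With real DMA, the cache writeback (__dcache_writeback_all in the post)
     * belongs around this call; its exact placement is an assumption here. */
    copy(dest, src, size);
    return memcmp(dest, src, size) == 0;
}
```

Passing each routine through the same function pointer keeps the timing loop and the accuracy check identical for memcpy and for every DMA width, so only the copy itself differs between measurements.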

The plugin and the source are in the same archive. This archive is an uncategorised filetrip upload to prevent it being shown in the Supercard category. This is mainly intended for SDK devs.

Get this here: http://filetrip.net/view?8bInDFaaLP
 
Processed timings. Each copy was performed 4 times on 32-byte-aligned 512 KiB buffers, and the 4 timings were averaged to obtain the result. The code I use for alignment is in source/nds/dma_adj.c. All transfer accuracy tests pass.
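
A rough sketch of one common way to get a 32-byte-aligned buffer, in case it helps. This is not the contents of source/nds/dma_adj.c; `DMA_ALIGN`, `align_up` and `alloc_aligned` are hypothetical names, and the approach (over-allocate, then round the pointer up) is just the usual trick.

```c
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN 32u  /* alignment used for the buffers in the post */

/* Round an address up to the next multiple of DMA_ALIGN (a power of two). */
static void *align_up(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + (DMA_ALIGN - 1u)) & ~(uintptr_t)(DMA_ALIGN - 1u);
    return (void *)addr;
}

/* Allocate size bytes aligned to DMA_ALIGN.
 * *raw receives the original pointer, which is the one to pass to free(). */
static void *alloc_aligned(size_t size, void **raw)
{
    *raw = malloc(size + DMA_ALIGN - 1u);
    return *raw ? align_up(*raw) : NULL;
}
```

Keep the raw pointer around: calling free() on the aligned pointer instead of the one malloc returned is undefined behaviour.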

CPU level 6 (240 MHz)
memcpy: 12,970 microseconds
DMA 2-byte: 64,981 microseconds
DMA 4-byte: 33,536 microseconds
DMA 16-byte: 10,880 microseconds
DMA 32-byte: 7,680 microseconds

CPU level 9 (300 MHz)
memcpy: 14,890 microseconds
DMA 2-byte: 76,117 microseconds
DMA 4-byte: 39,338 microseconds
DMA 16-byte: 13,312 microseconds
DMA 32-byte: 9,344 microseconds

CPU level 10 (336 MHz)
memcpy: 13,269 microseconds
DMA 2-byte: 67,541 microseconds
DMA 4-byte: 35,114 microseconds
DMA 16-byte: 11,776 microseconds
DMA 32-byte: 8,320 microseconds

CPU level 11 (360 MHz)
memcpy: 12,330 microseconds
DMA 2-byte: 63,061 microseconds
DMA 4-byte: 32,554 microseconds
DMA 16-byte: 10,965 microseconds
DMA 32-byte: 7,765 microseconds

CPU level 12 (384 MHz)
memcpy: 11,562 microseconds
DMA 2-byte: 58,965 microseconds
DMA 4-byte: 30,506 microseconds
DMA 16-byte: 10,240 microseconds
DMA 32-byte: 7,253 microseconds

CPU level 13 (396 MHz)
memcpy: 11,178 microseconds
DMA 2-byte: 57,045 microseconds
DMA 4-byte: 29,610 microseconds
DMA 16-byte: 9,941 microseconds
DMA 32-byte: 6,997 microseconds
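
To compare the numbers above more easily, the timings can be turned into throughput and speedup figures. A small sketch, assuming each timing covers one 512 KiB copy (the helper names are mine):

```c
#include <stddef.h>

#define BUFFER_BYTES (512u * 1024u)  /* buffer size used in the tests */

/* Throughput in MiB/s for a copy of `bytes` that took `microseconds`. */
static double mib_per_second(size_t bytes, double microseconds)
{
    double mib = (double)bytes / (1024.0 * 1024.0);
    return mib / (microseconds / 1e6);
}

/* How many times faster the candidate is than the baseline. */
static double speedup(double baseline_us, double candidate_us)
{
    return baseline_us / candidate_us;
}
```

For example, at CPU level 13, `mib_per_second(BUFFER_BYTES, 6997.0)` gives roughly 71 MiB/s for the 32-byte DMA, and `speedup(11178.0, 6997.0)` gives roughly 1.6 over memcpy.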
 
Yeah, and it's odd that levels 9 and 10 in the ordinary SDK are slower than level 6. At least I validated the DMA optimisations though, and I will put them to good use!

Also: An updated DMATest shows that the new function ds2_DMAcopy_8Bit is EXACTLY twice as slow as ds2_DMAcopy_16Bit under all CPU levels.

I'm not bothering to distribute that version though. If you're interested, just copy-paste the 16-bit version in DoSpeedTest and DoTransferAccuracyTest.
 
