DMA timing testing

Discussion in 'Supercard SDK' started by Nebuleon, Feb 4, 2013.

  1. Nebuleon

    Nebuleon MAH BOI/GURL

    Dec 22, 2012
    I wrote a test plugin, using the DMA functions provided by BassAceGold, to check the following at every CPU level used in CATSFC:
    * whether DMA would be faster than memcpy;
    * whether smaller DMAs would be slower than bigger DMAs;
    * whether memory is copied correctly by DMA when __dcache_writeback_all is used.
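
    The accuracy check in the last bullet can be sketched roughly as below. This is a portable illustration, not the code from the archive: copy_fn, copy_with_memcpy and transfer_is_accurate are hypothetical names, and memcpy stands in for the DMA routine so the sketch runs anywhere. On the DS2 the copy callback would wrap one of the SDK's DMA functions instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for a blocking copy routine. memcpy is adapted
 * to it below; a DMA wrapper would be plugged in the same way. */
typedef void (*copy_fn)(void *dst, const void *src, size_t size);

static void copy_with_memcpy(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
}

/* Fill src with a known pattern, clear dst, run the copy, then compare.
 * Returns 1 if every byte arrived intact, 0 otherwise. */
static int transfer_is_accurate(copy_fn copy, uint8_t *dst, uint8_t *src,
                                size_t size)
{
    for (size_t i = 0; i < size; i++)
        src[i] = (uint8_t)(i * 31u + 7u);
    memset(dst, 0, size);

    /* Assumption: on the DS2, this is where the data cache would be
     * written back (the post uses __dcache_writeback_all) so that a DMA
     * engine reading RAM directly sees the pattern just stored. */
    copy(dst, src, size);

    return memcmp(dst, src, size) == 0;
}
```

    Calling transfer_is_accurate(copy_with_memcpy, dst, src, size) on two buffers of equal size should return 1. A speed test would time the same copy call rather than compare the result.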

    The plugin and the source are in the same archive. The archive is an uncategorised filetrip upload so that it does not show up in the Supercard category; it is mainly intended for SDK devs.

    Get this here:
  2. Nebuleon

    Nebuleon MAH BOI/GURL

    Dec 22, 2012
    Processed timings. Each copy was performed 4 times and the results were averaged, using 32-byte-aligned 512 KiB buffers. The code I use for alignment is in source/nds/dma_adj.c. All transfer accuracy tests pass.
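
    For readers without the archive, a minimal sketch of 32-byte alignment and 4-run averaging follows. It is not the contents of source/nds/dma_adj.c; align_up_32, average_of_4 and DMA_ALIGNMENT are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGNMENT 32u  /* 32-byte alignment, as stated in the post */

/* Round ptr up to the next 32-byte boundary. The caller must over-allocate
 * by at least DMA_ALIGNMENT - 1 bytes so the aligned region still fits. */
static void *align_up_32(void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    addr = (addr + (DMA_ALIGNMENT - 1u)) & ~(uintptr_t)(DMA_ALIGNMENT - 1u);
    return (void *)addr;
}

/* Average of 4 timing samples in microseconds (integer division). */
static uint32_t average_of_4(const uint32_t samples[4])
{
    uint64_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += samples[i];
    return (uint32_t)(sum / 4u);
}
```

    A 512 KiB test buffer would then be allocated as 512 KiB plus 31 bytes, with align_up_32 applied to the raw pointer before use.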

    CPU level 6 (240 MHz)
    memcpy: 12,970 microseconds
    DMA 2-byte: 64,981 microseconds
    DMA 4-byte: 33,536 microseconds
    DMA 16-byte: 10,880 microseconds
    DMA 32-byte: 7,680 microseconds

    CPU level 9 (300 MHz)
    memcpy: 14,890 microseconds
    DMA 2-byte: 76,117 microseconds
    DMA 4-byte: 39,338 microseconds
    DMA 16-byte: 13,312 microseconds
    DMA 32-byte: 9,344 microseconds

    CPU level 10 (336 MHz)
    memcpy: 13,269 microseconds
    DMA 2-byte: 67,541 microseconds
    DMA 4-byte: 35,114 microseconds
    DMA 16-byte: 11,776 microseconds
    DMA 32-byte: 8,320 microseconds

    CPU level 11 (360 MHz)
    memcpy: 12,330 microseconds
    DMA 2-byte: 63,061 microseconds
    DMA 4-byte: 32,554 microseconds
    DMA 16-byte: 10,965 microseconds
    DMA 32-byte: 7,765 microseconds

    CPU level 12 (384 MHz)
    memcpy: 11,562 microseconds
    DMA 2-byte: 58,965 microseconds
    DMA 4-byte: 30,506 microseconds
    DMA 16-byte: 10,240 microseconds
    DMA 32-byte: 7,253 microseconds

    CPU level 13 (396 MHz)
    memcpy: 11,178 microseconds
    DMA 2-byte: 57,045 microseconds
    DMA 4-byte: 29,610 microseconds
    DMA 16-byte: 9,941 microseconds
    DMA 32-byte: 6,997 microseconds
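
    To compare the figures across levels, the timings can be turned into throughput and speedup with a couple of helpers. This is only arithmetic on the numbers above; throughput_mb_per_s, speedup and BUFFER_BYTES are hypothetical names.

```c
#include <stdint.h>

#define BUFFER_BYTES (512u * 1024u)  /* 512 KiB, as in the post */

/* Bytes per microsecond is numerically the same as decimal MB/s. */
static double throughput_mb_per_s(uint32_t bytes, uint32_t microseconds)
{
    return (double)bytes / (double)microseconds;
}

/* How many times faster the run taking fast_us is than the one taking
 * slow_us. */
static double speedup(uint32_t slow_us, uint32_t fast_us)
{
    return (double)slow_us / (double)fast_us;
}
```

    For CPU level 6, this gives roughly 40.4 MB/s for memcpy (12,970 microseconds) against roughly 68.3 MB/s for 32-byte DMA (7,680 microseconds), a speedup of about 1.69.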
  3. BassAceGold

    BassAceGold Testicles

    Aug 14, 2006
    Pretty much sums up my suspicions. Glad to have hard numbers to back up the claims though.
  4. Nebuleon

    Nebuleon MAH BOI/GURL

    Dec 22, 2012
    Yeah, and it's odd that levels 9 and 10 in the ordinary SDK run slower than level 6. At least I validated the DMA optimisations though, and I will put them to good use!

    Also: an updated DMATest shows that the new function ds2_DMAcopy_8Bit is EXACTLY twice as slow as ds2_DMAcopy_16Bit at all CPU levels.

    I'm not bothering to distribute that version, though; just copy-paste the 16-bit version in DoSpeedTest and DoTransferAccuracyTest if you're interested.