DMA timing testing

Discussion in 'Supercard SDK' started by Nebuleon, Feb 4, 2013.

Posted at 11:34 PM (3 replies)

  1. Nebuleon
    OP

    Member Nebuleon MAH BOI/GURL

    Joined:
    Dec 22, 2012
    Messages:
    900
    Country:
    Canada
    I wrote a test plugin using the DMA functions provided by BassAceGold. For every CPU level used in CATSFC, it tests:
    * whether DMA is faster than memcpy;
    * whether smaller DMA transfer units are slower than bigger ones;
    * whether memory is copied correctly using DMA when __dcache_writeback_all is used.

    The plugin and its source are in the same archive. I uploaded the archive to filetrip as uncategorised so that it does not show up in the Supercard category; it is mainly intended for SDK devs.

    Get this here: http://filetrip.net/view?8bInDFaaLP
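
    For readers who do not want to open the archive, here is a minimal sketch of what a transfer accuracy check of this kind can look like. It is not the code from the archive: copy_fn, copy_with_memcpy and transfer_is_accurate are hypothetical names, and memcpy stands in for the DMA routine so that the sketch builds on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for the copy routine under test. */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Stand-in for a DMA routine, so that the sketch runs on a PC. */
void copy_with_memcpy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Returns 1 if the copy reproduced the source exactly, 0 if it did not,
 * and -1 if the buffers could not be allocated. */
int transfer_is_accurate(copy_fn copy, size_t len)
{
    assert(copy != NULL);

    uint8_t *src = malloc(len);
    uint8_t *dst = malloc(len);
    int result = -1;

    if (src != NULL && dst != NULL) {
        for (size_t i = 0; i < len; i++) {
            src[i] = (uint8_t)(i * 31u + 7u); /* non-trivial pattern */
            dst[i] = (uint8_t)~src[i];        /* always differs from src[i] */
        }
        /* On the device, the data cache would presumably be written back
         * at this point (the post mentions __dcache_writeback_all), so
         * that the DMA engine sees what the CPU just wrote. */
        copy(dst, src, len);
        result = (memcmp(dst, src, len) == 0);
    }

    free(src);
    free(dst);
    return result;
}
```

    Calling transfer_is_accurate(copy_with_memcpy, 512 * 1024) should return 1. To test a real DMA function, it would be wrapped in a function matching copy_fn.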
     
  2. Nebuleon
    OP

    Here are the processed timings. Each copy was performed 4 times on 32-byte-aligned 512 KiB buffers, and the 4 results were averaged. The code I use for alignment is in source/nds/dma_adj.c. All transfer accuracy tests pass.
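
    The measurement procedure described above (aligned buffers, 4 runs, average) can be sketched roughly as follows. This is not the contents of source/nds/dma_adj.c: align_up and average_memcpy_us are hypothetical names, and the standard clock() function is used as a portable stand-in for whatever timer the plugin uses on the device.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUFFER_SIZE ((size_t)512 * 1024) /* 512 KiB, as in the post */
#define ALIGNMENT   ((uintptr_t)32)      /* 32-byte alignment, as in the post */
#define RUNS        4

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Average time of one 512 KiB memcpy over RUNS runs, in microseconds.
 * Returns a negative value if the buffers could not be set up. */
double average_memcpy_us(void)
{
    /* Over-allocate so an aligned region of BUFFER_SIZE bytes always fits. */
    uint8_t *raw_src = malloc(BUFFER_SIZE + ALIGNMENT);
    uint8_t *raw_dst = malloc(BUFFER_SIZE + ALIGNMENT);
    double average_us = -1.0;

    if (raw_src != NULL && raw_dst != NULL) {
        uint8_t *src = (uint8_t *)align_up((uintptr_t)raw_src, ALIGNMENT);
        uint8_t *dst = (uint8_t *)align_up((uintptr_t)raw_dst, ALIGNMENT);
        clock_t total = 0;

        memset(src, 0xA5, BUFFER_SIZE);
        for (int run = 0; run < RUNS; run++) {
            clock_t start = clock();
            memcpy(dst, src, BUFFER_SIZE);
            total += clock() - start;
        }
        /* Read the destination so the compiler cannot drop the copy entirely. */
        if (dst[BUFFER_SIZE - 1] == 0xA5)
            average_us = (double)total * 1e6 / CLOCKS_PER_SEC / RUNS;
    }

    free(raw_src);
    free(raw_dst);
    return average_us;
}
```

    The same loop would be repeated with each DMA function in place of memcpy to fill in the rest of each table.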

    CPU level 6 (240 MHz)
    memcpy: 12,970 microseconds
    DMA 2-byte: 64,981 microseconds
    DMA 4-byte: 33,536 microseconds
    DMA 16-byte: 10,880 microseconds
    DMA 32-byte: 7,680 microseconds

    CPU level 9 (300 MHz)
    memcpy: 14,890 microseconds
    DMA 2-byte: 76,117 microseconds
    DMA 4-byte: 39,338 microseconds
    DMA 16-byte: 13,312 microseconds
    DMA 32-byte: 9,344 microseconds

    CPU level 10 (336 MHz)
    memcpy: 13,269 microseconds
    DMA 2-byte: 67,541 microseconds
    DMA 4-byte: 35,114 microseconds
    DMA 16-byte: 11,776 microseconds
    DMA 32-byte: 8,320 microseconds

    CPU level 11 (360 MHz)
    memcpy: 12,330 microseconds
    DMA 2-byte: 63,061 microseconds
    DMA 4-byte: 32,554 microseconds
    DMA 16-byte: 10,965 microseconds
    DMA 32-byte: 7,765 microseconds

    CPU level 12 (384 MHz)
    memcpy: 11,562 microseconds
    DMA 2-byte: 58,965 microseconds
    DMA 4-byte: 30,506 microseconds
    DMA 16-byte: 10,240 microseconds
    DMA 32-byte: 7,253 microseconds

    CPU level 13 (396 MHz)
    memcpy: 11,178 microseconds
    DMA 2-byte: 57,045 microseconds
    DMA 4-byte: 29,610 microseconds
    DMA 16-byte: 9,941 microseconds
    DMA 32-byte: 6,997 microseconds
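
    To put these figures in perspective, the sketch below converts a time into throughput and computes how much faster the 32-byte DMA copy is than memcpy. The times are copied from the tables above; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

#define BUFFER_BYTES (512.0 * 1024.0) /* 512 KiB buffers, as stated above */

/* One row of the posted results; times are in microseconds. */
struct timing {
    int level;
    int mhz;
    double memcpy_us;
    double dma32_us;
};

const struct timing timings[] = {
    {  6, 240, 12970.0, 7680.0 },
    {  9, 300, 14890.0, 9344.0 },
    { 10, 336, 13269.0, 8320.0 },
    { 11, 360, 12330.0, 7765.0 },
    { 12, 384, 11562.0, 7253.0 },
    { 13, 396, 11178.0, 6997.0 },
};
const size_t timing_count = sizeof timings / sizeof timings[0];

/* Throughput in MiB/s for one 512 KiB copy that took `us` microseconds. */
double throughput_mib_s(double us)
{
    return (BUFFER_BYTES / (1024.0 * 1024.0)) / (us / 1e6);
}

/* How many times faster the 32-byte DMA copy is than memcpy. */
double dma32_speedup(const struct timing *t)
{
    return t->memcpy_us / t->dma32_us;
}
```

    With the posted figures, the 32-byte DMA copy comes out roughly 1.6 to 1.7 times faster than memcpy at every level. At level 6 that is about 65 MiB/s against about 39 MiB/s.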
     
  3. BassAceGold

    Member BassAceGold Testicles

    Joined:
    Aug 14, 2006
    Messages:
    494
    Country:
    Canada
    Pretty much sums up my suspicions. Glad to have hard numbers to back up the claims though.
     
  4. Nebuleon
    OP

    Yeah, and it's odd that levels 9 and 10 in the ordinary SDK run slower than level 6. At least I validated the DMA optimisations, though, and I will put them to good use!
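
    The oddity is easy to check against the memcpy figures posted earlier. The sketch below uses hypothetical names and only the memcpy column; it reports whether a given level took longer than level 6.

```c
#include <stddef.h>

/* memcpy times from the posted tables, in microseconds. */
struct level_time {
    int level;
    int mhz;
    int memcpy_us;
};

const struct level_time memcpy_times[] = {
    {  6, 240, 12970 },
    {  9, 300, 14890 },
    { 10, 336, 13269 },
    { 11, 360, 12330 },
    { 12, 384, 11562 },
    { 13, 396, 11178 },
};

/* Returns 1 if memcpy at `level` took longer than at level 6,
 * 0 if it did not, and -1 if the level is not in the table. */
int memcpy_slower_than_level6(int level)
{
    const size_t count = sizeof memcpy_times / sizeof memcpy_times[0];
    const int baseline_us = memcpy_times[0].memcpy_us; /* level 6 */

    for (size_t i = 0; i < count; i++) {
        if (memcpy_times[i].level == level)
            return memcpy_times[i].memcpy_us > baseline_us;
    }
    return -1;
}
```

    For memcpy this returns 1 only for levels 9 and 10. In the 16-byte and 32-byte DMA rows, level 11 is also slightly slower than level 6 (10,965 against 10,880 and 7,765 against 7,680 microseconds).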

    Also: an updated DMATest shows that the new function ds2_DMAcopy_8Bit is EXACTLY twice as slow as ds2_DMAcopy_16Bit at all CPU levels.
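
    One way to read that factor of two is to count transfers. The sketch below assumes that every DMA transfer costs the same regardless of its width. That is a simplifying assumption for illustration, not something measured here, and the function names are hypothetical.

```c
#include <stddef.h>

/* Number of transfers needed to move `bytes` bytes in units of
 * `unit_bytes`. Assumes `bytes` is a multiple of `unit_bytes`. */
size_t transfer_count(size_t bytes, size_t unit_bytes)
{
    return bytes / unit_bytes;
}

/* Expected time ratio between a small and a large transfer unit,
 * under the assumption that each transfer costs the same. */
double expected_slowdown(size_t small_unit, size_t large_unit, size_t bytes)
{
    return (double)transfer_count(bytes, small_unit)
         / (double)transfer_count(bytes, large_unit);
}
```

    For a 512 KiB buffer this gives 524,288 one-byte transfers against 262,144 two-byte transfers, hence a ratio of 2. The same model fits the 2-byte and 4-byte rows fairly well (about 1.94x measured), but not the 16-byte and 32-byte rows (about 1.4x measured), so a fixed per-transfer cost seems to dominate only for the small units.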

    I'm not bothering to distribute that version, though. If you're interested, just copy-paste the 16-bit version in DoSpeedTest and DoTransferAccuracyTest and change the copy to call ds2_DMAcopy_8Bit.
     
