JPEG Decoder for GBA in 100% ASM

Discussion in 'GBA - Hardware, Devices and Utilities' started by DSLSC, Sep 30, 2013.

  1. DSLSC
    OP

    DSLSC Member

    Newcomer
    28
    3
    Aug 27, 2013
    United States
    I've always wanted to see just how fast the GBA can decode a full-screen JPEG picture. The JPEG decompression library in Jeff Frohwein's "gfxlib" is based on a decoder originally written for Visual Basic by Dmitry Brant. I used a combination of both versions to write an ASM version. It runs quite fast, but I think it could go faster with some optimizations from those who understand JPEG better than I do.

    Currently, my ASM port only supports non-subsampled (1x1) and 2x2 subsampled JPEGs. I don't have any means of saving JPEGs in 2x1, 1x2, etc. formats, and so have left those formats out.
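    For anyone unfamiliar with the terms, here is a rough C sketch of what 1x1 versus 2x2 subsampling means when looking up the chroma sample for a pixel. It is not taken from the ASM source: chroma_index is a hypothetical helper, and it assumes flat planar buffers rather than the MCU-ordered data a real decoder works through.

```c
#include <stddef.h>

/* Hypothetical helper: index of the chroma sample that covers luma pixel
   (x, y). h and v are the horizontal and vertical subsampling factors
   (1 for non-subsampled, 2 for 2x2). Assumes a flat, row-major chroma plane. */
static size_t chroma_index(size_t x, size_t y, size_t luma_width,
                           unsigned h, unsigned v)
{
    size_t chroma_width = (luma_width + h - 1) / h;  /* round up for odd widths */
    return (y / v) * chroma_width + (x / h);
}
```

    With h = v = 1 every pixel has its own chroma sample; with h = v = 2 each chroma sample is shared by a 2x2 block of pixels, so the chroma planes are a quarter of the size.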

    The example binary flips through four JPEGs as fast as possible, including 1x1 and 2x2 JPEGs. The "mandrill.jpg" file is from another JPEG decompression library that takes 1/3 second to decode it. The remaining JPEGs are from a VB6 encoder port of a C-based encoder by Cristian Cuturicu.

    As usual, I have heavily commented the source code in the hope of making it easy to understand and follow, and I look forward to hearing about any optimizations, bugfixes, etc. Enjoy!
     

  2. pasc

    pasc GBATemps official GBA Freak

    Member
    2,587
    145
    Sep 9, 2006
    Gambia, The
    Germany
    i <3 this :)

    Lovely how the GBA is still liked *sheds a tear*
     
  3. DSLSC
    OP

    The GBA is the last Nintendo product with a physical link port, as far as I know, which is all the better for hardware interfacing. Look, an ARM CPU in a case with a built-in display and button pad!
    Next: another "first" with a JPEG encoder for the Parallax Propeller...in ASM!

    Again, if anyone can optimize this decoder further, or has any bugfixes, by all means post the changes.
    One optimization I made in the IDCT routine resulted in a very slight change to some hues of the output picture. Basically, the original code shifted the values right by 5 (">>5") in the IDCT; then, in the DrawPixel routine, the converted values were shifted right by another 3 (">>3"). Well...why not do it all in a single ">>8"? That is what I changed, but with only 32 shades per channel, an error of one in the lowest bit is probably visible.
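    To show how merging the shifts can change a pixel, here is a small C sketch. It is not the decoder's actual code: the function names, the 359/256 approximation of 1.402 for the Cr-to-red term, and the idea that the inputs carry 5 extra fractional bits are all assumptions for illustration. A plain (x >> 5) >> 3 always equals x >> 8, so the difference has to come from the color conversion sitting between the two shifts, which in the two-step version works on values that have already been truncated.

```c
#include <assert.h>

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Two-step version: drop 5 fractional bits first (as in the IDCT),
   convert to red at 8-bit precision, then drop 3 more bits to get a
   5-bit channel (as in DrawPixel). Coefficient 359/256 ~ 1.402 is an
   assumed fixed-point constant, not taken from the source. */
static int red5_two_step(int y_raw, int cr_raw)
{
    assert(y_raw >= 0 && cr_raw >= 0);  /* sketch covers non-negative inputs only */
    int y  = y_raw >> 5;
    int cr = cr_raw >> 5;
    int r8 = clamp_int(y + ((359 * cr) >> 8), 0, 255);
    return r8 >> 3;
}

/* One-step version: convert at full precision, then a single >>8. */
static int red5_one_step(int y_raw, int cr_raw)
{
    assert(y_raw >= 0 && cr_raw >= 0);  /* sketch covers non-negative inputs only */
    int r13 = y_raw + ((359 * cr_raw) >> 8);
    return clamp_int(r13 >> 8, 0, 31);
}
```

    With y_raw = 3871 and cr_raw = 191, the two-step version gives 15 and the one-step version gives 16, while for y_raw = 2048 and cr_raw = 0 both give 8. So the results match most of the time and differ by one shade when the discarded fractions would have added up to a carry.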