AXPro - new WiiU audio driver for RetroArch
--------------------------------------------------

Key features:

* Multi-threaded by design.
  The driver runs the audio system on a dedicated WiiU CPU core (a build-time define, currently set to core 0).
  This offloads more work from the main core (always core 1) and allows truly parallel audio generation,
  processing and output (currently most noticeable with the scummvm core).

* Uses the hardware resampler.
  The audio driver sample rate can now be set to match the emulator core sample rate (e.g. 44100 for ps1 and scummvm).
  The hardware converts to the actual WiiU output rate of 48000, so the emu core can pass its original data straight to the RA audio driver.
  This avoids the RA software resampler (for example when WIIU_AUDIO_OPTIMIZATION_LEVEL is set to 3).

* Audio streaming written from scratch for WiiU.
  It tries to minimize clicks/pops while honoring the user-set latency and output sample rate.
  Minimum latency is as low as 12 ms. Maximum latency is set by the user; the default is 64 ms.
  Processing never uses busy-loop waits.
  In 'poll mode' (only scummvm at the moment) there are no extra copies of audio data:
  scummvm writes directly to locked cache, which is DMA-ed to memory and passed over to the DSP.
  A simple automatic rate control amortizes underflows: it removes some clicks
  by briefly (and unnoticeably) slowing down playback to avoid buffer under-runs when possible.

* Builds with the latest toolchain (wut) and with the older devkitPPC that RA is currently using.
  Requires a few additional imports and function prototype definitions
  (check the changes in RA imports.h and the defines in wiiu_audio2.h).
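
The latency and sample rate numbers above can be turned into buffer sizes with simple arithmetic.
The sketch below is only an illustration, not the driver code: all demo_* names are made up for this
example, and the rate control formula (a small linear slowdown below a low-water mark) is an assumption
about how underflow amortization could be done, not a description of what the driver actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Frames needed to hold `ms` milliseconds of audio at `rate_hz` (rounded down). */
uint32_t demo_ms_to_frames(uint32_t rate_hz, uint32_t ms)
{
    return (uint32_t)(((uint64_t)rate_hz * ms) / 1000u);
}

/* Source-to-output ratio a resampler would work with,
 * e.g. 44100 / 48000 for a ps1 core on a 48000 Hz output. */
float demo_resample_ratio(uint32_t core_rate_hz, uint32_t output_rate_hz)
{
    assert(output_rate_hz != 0);
    return (float)core_rate_hz / (float)output_rate_hz;
}

/* Hypothetical underflow amortization: when fewer than `low_water` frames
 * are queued, scale the ratio down by up to `max_slowdown` (e.g. 0.005f
 * for 0.5%), so the queue drains slightly slower and has time to refill. */
float demo_amortized_ratio(float ratio, uint32_t queued,
                           uint32_t low_water, float max_slowdown)
{
    if (low_water == 0 || queued >= low_water)
        return ratio; /* enough data queued: play at the normal rate */

    float deficit = (float)(low_water - queued) / (float)low_water; /* 0..1 */
    return ratio * (1.0f - max_slowdown * deficit);
}
```

For example, demo_ms_to_frames(48000, 12) gives 576 frames for the minimum latency at the output rate,
and demo_ms_to_frames(44100, 64) gives 2822 frames for the default latency at a 44100 Hz core rate.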


Building
--------------------------------------------------
There is a new switch that should be passed to make when building both the emu core and retroarch:
WIIU_AUDIO_OPTIMIZATION_LEVEL=n, where n is 0 (the default, used when the option is not passed), 1, 2 or 3.

0 - no changes needed to RA or emu core code; normal RA audio driver; old/new wiiu audio drivers can be switched dynamically
1 - light optimization on the RA side, allowing a bit more parallel processing; old/new wiiu audio drivers can be switched dynamically
2 - removes the RA software resampler without changing core code; the old audio driver will have the wrong pitch; no RA features like rewind, record, audio rate control
3 - removes the RA software resampler with a small change in core code; the old audio driver will have the wrong pitch; no RA features like rewind, record, audio rate control
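
As a rough illustration of what the four levels mean for code, here is a sketch of how the switch could be
consumed on the C side. Only WIIU_AUDIO_OPTIMIZATION_LEVEL comes from the real build; the demo_* helpers are
hypothetical, and how the make variable reaches the compiler (for example as -DWIIU_AUDIO_OPTIMIZATION_LEVEL=n)
is an assumption. The actual defines live in wiiu_audio2.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Default to level 0 when the switch is not passed to make. */
#ifndef WIIU_AUDIO_OPTIMIZATION_LEVEL
#define WIIU_AUDIO_OPTIMIZATION_LEVEL 0
#endif

#if WIIU_AUDIO_OPTIMIZATION_LEVEL < 0 || WIIU_AUDIO_OPTIMIZATION_LEVEL > 3
#error "WIIU_AUDIO_OPTIMIZATION_LEVEL must be 0, 1, 2 or 3"
#endif

/* Levels 0 and 1 keep the RA software resampler; 2 and 3 remove it. */
bool demo_uses_software_resampler(int level)
{
    assert(level >= 0 && level <= 3);
    return level < 2;
}

/* Old and new wiiu audio drivers can only be swapped at levels 0 and 1;
 * at 2 and 3 the old driver plays at the wrong pitch. */
bool demo_can_switch_drivers(int level)
{
    assert(level >= 0 && level <= 3);
    return level < 2;
}

/* Only level 3 expects a small change in the emulator core. */
bool demo_core_needs_change(int level)
{
    assert(level >= 0 && level <= 3);
    return level == 3;
}
```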

Please refer to /RetroArch/audio/drivers/wiiu_audio2.h for details.
I suggest using optimization level 3 and adding one line to the emulator core (a call to axpro_audio_wait_fence_core).
The core needs to wait until the audio driver has finished reading from the pointer last passed to the audio driver write call,
just before the core writes to that buffer again for the new frame.
Take a look at the pcsx rearmed changes for an example.

To fully utilize the parallel audio processing, a core can use
the new audio driver extension (axpro_audio_set_thread_callback) to register a core callback.
The callback is called directly from the audio driver thread with a buffer to write audio data into.
This is now used with scummvm, greatly improving performance.
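
A minimal sketch of the callback idea follows. The callback type, the demo_* functions and the stereo int16 layout
are all assumptions made for this example; the real axpro_audio_set_thread_callback prototype is defined in
wiiu_audio2.h and may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shape: fill `frames` stereo frames into `dst`. */
typedef void (*demo_audio_cb_t)(int16_t *dst, size_t frames, void *user);

demo_audio_cb_t demo_cb;
void           *demo_cb_user;

/* Stand-in for registering the core callback with the driver. */
void demo_set_thread_callback(demo_audio_cb_t cb, void *user)
{
    demo_cb      = cb;
    demo_cb_user = user;
}

/* Stand-in for one iteration of the audio driver thread: hand the output
 * buffer to the core, or output silence if no callback is registered. */
void demo_driver_tick(int16_t *out, size_t frames)
{
    assert(out != NULL);
    if (demo_cb) {
        demo_cb(out, frames, demo_cb_user);
    } else {
        for (size_t i = 0; i < frames * 2; i++)
            out[i] = 0;
    }
}

/* Example core-side mixer: writes a constant level taken from `user`. */
void demo_core_mix(int16_t *dst, size_t frames, void *user)
{
    int16_t level = *(const int16_t *)user;
    for (size_t i = 0; i < frames * 2; i++)
        dst[i] = level;
}
```

The point of this design is that the core writes straight into the driver's buffer from the audio thread,
so no intermediate copy is needed and the main core is not involved.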


Minimal wiiu pthreads support – work in progress, testing with pcsx_rearmed
--------------------------------------------------
I’ve also added a minimal pthreads wrapper for wiiu builds.
pcsx_rearmed can now be built with the make switch LIGHTREC_THREADED_COMPILER=1.
It builds, but it hangs when loading content. I haven’t tried debugging it yet.
It might be a good idea to invest time in making RA threads work for WiiU too.
Threaded SPU, on the other hand, does work, but I haven’t noticed any speed improvement from it.
If someone is willing to work on this, please take a look at my pcsx changes.
Include pthread_minimal_wiiu.h instead of pthread.h,
and choose which core each thread should run on when creating it (via a very simple, non-portable thread attr interface).
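
To give an idea of what picking a core per thread could look like, here is a sketch of a tiny attr object that maps
a core index to an affinity mask. Everything here is hypothetical: the demo_* names are not the interface in
pthread_minimal_wiiu.h, and the DEMO_AFFINITY_* bit values are assumed to mirror the per-CPU affinity flags in wut's
coreinit thread header. Check the real header before relying on any of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the per-core affinity bits used by the WiiU OS. */
enum {
    DEMO_AFFINITY_CPU0 = 1 << 0,
    DEMO_AFFINITY_CPU1 = 1 << 1,
    DEMO_AFFINITY_CPU2 = 1 << 2,
    DEMO_AFFINITY_ANY  = DEMO_AFFINITY_CPU0 | DEMO_AFFINITY_CPU1 | DEMO_AFFINITY_CPU2
};

/* Hypothetical minimal thread attr: only remembers the requested core. */
typedef struct {
    int core; /* 0..2, or -1 for "no preference" */
} demo_thread_attr_t;

void demo_attr_init(demo_thread_attr_t *attr)
{
    assert(attr != NULL);
    attr->core = -1;
}

/* Returns 0 on success, -1 if the core index is out of range. */
int demo_attr_set_core(demo_thread_attr_t *attr, int core)
{
    assert(attr != NULL);
    if (core < 0 || core > 2)
        return -1;
    attr->core = core;
    return 0;
}

/* Affinity mask to use when the thread is actually created. */
uint8_t demo_attr_affinity(const demo_thread_attr_t *attr)
{
    assert(attr != NULL);
    if (attr->core < 0)
        return (uint8_t)DEMO_AFFINITY_ANY;
    return (uint8_t)(1u << attr->core);
}
```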

UPDATE May 2024: the minimal pthread proof of concept is now working with PCSX ReARMed.
Take a look at the latest pcsx and retroarch patches (fastmutex).
PCSX threaded rendering should now help with some games.
With others it will make the emulation worse.
Threaded recompiler and threaded SPU are also available, though I’m not sure how much they help.



Roadmap
--------------------------------------------------
I’d love to see the pcsx_rearmed threaded recompiler in action.
Experts on PowerPC memory ordering willing to take a look at this code could really help here.
RA threads would also be nice; this might work without too much hassle.
If we stick with optimization levels 0 or 1, the paired singles instruction set can be used
to speed up the *_naive methods in the driver that convert audio samples int16 <-> float and adapt the format.
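
For reference, a plain scalar int16 <-> float conversion looks roughly like the sketch below. These are not the
driver's *_naive methods, just an illustration with made-up demo_* names; the 32768 scale factor and the clamping
are common choices that I am assuming here. A paired singles version would process two samples per instruction
instead of one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* int16 -> float in the range [-1.0, 1.0). */
void demo_s16_to_float_naive(float *out, const int16_t *in, size_t samples)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < samples; i++)
        out[i] = (float)in[i] * (1.0f / 32768.0f);
}

/* float -> int16, clamping values outside the int16 range. */
void demo_float_to_s16_naive(int16_t *out, const float *in, size_t samples)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < samples; i++) {
        float v = in[i] * 32768.0f;
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        out[i] = (int16_t)v;
    }
}
```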

scummvm would benefit from more changes mimicking the wii port, e.g. a thread for the timer manager and
getting rid of libco usage (via RA threading, or a custom implementation that only runs RA when the menu is activated).
Also, a faster video output like the one used with xbgr_1555 for pcsx would help SVGA (and/or 32 bpp) games that still run slowly.
I haven't checked how video output is done for Wii; it might be worth checking out.
I had a bunch of ideas for what to improve here. Maybe using a shader with fast data streaming
for paletted 256-color modes could make a real splash there.
Also, a standalone scummvm for wiiu, built using WUT, sounds tempting.


