How fast are CPU emulators?

vdwjeremy

Hello,
Recently I've been trying to evaluate the performance of emulators, specifically the CPU part. Some benchmarks are available on the web for very specific instructions, and at the other end there are lists of the FPS to expect per game on a given emulator and hardware, but there is not much on the performance of typical workloads compared with the same code built for the host architecture. So I thought I could set up a benchmark and share my findings.

Emulators are notably hard on hardware, especially now that the gap between hardware generations is shrinking, at least for single-thread performance. The spread of SoCs, where performance is quite constrained, also pushes for more optimization.


How CPU emulators work

When software is built for a given target hardware, the binary follows a given ISA, for instance ARM for the 3DS. If we want to run this software on another architecture, typically x86_64, all the instructions must be translated into a form the host can understand.
The oldest (slowest) CPUs are often emulated by an interpreter, as it is simpler and makes timing accuracy easier to guarantee. Emulators for more recent CPUs use Just-In-Time (JIT) recompilers to generate equivalent host instructions on the fly, with as little overhead as possible.
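
To make the interpreter approach concrete, here is a minimal sketch of a fetch/decode/execute loop. The 3-register toy ISA, the `Insn` layout and all names are invented for illustration; this is not how Unicorn or Dynarmic work internally. Every guest instruction pays for a fetch and a `switch` dispatch on each execution, which is the overhead a JIT tries to remove by translating a block once and reusing the generated host code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 3-register toy ISA, invented for illustration only.
enum class Op : std::uint8_t { LoadImm, Add, SubImm, JumpNZ, Halt };

struct Insn {
    Op op;
    std::uint8_t dst;   // destination register index
    std::uint8_t src;   // source register (or the register tested by JumpNZ)
    std::int32_t imm;   // immediate value or absolute jump target
};

struct Cpu {
    std::uint32_t regs[3] = {0, 0, 0};
    std::uint32_t pc = 0;
    std::uint64_t executed = 0;  // guest instructions retired
};

// Classic interpreter loop: fetch, decode, execute, one guest instruction at a time.
void run(Cpu& cpu, const std::vector<Insn>& program) {
    while (cpu.pc < program.size()) {
        const Insn& i = program[cpu.pc++];
        ++cpu.executed;
        switch (i.op) {
            case Op::LoadImm: cpu.regs[i.dst] = static_cast<std::uint32_t>(i.imm); break;
            case Op::Add:     cpu.regs[i.dst] += cpu.regs[i.src]; break;
            case Op::SubImm:  cpu.regs[i.dst] -= static_cast<std::uint32_t>(i.imm); break;
            case Op::JumpNZ:
                if (cpu.regs[i.src] != 0) cpu.pc = static_cast<std::uint32_t>(i.imm);
                break;
            case Op::Halt:    return;
        }
    }
}

// Demo program: sums 10 + 9 + ... + 1 into r0, using r1 as a down-counter.
std::uint32_t run_demo() {
    const std::vector<Insn> program = {
        {Op::LoadImm, 0, 0, 0},   // r0 = 0
        {Op::LoadImm, 1, 0, 10},  // r1 = 10
        {Op::Add,     0, 1, 0},   // r0 += r1
        {Op::SubImm,  1, 0, 1},   // r1 -= 1
        {Op::JumpNZ,  0, 1, 2},   // if r1 != 0, jump back to instruction 2
        {Op::Halt,    0, 0, 0},
    };
    Cpu cpu;
    run(cpu, program);
    return cpu.regs[0];
}
```

Calling `run_demo()` runs the small loop through the interpreter and returns the content of r0.
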
Communication with the rest of the system depends on the platform. When running on bare metal (no OS), as on the Wii, the software usually reads and writes directly from and to hardware registers mapped at known locations in memory; these reads and writes must be intercepted to simulate the rest of the system. On platforms using a kernel, like the PS4, the Switch or Linux, the software runs in user mode and issues supervisor calls to interact with the kernel; these are special instructions that must be intercepted, and the OS behaviour simulated.
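
Both interception mechanisms can be sketched in a few lines. The memory map, the `kUartTxReg` address, the service numbers and the `Bus` / `FakeKernel` names below are all hypothetical, chosen only to show the idea: guest memory accesses go through a bus that recognises device addresses, and supervisor calls are routed to host code that plays the role of the kernel.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical guest memory map, invented for illustration.
constexpr std::uint32_t kRamSize   = 0x1000;      // plain RAM at [0, 0x1000)
constexpr std::uint32_t kUartTxReg = 0x10000000;  // pretend serial "transmit" register

struct Bus {
    std::vector<std::uint8_t> ram = std::vector<std::uint8_t>(kRamSize, 0);
    std::vector<char> uart_output;  // bytes received by the simulated device

    // Every guest store goes through here, so device registers can be intercepted.
    void write8(std::uint32_t addr, std::uint8_t value) {
        if (addr == kUartTxReg) {
            uart_output.push_back(static_cast<char>(value));  // simulate the device
        } else if (addr < kRamSize) {
            ram[addr] = value;                                 // ordinary memory
        }
        // Unmapped addresses are silently ignored in this sketch.
    }

    std::uint8_t read8(std::uint32_t addr) const {
        return addr < kRamSize ? ram[addr] : 0;
    }
};

// Hypothetical supervisor-call numbers; real kernels define their own.
enum : std::uint32_t { kSvcGetTicks = 0, kSvcExit = 1 };

struct FakeKernel {
    std::uint64_t ticks = 0;
    bool exited = false;

    // Called by the emulator when the guest executes a supervisor-call instruction.
    std::uint32_t handle_svc(std::uint32_t number) {
        switch (number) {
            case kSvcGetTicks: return static_cast<std::uint32_t>(++ticks);
            case kSvcExit:     exited = true; return 0;
            default:           return 0xFFFFFFFFu;  // unknown service: error code
        }
    }
};
```
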

Testing methodology

For the following we'll assume an x86_64 host system (Intel i5-4590) and a 32-bit ARM emulated system (Raspberry Pi, 3DS).
To compare a native binary with an emulated binary on various emulators, we need to compile the same source code for both architectures, so this will be a synthetic benchmark. It will of course have its own bias, but one that we can control. The benchmark algorithms are:
  • a prime numbers finder: heavy integer operations
  • a fractal image computation: heavy floating point operations and memory access
Both output a value that can be used to validate the correctness of the emulation. Source: https://github.com/vdwjeremy/jit-bench/blob/main/src/tester.cpp
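
For readers who don't want to open the repository, the two kinds of kernels could look roughly like the sketch below. This is not the code from tester.cpp; the function names, the trial-division method and the Mandelbrot grid bounds are my own assumptions, and the fractal here only accumulates a checksum instead of filling an image buffer. What matters is that each kernel returns a single value that must be identical on native and emulated runs.

```cpp
#include <cstdint>

// Integer-heavy kernel: count the primes below `limit` by trial division.
std::uint32_t count_primes(std::uint32_t limit) {
    std::uint32_t count = 0;
    for (std::uint32_t n = 2; n < limit; ++n) {
        bool is_prime = true;
        // "d <= n / d" is the overflow-safe form of "d * d <= n".
        for (std::uint32_t d = 2; d <= n / d; ++d) {
            if (n % d == 0) { is_prime = false; break; }
        }
        if (is_prime) ++count;
    }
    return count;
}

// Floating-point kernel: sum of Mandelbrot escape iterations over a
// size x size grid covering [-2, 1) x [-1.5, 1.5). The sum acts as a checksum.
std::uint32_t mandelbrot_checksum(int size, int max_iter) {
    std::uint32_t total = 0;
    for (int py = 0; py < size; ++py) {
        for (int px = 0; px < size; ++px) {
            const double cr = -2.0 + 3.0 * px / size;
            const double ci = -1.5 + 3.0 * py / size;
            double zr = 0.0, zi = 0.0;
            int iter = 0;
            while (iter < max_iter && zr * zr + zi * zi <= 4.0) {
                const double t = zr * zr - zi * zi + cr;  // real part of z^2 + c
                zi = 2.0 * zr * zi + ci;                  // imaginary part of z^2 + c
                zr = t;
                ++iter;
            }
            total += static_cast<std::uint32_t>(iter);
        }
    }
    return total;
}
```
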
The test is run under Ubuntu 18, using cross compilation:
  • tester.cpp -- g++ --> tester_x86
  • tester.cpp -- arm-linux-gnueabi-g++ --> tester_arm
Changing ISA and ABI this way would normally require either intercepting all calls to libc or emulating system calls, as ARM and x86 don't even share the same syscall numbering; this is what QEMU does in user mode. For this hobby project we can avoid this step with the following constraints:
  • no dynamic allocation on heap, the program must use only the stack
  • no input/output as part of the benchmark loop
  • isolate the benchmark algorithm in a separate function, which will be the entry point for the emulator; OS-related tasks (initialization, input, output) must be kept in __libc_start_main/main
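
A minimal sketch of a benchmark laid out under these constraints is shown below. The names `benchmark_entry`, `time_native_run` and the problem size `kLimit` are assumptions for illustration, not taken from tester.cpp. The kernel uses a fixed-size stack buffer and touches nothing that needs the kernel, while timing stays outside on the host side (in a real tester this part, together with any printing, would sit in main).

```cpp
#include <chrono>
#include <cstdint>

constexpr std::uint32_t kLimit = 10000;  // hypothetical problem size

// Emulator entry point: no heap, no I/O, nothing that needs a system call.
#if defined(__GNUC__)
__attribute__((noinline))  // keep it a distinct symbol the emulator can jump to
#endif
std::uint32_t benchmark_entry() {
    bool composite[kLimit] = {};  // sieve flags in a fixed-size stack buffer
    std::uint32_t count = 0;
    for (std::uint32_t n = 2; n < kLimit; ++n) {
        if (composite[n]) continue;
        ++count;  // n is prime
        for (std::uint32_t m = n * n; m < kLimit; m += n) composite[m] = true;
    }
    return count;
}

struct Timed {
    std::uint32_t result;  // value used to validate correctness
    double seconds;        // wall-clock time of the kernel alone
};

// Host-side wrapper: everything OS-related (clock here, printing elsewhere)
// stays out of the emulated function.
Timed time_native_run() {
    const auto start = std::chrono::steady_clock::now();
    const std::uint32_t result = benchmark_entry();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {result, elapsed.count()};
}
```
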
We need to reimplement the ELF loader, as the targeted emulators expect a memory image ready for execution. However, we can simplify it by compiling statically and without relocation, which eliminates the need to handle all the dynamic aspects.
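
With a static, non-relocatable binary, such a loader reduces to reading the ELF header and copying each PT_LOAD segment to its virtual address. The sketch below is an assumption of what that can look like for a 32-bit little-endian ELF, not the loader from the repository; `Image`, `Segment` and `load_elf32` are names I made up. The field offsets follow the standard ELF32 layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

struct Segment {
    std::uint32_t vaddr;              // guest virtual address to map at
    std::vector<std::uint8_t> bytes;  // p_memsz bytes, zero-filled past p_filesz (.bss)
};

struct Image {
    std::uint32_t entry = 0;          // e_entry from the ELF header
    std::vector<Segment> segments;
};

// Little-endian field readers with bounds checks.
static std::uint16_t rd16(const std::vector<std::uint8_t>& f, std::size_t off) {
    if (off + 2 > f.size()) throw std::runtime_error("truncated ELF");
    return static_cast<std::uint16_t>(f[off] | (f[off + 1] << 8));
}

static std::uint32_t rd32(const std::vector<std::uint8_t>& f, std::size_t off) {
    if (off + 4 > f.size()) throw std::runtime_error("truncated ELF");
    return static_cast<std::uint32_t>(f[off]) |
           (static_cast<std::uint32_t>(f[off + 1]) << 8) |
           (static_cast<std::uint32_t>(f[off + 2]) << 16) |
           (static_cast<std::uint32_t>(f[off + 3]) << 24);
}

Image load_elf32(const std::vector<std::uint8_t>& file) {
    if (file.size() < 52 || file[0] != 0x7f || file[1] != 'E' ||
        file[2] != 'L' || file[3] != 'F')
        throw std::runtime_error("not an ELF file");
    if (file[4] != 1 || file[5] != 1)  // EI_CLASS = ELFCLASS32, EI_DATA = ELFDATA2LSB
        throw std::runtime_error("expected a 32-bit little-endian ELF");

    Image image;
    image.entry = rd32(file, 24);                    // e_entry
    const std::uint32_t phoff     = rd32(file, 28);  // e_phoff
    const std::uint16_t phentsize = rd16(file, 42);  // e_phentsize
    const std::uint16_t phnum     = rd16(file, 44);  // e_phnum

    for (std::uint16_t i = 0; i < phnum; ++i) {
        const std::size_t ph = phoff + std::size_t{i} * phentsize;
        if (rd32(file, ph) != 1) continue;           // keep only PT_LOAD (type 1)
        const std::uint32_t offset = rd32(file, ph + 4);   // p_offset
        const std::uint32_t vaddr  = rd32(file, ph + 8);   // p_vaddr
        const std::uint32_t filesz = rd32(file, ph + 16);  // p_filesz
        const std::uint32_t memsz  = rd32(file, ph + 20);  // p_memsz
        if (filesz > memsz || std::uint64_t{offset} + filesz > file.size())
            throw std::runtime_error("bad segment bounds");

        Segment seg{vaddr, std::vector<std::uint8_t>(memsz, 0)};
        std::copy(file.begin() + offset, file.begin() + offset + filesz,
                  seg.bytes.begin());
        image.segments.push_back(std::move(seg));
    }
    return image;
}
```

Each returned segment can then be copied into the emulator's guest memory at `vaddr`, and emulation started at the address of the benchmark function rather than at `entry`, since the C runtime start-up is skipped.
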

The tested emulators are Unicorn (used by CageTheUnicorn and angr) and Dynarmic (used by Citra and Yuzu), both compiled in release mode from the GitHub master branch on 2021/03/01. The results include the JIT compilation time; however, it is negligible compared to the run time due to the small size of the code.

Results

[Figure: benchmark elapsed time (lower is better)]

Conclusions

Even on a (relatively) simple benchmark, dynamically recompiling emulators are still orders of magnitude slower than native code (18x to 195x for Unicorn, 52x to 500x for Dynarmic).
I was surprised to see that Unicorn is significantly faster than Dynarmic though, as I know Yuzu (ARM64) switched to Dynarmic for better performance. If somebody can explain the discrepancy, I would be curious to know.
Some loads, such as floating-point operations, seem to take a bigger toll on the emulator than others. It would be interesting to take a deeper look at the trade-offs and architectural choices made in the main emulators.


All sources can be found at https://github.com/vdwjeremy/jit-bench
 