Introducing Lightrec, a MIPS-to-everything dynarec

Emulation is what got me into computer science to begin with, as I always thought that emulators were impressive pieces of software. The fact that we can simulate a real-world electronic device is just amazing. What astounds me even more is that some emulators break the boundaries of what we thought was possible. Who remembers UltraHLE? Who ever tried Bleem!cast? As my knowledge of computer science grew over the last 12 years of learning C, working on Linux and doing low-level programming on embedded systems, emulators slowly ceased to be a mystery to me; but that made me even more respectful, now that I can grasp the genius that went into these pieces of craftsmanship.

The biggest praise I have for emulator creators is that they don't follow the common premise that the solution is always better hardware. In a world where consumption is the key to our doom, I like to believe that we can always do more with less. Under constraints, people get creative. Writing software on infinitely powerful machines would be boring.

Introducing Lightrec

Since 2014, I've been working on and off on a project called Lightrec. Started as an experiment to test my skills and improve my knowledge, it later became a fully working dynamic recompiler (a.k.a. dynarec) for the PCSX PlayStation emulator, targeting a wide range of host CPUs thanks to the use of GNU Lightning as the code emitter.

Succeeding where others failed

The big disadvantage of traditional dynamic recompilers is that they only target one architecture. PCSX has one dynamic recompiler for x86 PCs, another one for ARM-based smartphones, and yet another one for MIPS. Each new dynarec means a different code base, different performance, and different compatibility.

Ever since projects like LLVM or libjit came out, several unrelated attempts have been made by different people to create a dynamic recompiler that would use these technologies to support a lot of different CPUs. Unfortunately, they all failed, as they soon discovered that these technologies were really not well suited to dynamic recompilers. The reason is that while they can generate well-optimized code at runtime, they were not designed to do so on a tight schedule. A game's frame time is generally about 16 ms, and the recompiler sometimes needs to compile thousands of pieces of code within that time frame, something that LLVM or libjit just cannot do.

GNU Lightning differs from the two aforementioned projects in that it has a different scope. LLVM and libjit were designed for creating programming language compilers or fast interpreters, and as such have the concept of variables, which is a construct that all programming languages share, but not something that machine code has. Machine code manipulates registers.

GNU Lightning is better described as a code emitter. It offers you a finite number of virtual registers (the actual number depends on the architecture), and a programming API that closely resembles the instruction set of MIPS processors. All it does is translate each virtual instruction and its virtual registers to the corresponding CPU instruction (or instructions) with the corresponding hardware registers. It doesn't perform any optimization (except very obvious and easy ones), and does not provide register allocation facilities either. Because it is that simple, it is extremely fast at generating code, and is well suited for a portable dynamic recompiler project, as it supports almost every CPU on which you'd ever want to run a PlayStation emulator.

Implementation details

As you may have guessed by now, the Lightrec name is a fusion of GNU Lightning and recompiler, as it's what it really is. It could also be read as Light Recompiler and that wouldn't be wrong either.

From a compatibility standpoint, Lightrec does very well, with only a handful of games showing glitches or bugs. Regarding performance, it was truly abysmal a couple of years ago, being slower than PCSX's interpreter. It is now a few times faster, thanks to a few tricks:

  • High-level optimizations.
    The MIPS code is first pre-compiled into a form of Intermediate Representation (IR), which is basically just a singly linked list of structures representing the instructions. Several optimization steps are then performed on that list: instructions are modified, reordered, tagged; new meta-instructions can be added, for instance to tell the code generator that a certain register won't be used anymore.
  • Run-time profiling with a built-in interpreter.
    The first time the MIPS code jumps to a new address, Lightrec emulates it with its built-in interpreter. The interpreter then gathers run-time information, for instance whether a load/store will hit the BIOS area, the RAM, or a hardware register. The code generator then uses this information to generate direct reads/writes to the emulated memories, instead of jumping to C for every access.

  • Lazy compilation.
    If the interpreter detects a block of code that would be very hard to compile properly (e.g. a branch with a branch in its delay slot), the block is marked as not compilable, and will always be emulated with the interpreter. This keeps the code emitter simple and easy to understand.

  • Threaded compilation.
    The code generator can optionally run in a different thread of execution. Instead of compiling a block of code right when we jump to it, Lightrec can add it to the work queue of the threaded compiler, and emulate the block of code using the interpreter in the meantime. This greatly reduces stutter in games when a lot of code is being recompiled, as the main execution thread no longer waits for the compilation process to finish.

  • Fast code LUT.
    Coming from psx4all's mipsrec dynarec, the function block Look-Up Table (LUT) is now a huge array the size of the PlayStation's RAM (2 MiB). It makes it extremely fast to obtain a pointer to generated code from its MIPS address, and extremely easy to mark a block of code as outdated - the generated code just writes NULL to the corresponding offset.
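
To make the last trick a bit more concrete, here is a minimal sketch of such a look-up table in C. It is not Lightrec's actual code: the names (code_lut, lut_index, lut_lookup, lut_register, lut_invalidate, example_block) are made up for illustration, and it assumes one slot per 32-bit MIPS instruction, which the description above doesn't specify. The address mask relies on the fact that the PlayStation exposes the same RAM at 0x00000000, 0x80000000 and 0xa0000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE    (2u * 1024u * 1024u)  /* 2 MiB of PlayStation RAM */
#define LUT_ENTRIES (RAM_SIZE / 4u)       /* assumption: one slot per 32-bit instruction */

/* Hypothetical type for a pointer to a block of generated code. */
typedef void (*block_fn)(void);

/* One slot per instruction word in RAM; NULL means "not compiled (anymore)". */
static block_fn code_lut[LUT_ENTRIES];

/* Masking drops the segment bits, so the three RAM mirrors share one slot. */
static size_t lut_index(uint32_t addr)
{
    return (addr & (RAM_SIZE - 1u)) >> 2;
}

/* Constant-time lookup: MIPS address in, pointer to generated code out. */
static block_fn lut_lookup(uint32_t pc)
{
    return code_lut[lut_index(pc)];
}

/* Called once a block starting at 'pc' has been compiled. */
static void lut_register(uint32_t pc, block_fn fn)
{
    assert(fn != NULL);
    code_lut[lut_index(pc)] = fn;
}

/* What a store to RAM would do: mark the code at 'addr' as outdated. */
static void lut_invalidate(uint32_t addr)
{
    code_lut[lut_index(addr)] = NULL;
}

/* Stand-in for a block of generated code, only here so the sketch is usable. */
static void example_block(void)
{
}
```

With this layout, registering a block at 0x80010000 makes it reachable through 0x00010000 as well, and a store through 0xa0010000 clears the same slot. The sketch only clears the slot of the written word; a real dynarec also has to deal with stores that land in the middle of a block.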

Big-Ass Debugger

The tool I developed that helped build this dynarec from the ground up is called the Big-Ass Debugger. The name comes from the fact that it doesn't try to do anything smart: it runs the interpreter and the dynarec in parallel, and every time a block of code is executed (thousands of times per frame), it calculates a hash of all the registers and the whole RAM in the two instances of the emulator, and compares the results. It is a slow process, but if a difference is found, emulation stops and the debugger reports exactly what went wrong, and where. Starting from a state where the code emitted for every MIPS instruction was just a call to PCSX's interpreter, this tool is what allowed me to write the dynarec progressively, instruction after instruction, while still making sure that my code was fully working and compliant with the expected behaviour shown by the interpreter. To this day, I still use it to verify each optimization and improvement made to the dynarec.
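
The core of that comparison can be sketched in a few lines of C. This is only an illustration under my own assumptions, not the debugger's real code: the cpu_state structure and the function names are hypothetical, and FNV-1a is used here simply because it is short, not because it is the hash the tool actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 32

/* Hypothetical snapshot of one emulator instance. */
struct cpu_state {
    uint32_t regs[NUM_REGS];  /* general-purpose registers */
    uint32_t pc;              /* program counter */
    uint8_t *ram;             /* emulated RAM */
    size_t ram_size;          /* size of the emulated RAM in bytes */
};

/* 32-bit FNV-1a, chained so that several buffers feed a single hash. */
static uint32_t fnv1a(uint32_t hash, const void *data, size_t len)
{
    const uint8_t *p = data;

    while (len--) {
        hash ^= *p++;
        hash *= 16777619u;  /* FNV prime */
    }
    return hash;
}

/* Hash all the registers, the program counter and the whole RAM. */
static uint32_t hash_state(const struct cpu_state *s)
{
    uint32_t h = 2166136261u;  /* FNV offset basis */

    assert(s->ram != NULL);
    h = fnv1a(h, s->regs, sizeof(s->regs));
    h = fnv1a(h, &s->pc, sizeof(s->pc));
    return fnv1a(h, s->ram, s->ram_size);
}

/* Returns 1 if both instances agree after a block, 0 if they diverged. */
static int states_match(const struct cpu_state *interp,
                        const struct cpu_state *dynarec)
{
    return hash_state(interp) == hash_state(dynarec);
}
```

In a debugging loop, states_match() would be called after each block has run in both instances, and emulation would stop at the first block for which it returns 0. Hashing 2 MiB of RAM twice per block is exactly why the process is slow.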

Projects using Lightrec

So far Lightrec has been plugged into a few different emulators:

  • PCSX-ReArmed, which is the emulator I've been using for developing Lightrec. Not the fastest, since the dynarec exits after each piece of recompiled code; but it supports the Big-Ass Debugger.
  • pcsx4all, which is the fastest for various reasons: the dynarec doesn't return as often to the main loop, and the BIOS/scratchpad/RAM and RAM mirror memories are memory-mapped to locations that are a much better fit for the generated code.
  • Beetle, which is a libretro core based on Mednafen. The Lightrec integration is much more recent and still incomplete, but it is already a strong contender to replace the slow interpreter that Beetle has been using since the beginning.

Future

As it is now, the dynarec is already working really well and is ready for prime time. Of course, it still has a way to go; I already have ideas about advanced optimizations (or should I say, optimizations senquack suggested), but all the "easy" optimizations have already been done, and the benefit-over-work-needed ratio is getting smaller and smaller. Also, the fact that it's been plugged into Beetle means that we may start seeing it running on all libretro-supported platforms, which is something I definitely look forward to.

Overall, it's been a challenging project and I'm glad that I could take it to a state where it is usable.

Till next time!

  1. #1 TonyJih, 03 Nov 2019

    This is truly amazing, excellent job !!


  2. #2 Slaanesh, 21 Jul 2020

    “Under constraints, people get creative.”

    My thoughts precisely and exactly why I still enjoy developing for the likes of the Dingoo A320. A nice, but limited set of hardware to play with. How much can I get out of it?

    Anyway, nice work and a good read. I bet you really enjoyed the process of getting this going!
