As many of you have seen, I released the first kernel exploit developed by our team a few days ago. Quite a few people are just using it happily, but others have wanted to know how it works, so I decided to put together a few of these questions into a thread.

I've mentioned that the kernel bug is in the OSDriver_CopyToSaveArea() function, but it's more complex than that. In Cafe OS, the device drivers for peripherals like the GPU, audio interface, DSP, and display controller are part of Cafe OS userspace libraries (like gx2.rpl, snd_core.rpl, dc.rpl). These drivers often need to store data that persists between processes, so the kernel has to provide cross-process memory areas for drivers. For this reason, the Cafe OS kernel lets you register an entity called an OSDriver, which has a cross-process memory buffer (called a save area).

There are four functions you use to manipulate OSDriver structures:

- OSDriver_Register() allocates an OSDriver structure and adds it to a linked list of them.
- OSDriver_Deregister() frees a driver's save area and the driver structure, removing it from the list.
- OSDriver_CopyToSaveArea() copies data from userspace to an OSDriver's save area, allocating a save area if it doesn't exist.
- OSDriver_CopyFromSaveArea() copies data from a driver's save area to userspace.

Since all 3 PPC cores may use this API at once, locking is needed to make sure only one core modifies the data at a time. The bug is that the spinlock protecting the OSDriver list (driver_ctxt_lock) is dropped before the actual copy from userspace starts. Once you actually reach the copy_in() call, no locks are held, allowing the driver to get deleted while the copy is taking place. Then whatever gets allocated on the heap in its place may get overwritten by the copy in progress.

Doing this attack is nicer on the Wii U, since it has three cores. Given how all of the OSDriver functions work, we can construct an attack as follows. Let's say we allocate two drivers, DRVA and DRVB.
Then the heap will look like this:

    [DRVA - 0x4c] [DRVB - 0x4c] [ Unallocated - 0x1000 ]

Now we want to give DRVB a save area. It doesn't have one, but CopyToSaveArea() will allocate one if we copy in data of any size. Let's just copy in 4 bytes. Now the heap layout is:

    [DRVA - 0x4c] [DRVB - 0x4c] [DRVB save - 0x1000]

Time to start copying in our payload, a full 0x1000 bytes. This will go into DRVB save. Meanwhile, on another CPU, start freeing DRVB. When the free completes, the heap looks like this:

    [DRVA - 0x4c] [ Unallocated - 0x104c ]

But the copy is still going on inside unallocated space. Now we have to put something there. Let's give DRVA a save area by copying 4 bytes into it. The heap will then be:

    [DRVA - 0x4c] [DRVA save - 0x1000] [ Unallocated - 0x4c ]

If we get the timing right, the copy will still be inside DRVA save at this point. Now we allocate a third driver, DRVC. It will make the heap look like:

    [DRVA - 0x4c] [DRVA save - 0x1000] [DRVC - 0x4c]

And with any luck, the copy will reach DRVC after it's allocated. We can overwrite the contents of the kernel's OSDriver structure, which includes the save area pointer. This means that we can set the save area of DRVC to a kernel address, and then copy kernel data in and out of userspace. We decided to target the syscall table, and install the kern_read()/kern_write() calls there.

kern_read() and kern_write() force the kernel to read and write the values at 32-bit pointers. They let you bypass kernel-only page protections and write kernel data, but if you try to break other protections (like read-only), it will crash. That brings us to the next question.

The kernel address table is a list of memory mappings, which the Cafe OS kernel feeds into its MMU at process startup. It's a bunch of 0x10-byte entries, each a 4-tuple of (virtual address, range length, physical address, flags). By doing the math, you'll see that the kernel exploit is modifying entry 4 of the address table.
This is a mapping of 0xA0000000, which spans 0x40000000 bytes. Originally, it looks like:

Code:
RAM:FFEAAA50 .long 0xA0000000
RAM:FFEAAA54 .long 0x40000000
RAM:FFEAAA58 .long 0
RAM:FFEAAA5C .long 0x2000

As it is, accessing that range would crash. Those kern_write() calls set the physical address to 0x31000000, and the flags to 0x28305800, which are the same as the 0x10000000 area's. So now we have 0xA0000000 mapping to 0x31000000 physical as RW memory. Since the loader and system libraries start at 0x32000000, and 0x01000000 maps to 0x32000000, you now have a writable version of 0x01000000 mirrored at 0xA1000000.

The first step in porting the exploit to 5.3.2 was to get the race attack to succeed. Once the race attack worked, and we could write arbitrary kernel memory, we just tried all of the syscall table candidates until kern_read()/kern_write() worked.

As for why there are multiple tables, I know that there's one for main apps and one for the loader, but I don't know what the others are for.
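
For anyone who wants to check the address table math from earlier themselves, here's a rough Python sketch. It is not part of the exploit and doesn't touch a console; it only redoes the arithmetic on the numbers quoted in the dump. The AddrTableEntry class, its field names, and translate() are made up for illustration. It also assumes that entries are numbered from zero (so "entry 4" is the fifth one) and that the words are stored big-endian, as usual on PowerPC.

```python
import struct
from typing import NamedTuple

ENTRY_SIZE = 0x10  # each entry is four 32-bit words


class AddrTableEntry(NamedTuple):
    """Illustrative model of one address table entry (names are hypothetical)."""
    vaddr: int   # virtual address
    length: int  # range length in bytes
    paddr: int   # physical address
    flags: int   # mapping flags

    def pack(self) -> bytes:
        # Assumption: words are laid out big-endian, as on PowerPC.
        return struct.pack(">4I", *self)

    def translate(self, va: int) -> int:
        """Physical address that `va` would map to under this entry."""
        if not (self.vaddr <= va < self.vaddr + self.length):
            raise ValueError(f"{va:#010x} is outside this mapping")
        return self.paddr + (va - self.vaddr)


# Entry 4 exactly as it appears in the dump.
ENTRY4_ADDR = 0xFFEAAA50
original = AddrTableEntry(0xA0000000, 0x40000000, 0x00000000, 0x00002000)

# Assumption: zero-based numbering, so entry 4 sits 4 entries past the base.
table_base = ENTRY4_ADDR - 4 * ENTRY_SIZE

# The kern_write() calls only touch the last two words of the entry.
paddr_field = ENTRY4_ADDR + 8
flags_field = ENTRY4_ADDR + 12
patched = original._replace(paddr=0x31000000, flags=0x28305800)

# 0x01000000 is backed by physical 0x32000000 (from the post), so its
# mirror inside the patched mapping is offset by the same distance.
mirror = patched.vaddr + (0x32000000 - patched.paddr)

print(f"table base (if zero-indexed): {table_base:#010x}")
print(f"paddr word at {paddr_field:#010x}, flags word at {flags_field:#010x}")
print(f"writable mirror of 0x01000000: {mirror:#010x}")
```

Feeding the mirror address back through patched.translate() should land on 0x32000000, which is a quick way to convince yourself that 0xA1000000 and 0x01000000 end up pointing at the same physical memory.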