Hacking OSDriver kernel exploit - a technical description

Marionumber1 · Aug 20, 2015

As many of you have seen, I released the first kernel exploit developed by our team a few days ago. Quite a few people are just using it happily, but others have wanted to know how it works, so I decided to put together a few of these questions into a thread. I've mentioned that the kernel bug is in the OSDriver_CopyToSaveArea() function, but it's more complex than that.

How does the exploit work?

In Cafe OS, the device drivers for peripherals like the GPU, audio interface, DSP, and display controller are part of Cafe OS userspace libraries (like gx2.rpl, snd_core.rpl, dc.rpl). These drivers often need to store data that persists between processes, so the kernel has to provide cross-process memory areas for drivers. For this reason, the Cafe OS kernel lets you register an entity called an OSDriver, which has a cross-process memory buffer (called a save area).

There are four functions you use to manipulate OSDriver structures. OSDriver_Register() allocates an OSDriver structure and adds it to a linked list of them. OSDriver_Deregister() frees a driver's save area and the driver structure, removing it from the list. OSDriver_CopyToSaveArea() copies data from userspace to an OSDriver's save area, allocating a save area if it doesn't exist. OSDriver_CopyFromSaveArea() copies data from a driver's save area to userspace.
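To make the four calls concrete, here is a toy Python model of the API as described above. It is only a sketch: the class and field names are made up for illustration, the real kernel keeps a linked list rather than a Python list, and the fixed 0x1000 save area size is taken from the heap diagrams later in this post.

```python
SAVE_AREA_SIZE = 0x1000  # size used in the heap diagrams; treated as fixed here


class OSDriver:
    """Toy stand-in for the kernel structure; the field names are assumptions."""

    def __init__(self, name):
        self.name = name
        self.save_area = None  # allocated lazily on first copy-in


class DriverList:
    """Models the kernel's list of registered drivers (a linked list in the real kernel)."""

    def __init__(self):
        self.drivers = []

    def find(self, name):
        for drv in self.drivers:
            if drv.name == name:
                return drv
        raise KeyError(name)

    def register(self, name):
        drv = OSDriver(name)
        self.drivers.append(drv)
        return drv

    def deregister(self, name):
        drv = self.find(name)
        drv.save_area = None      # free the save area...
        self.drivers.remove(drv)  # ...then unlink the driver itself

    def copy_to_save_area(self, name, data):
        if len(data) > SAVE_AREA_SIZE:
            raise ValueError("data larger than save area")
        drv = self.find(name)
        if drv.save_area is None:  # first copy-in allocates the area
            drv.save_area = bytearray(SAVE_AREA_SIZE)
        drv.save_area[:len(data)] = data

    def copy_from_save_area(self, name, size):
        drv = self.find(name)
        if drv.save_area is None:
            raise LookupError("driver has no save area")  # toy behaviour, not the kernel's
        return bytes(drv.save_area[:size])


drivers = DriverList()
drivers.register("DRVB")
drivers.copy_to_save_area("DRVB", b"\x00\x00\x00\x04")  # 4 bytes is enough to allocate
```

Note that nothing in this model is protected by a lock; it only shows what each call does to the list and the save area.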

Since all 3 PPC cores may use this API at once, locking is needed to make sure only one core modifies the data at a time. The bug is that the spinlock protecting the OSDriver list (driver_ctxt_lock) is dropped before the actual copy from userspace starts. By the time you reach the copy_in() call, no locks are held, so the driver can be deleted while the copy is taking place. Whatever gets allocated on the heap in its place may then be overwritten by the copy in progress. This attack is easier to pull off on the Wii U, since its three cores let the copy and the free run at the same time.
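The window can be shown with a deterministic toy model in Python. This is a sketch under assumptions, not the kernel's code: the `Kernel` and `Driver` classes are invented, the lock is just a flag, and a generator stands in for one core so that every `yield` marks a point where another core gets to run.

```python
class Driver:
    def __init__(self):
        self.save_area = bytearray(8)
        self.freed = False  # set when the driver has been deregistered


class Kernel:
    def __init__(self):
        self.drivers = {}
        self.lock_held = False  # stands in for driver_ctxt_lock
        self.log = []           # (byte index, target already freed?, lock held?)

    def copy_to_save_area(self, name, data):
        """Generator: each yield is a point where another core can run."""
        self.lock_held = True
        drv = self.drivers[name]   # the lookup happens under the lock
        self.lock_held = False     # modelled bug: lock dropped before the copy
        for i, byte in enumerate(data):
            self.log.append((i, drv.freed, self.lock_held))
            drv.save_area[i] = byte  # may now be writing into freed memory
            yield

    def deregister(self, name):
        self.lock_held = True
        drv = self.drivers.pop(name)
        drv.freed = True
        self.lock_held = False


kernel = Kernel()
kernel.drivers["DRVB"] = Driver()

copy = kernel.copy_to_save_area("DRVB", b"ABCDEFGH")
next(copy)
next(copy)                   # one core has copied two bytes
kernel.deregister("DRVB")    # another core frees the driver mid-copy
for _ in copy:               # the first core finishes the copy anyway
    pass

writes_after_free = [i for i, freed, _ in kernel.log if freed]
print(writes_after_free)
```

Every write is logged with the lock not held, and bytes 2 through 7 are written after the driver was freed. Holding the lock across the whole copy would close the window in this model.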

Given how all of the OSDriver functions work, we can construct an attack as follows.

Let's say we allocate two drivers, DRVA and DRVB. Then the heap will look like this:
[DRVA - 0x4c] [DRVB - 0x4c] [ Unallocated - 0x1000 ]

Now we want to give DRVB a save area. It doesn't have one, but CopyToSaveArea() will allocate one if we copy in data of any size. Let's just copy in 4 bytes. Now the heap layout is:
[DRVA - 0x4c] [DRVB - 0x4c] [DRVB save - 0x1000]

Time to start copying in our payload, a full 0x1000 bytes. This will go into DRVB save. Meanwhile, on another CPU, start freeing DRVB. When the free completes, the heap looks like this:
[DRVA - 0x4c] [ Unallocated - 0x104c ]

But the copy is still going on inside unallocated space. Now we have to put something there. Let's give DRVA a save area by copying 4 bytes into it. The heap will then be:
[DRVA - 0x4c] [DRVA save - 0x1000] [ Unallocated - 0x4c ]

If we get the timing right, the copy will still be inside DRVA save at this point. Now we allocate a third driver, DRVC. It will make the heap look like:
[DRVA - 0x4c] [DRVA save - 0x1000] [DRVC - 0x4c]

And with any luck, the copy will reach DRVC after it's allocated. We can overwrite the contents of the kernel's OSDriver structure, which includes the save area pointer. This means that we can set the save area of DRVC to a kernel address, and then copy kernel data in and out of userspace. We decided to pick the syscall table, and install the kern_read()/kern_write() calls.
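The heap walkthrough above can be replayed with a small Python simulation. Treat it as a sketch: `ToyHeap` is an invented first-fit allocator with no block headers, the arena is sized to fit exactly the blocks in the diagrams, and the "race" is replayed in a fixed order instead of actually running on multiple cores. The 0xEE bytes stand in for a forged OSDriver structure whose real layout is not given in this post.

```python
DRIVER_SIZE = 0x4C
SAVE_SIZE = 0x1000


class ToyHeap:
    """First-fit allocator over one flat arena; no block headers, unlike a real heap."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.blocks = {}  # name -> (offset, size)

    def alloc(self, name, size):
        offset = 0
        for start, length in sorted(self.blocks.values()):
            if start - offset >= size:
                break  # the gap before this block is big enough
            offset = start + length
        if offset + size > len(self.memory):
            raise MemoryError(name)
        self.blocks[name] = (offset, size)
        self.memory[offset:offset + size] = bytes(size)  # fresh blocks start zeroed
        return offset

    def free(self, name):
        del self.blocks[name]  # adjacent free space merges implicitly

    def layout(self):
        out, cursor = [], 0
        for name, (start, length) in sorted(self.blocks.items(), key=lambda kv: kv[1]):
            if start > cursor:
                out.append(("free", start - cursor))
            out.append((name, length))
            cursor = start + length
        if cursor < len(self.memory):
            out.append(("free", len(self.memory) - cursor))
        return out


heap = ToyHeap(2 * DRIVER_SIZE + SAVE_SIZE)
heap.alloc("DRVA", DRIVER_SIZE)
heap.alloc("DRVB", DRIVER_SIZE)
dest = heap.alloc("DRVB save", SAVE_SIZE)       # the 4-byte copy-in allocates this

# Payload: filler, then 0x4C bytes that will land on top of DRVC.
fake_driver = b"\xEE" * DRIVER_SIZE
payload = b"\x41" * (SAVE_SIZE - DRIVER_SIZE) + fake_driver

half = SAVE_SIZE // 2
heap.memory[dest:dest + half] = payload[:half]  # the copy starts on one core

heap.free("DRVB save")                          # meanwhile, another core frees DRVB
heap.free("DRVB")
after_free = heap.layout()

heap.alloc("DRVA save", SAVE_SIZE)              # refill the hole
drvc = heap.alloc("DRVC", DRIVER_SIZE)
final = heap.layout()

heap.memory[dest + half:dest + SAVE_SIZE] = payload[half:]  # the stale copy finishes
overwritten = bytes(heap.memory[drvc:drvc + DRIVER_SIZE])
print([(n, hex(s)) for n, s in final], overwritten == fake_driver)
```

In this model DRVC ends up exactly where the last 0x4C bytes of the old DRVB save area used to be, so the tail of the payload is what replaces the new driver structure.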

What can kern_read() and kern_write() do?

kern_read() and kern_write() force the kernel to read and set the value at a 32-bit pointer. They let you bypass kernel-only page protections and write kernel data, but if you try to break other protections (like read-only), it will crash. That brings us to the next question.
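As a rough illustration of that behaviour, here is a toy Python memory model. Everything in it is hypothetical: the `Region` and `ToyMemory` classes, the two addresses, and the `Fault` exception that stands in for a crash. The only things taken from the description are that values are 32 bits wide (big-endian, as on PowerPC), that the kernel calls ignore kernel-only protection, and that read-only protection still applies.

```python
import struct


class Fault(Exception):
    """Stands in for the crash described above."""


class Region:
    def __init__(self, base, size, kernel_only=False, read_only=False):
        self.base = base
        self.data = bytearray(size)
        self.kernel_only = kernel_only
        self.read_only = read_only


class ToyMemory:
    def __init__(self, regions):
        self.regions = regions

    def _locate(self, addr):
        for r in self.regions:
            if r.base <= addr and addr + 4 <= r.base + len(r.data):
                return r, addr - r.base
        raise Fault(hex(addr))  # unmapped address

    def user_read(self, addr):
        r, off = self._locate(addr)
        if r.kernel_only:
            raise Fault("user access to kernel-only page")
        return struct.unpack_from(">I", r.data, off)[0]

    def kern_read(self, addr):
        r, off = self._locate(addr)  # kernel-only protection does not apply
        return struct.unpack_from(">I", r.data, off)[0]

    def kern_write(self, addr, value):
        r, off = self._locate(addr)
        if r.read_only:              # other protections still do
            raise Fault("write to read-only page")
        struct.pack_into(">I", r.data, off, value)


mem = ToyMemory([
    Region(0xF0000000, 0x100, kernel_only=True),  # hypothetical kernel data page
    Region(0xE0000000, 0x100, read_only=True),    # hypothetical read-only page
])
mem.kern_write(0xF0000010, 0xDEADBEEF)
value = mem.kern_read(0xF0000010)
```

Calling `mem.user_read(0xF0000010)` or `mem.kern_write(0xE0000000, 1)` raises `Fault` in this model.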

wj44 said:
@Marionumber1 @NWPlayer123
#define KERN_ADDRESS_TBL ???

The kernel address table is a list of memory mappings, which the Cafe OS kernel feeds into its MMU at process startup. It's a series of 0x10-byte entries, each a 4-tuple of (virtual address, range length, physical address, flags). If you do the math, you'll see that the kernel exploit modifies entry 4 of the address table. This is the mapping for 0xA0000000, which spans 0x40000000 bytes. Originally, it looks like:

Code:

RAM:FFEAAA50                 .long 0xA0000000
RAM:FFEAAA54                 .long 0x40000000
RAM:FFEAAA58                 .long 0
RAM:FFEAAA5C                 .long 0x2000

As it is, accessing that range would crash. Those kern_write() calls set the physical address to 0x31000000, and the flags to 0x28305800, which are the same as the 0x10000000 area's. So now we have 0xA0000000 mapping to 0x31000000 physical as RW memory. Since the loader and system libraries start at 0x32000000, and 0x01000000 maps to 0x32000000, you now have a writable version of 0x01000000 mirrored at 0xA1000000.
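The arithmetic can be checked with a few lines of Python. This is a sketch: the helper names are made up, the table base is derived by assuming "entry 4" is counted from zero, and `translate()` is a plain offset calculation rather than a model of the MMU.

```python
import struct

ENTRY_SIZE = 0x10
ENTRY_ADDR = 0xFFEAAA50  # address of the entry in the listing above
ENTRY_INDEX = 4          # "entry 4", assumed to be zero-based
TABLE_BASE = ENTRY_ADDR - ENTRY_INDEX * ENTRY_SIZE  # derived, not stated in the post

# The entry as it appears in the listing: four big-endian 32-bit words.
original = struct.pack(">4I", 0xA0000000, 0x40000000, 0x00000000, 0x00002000)


def parse_entry(raw):
    vaddr, length, paddr, flags = struct.unpack(">4I", raw)
    return {"vaddr": vaddr, "length": length, "paddr": paddr, "flags": flags}


def patch_entry(raw, paddr, flags):
    """Replace only the last two words, as the two kern_write() calls do."""
    vaddr, length, _, _ = struct.unpack(">4I", raw)
    return struct.pack(">4I", vaddr, length, paddr, flags)


def translate(entry, vaddr):
    offset = vaddr - entry["vaddr"]
    if not 0 <= offset < entry["length"]:
        raise ValueError("address outside this mapping")
    return entry["paddr"] + offset


patched = parse_entry(patch_entry(original, 0x31000000, 0x28305800))
print(hex(TABLE_BASE), hex(translate(patched, 0xA1000000)))
```

The two patched words sit at ENTRY_ADDR + 0x8 and ENTRY_ADDR + 0xC, which matches 0xFFEAAA58 and 0xFFEAAA5C in the listing, and 0xA1000000 comes out at physical 0x32000000.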

golden45 said:
About the syscall table, how did you find that the right one (for 5.3.2) was at 0xFFEAA0E0 (and not the other ones at FFE84C70, FFE85070, FFE85470, FFEA9CE0) ?
I see that the pointer to this table is set in 0x64(FFEx4000) by the function at 0xFFF0248 (more exactly at FFF024D4) but I don't really understand the differences between each table and their usage.

The first step in porting the exploit to 5.3.2 was to get the race attack to succeed. Once the race attack worked, and we could write arbitrary kernel memory, we just tried all of the syscall table candidates until kern_read()/kern_write() worked.

As for why there are multiple tables, I know that there's one for main apps and one for the loader, but I don't know what the others are for.
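The trial-and-error search amounts to a simple loop. The Python below is only a toy model of it: `ToyKernel` and its methods are invented stand-ins (the real steps are the race attack plus an actual syscall attempt on the console), and the candidate addresses are the 5.3.2 ones quoted in the question.

```python
# Candidate syscall tables for 5.3.2, as listed in the question above.
CANDIDATE_TABLES = [0xFFE84C70, 0xFFE85070, 0xFFE85470, 0xFFEA9CE0, 0xFFEAA0E0]


class ToyKernel:
    """Hypothetical stand-in: remembers which tables were patched."""

    def __init__(self, active_table):
        self.active_table = active_table  # the table the current process really uses
        self.patched = set()

    def install_rw_syscalls(self, table):
        # On the console this would be done through the overwritten driver.
        self.patched.add(table)

    def kern_read_works(self):
        # The new syscalls only work if they landed in the table in use.
        return self.active_table in self.patched


def find_active_table(kernel, candidates):
    for table in candidates:
        kernel.install_rw_syscalls(table)
        if kernel.kern_read_works():
            return table
    return None


found = find_active_table(ToyKernel(0xFFEAA0E0), CANDIDATE_TABLES)
print(hex(found))
```

Patching every candidate rather than stopping at the first hit would also cover processes that use a different table.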

golden45 · Aug 20, 2015

Thank you very much for those explanations, everything makes perfect sense now. It will help me fill in my IDA project =)

EclipseSin · Aug 21, 2015

Awesome post. Only read a bit so far, but from what I have, it will be a big help and hopefully will answer a lot of questions I've been seeing around. Thanks NWP, MN1, and crew for all your help, and MN1 for this writeup!!

If I forgot you, my bad. I don't know who all is helping on this stuff at the moment.

thexyz · Aug 22, 2015

Interesting, thanks for the writeup. What's the reason you used ROP for created threads thread0, thread2 instead of making functions for these?

Marionumber1 · Aug 22, 2015

thexyz said:
Interesting, thanks for the writeup. What's the reason you used ROP for created threads thread0, thread2 instead of making functions for these?

CPU0 and CPU2 don't have the JIT area mapped, so we can't just run code in our binary.

VinsCool · Aug 22, 2015

I dare to ask: can it be improved in any way, for stability and reliability?

Very interesting post though

Marionumber1 · Aug 22, 2015

VinsCool said:
I dare to ask: can it be improved in any way, for stability and reliability?

Very interesting post though

It is possible to improve it in various ways. Matt had some timing fixes which he hasn't yet put in libwiiu (I should ask him to), which will make things better. I did spend a fair amount of June and July trying to make it better, and eventually found why it was so unpredictable: the cache misses would slow things down a lot, and it was hard to control them all. That was about the point at which I started looking for a new kernel exploit, and now I have one.

There is another potential option, which I intended to look into, but comex dissuaded me from it. If we bombarded CPU1 with ICIs (inter-core interrupts), we might be able to slow it down by a significant amount, enough that cache misses wouldn't matter. However, comex told me that ICIs would be treated as IRQs, which are disabled in kernel mode. I trusted him at that point, but I recently started looking into the IRQ system and found no mention of ICIs. Then I found the likely ICI vector...

VinsCool · Aug 22, 2015

Marionumber1 said:
It is possible to improve it in various ways. Matt had some timing fixes which he hasn't yet put in libwiiu (I should ask him to), which will make things better. I did spend a fair amount of June and July trying to make it better, and eventually found why it was so unpredictable: the cache misses would slow things down a lot, and it was hard to control them all. That was about the point at which I started looking for a new kernel exploit, and now I have one.

There is another potential option, which I intended to look into, but comex dissuaded me from it. If we bombarded CPU1 with ICIs (inter-core interrupts), we might be able to slow it down by a significant amount, enough that cache misses wouldn't matter. However, comex told me that ICIs would be treated as IRQs, which are disabled in kernel mode. I trusted him at that point, but I recently started looking into the IRQ system and found no mention of ICIs. Then I found the likely ICI vector...

Verrry interesting! I wish you good luck at it

gudenau · Aug 22, 2015

Is this for the latest FIRM?

Marionumber1 · Aug 22, 2015

gudenaurock said:
Is this for the latest FIRM?

This was patched in 5.5.0; that's the main reason I released it.

I have a completely different kernel exploit for the latest 5.5.0, which we're still putting the finishing touches on.

gudenau · Aug 22, 2015

Marionumber1 said:
This was patched in 5.5.0; that's the main reason I released it. I have a completely different kernel exploit for the latest 5.5.0, which we're still putting the finishing touches on.

Nice, good luck! (You have userland, I assume?)

Marionumber1 · Aug 22, 2015

gudenaurock said:
Nice, good luck! (You have userland, I assume?)

Yep, we got user mode code execution a few days ago.

Rusb · Aug 22, 2015

Marionumber1 said:
Yep, we got user mode code execution a few days ago.

You used a 3DS for it?

golden45 · Aug 22, 2015

For the kernel address table, did you find out what the flag bits are?

Marionumber1 · Aug 22, 2015

Rusb said:
You used a 3DS for it?

This is the Wii U section, and I assume gudenaurock saying "FIRM" was a mistake.

--------------------- MERGED ---------------------------

golden45 said:
For the kernel address table, did you find out what the flag bits are?

I've seen a few of them used in code, but haven't really figured out most of them. I've assumed it's the same as the BAT flags, which should be in the PowerPC manual.

Rusb · Aug 22, 2015

Marionumber1 said:
This is the Wii U section, and I assume gudenaurock saying "FIRM" was a mistake.

--------------------- MERGED ---------------------------

I've seen a few of them used in code, but haven't really figured out most of them. I've assumed it's the same as the BAT flags, which should be in the PowerPC manual.

I suppose not, then. I thought user-mode execution could be achieved on the Wii U by sending modified data from a 3DS through a program that communicates with the Wii U, like Smash Bros. That's why I asked.

Marionumber1 · Aug 22, 2015

Rusb said:
I suppose not, then. I thought user-mode execution could be achieved on the Wii U by sending modified data from a 3DS through a program that communicates with the Wii U, like Smash Bros. That's why I asked.

No, I'm not aware of anyone doing that, but it may very well be possible. This was an exploit in the web browser.

gudenau · Aug 22, 2015

Marionumber1 said:
This is the Wii U section, and I assume gudenaurock saying "FIRM" was a mistake.

--------------------- MERGED ---------------------------

I've seen a few of them used in code, but haven't really figured out most of them. I've assumed it's the same as the BAT flags, which should be in the PowerPC manual.

Rusb said:
I suppose not, then. I thought user-mode execution could be achieved on the Wii U by sending modified data from a 3DS through a program that communicates with the Wii U, like Smash Bros. That's why I asked.

Sorry, I have been doing a lot more 3DS stuff lately. I did not intend to cause confusion. A Smash attack from a 3DS would be cool, though very unlikely.

mariogamer · Aug 23, 2015

Marionumber1 said:
This was patched in 5.5.0; that's the main reason I released it. I have a completely different kernel exploit for the latest 5.5.0, which we're still putting the finishing touches on.

Can I update if I'm on 5.4.0?

golden45 · Aug 30, 2015

golden45 said:
About the syscall table, how did you find that the right one (for 5.3.2) was at 0xFFEAA0E0 (and not the other ones at FFE84C70, FFE85070, FFE85470, FFEA9CE0) ?
I see that the pointer to this table is set in 0x64(FFEx4000) by the function at 0xFFF0248 (more exactly at FFF024D4) but I don't really understand the differences between each table and their usage.

Marionumber1 said:
The first step in porting the exploit to 5.3.2 was to get the race attack to succeed. Once the race attack worked, and we could write arbitrary kernel memory, we just tried all of the syscall table candidates until kern_read()/kern_write() worked. As for why there are multiple tables, I know that there's one for main apps and one for the loader, but I don't know what the others are for.

I was not able to use the read/write syscalls while executing a game (via pygecko).
It turns out that the syscall table modified by the kernel exploit is not the one used by games (at least on 5.3.2).

So I added the read/write syscalls into the other syscall tables, and it worked.
On 5.3.2 the syscall table used by games is the one located at 0xFFE85070.
