Hacking OSDriver kernel exploit - a technical description

Marionumber1

Well-Known Member
OP
Member
Joined
Nov 7, 2010
Messages
1,234
Trophies
3
XP
4,045
Country
United States
As many of you have seen, I released the first kernel exploit developed by our team a few days ago. Quite a few people are just using it happily, but others have wanted to know how it works, so I decided to put together a few of these questions into a thread. I've mentioned that the kernel bug is in the OSDriver_CopyToSaveArea() function, but it's more complex than that.

How does the exploit work?
In Cafe OS, the device drivers for peripherals like the GPU, audio interface, DSP, and display controller are part of Cafe OS userspace libraries (like gx2.rpl, snd_core.rpl, dc.rpl). These drivers often need to store data that persists between processes, so the kernel has to provide cross-process memory areas for drivers. For this reason, the Cafe OS kernel lets you register an entity called an OSDriver, which has a cross-process memory buffer (called a save area).

There are four functions you use to manipulate OSDriver structures. OSDriver_Register() allocates an OSDriver structure and adds it to a linked list of them. OSDriver_Deregister() frees a driver's save area and the driver structure, removing it from the list. OSDriver_CopyToSaveArea() copies data from userspace to an OSDriver's save area, allocating a save area if it doesn't exist. OSDriver_CopyFromSaveArea() copies data from a driver's save area to userspace.

Since all three PPC cores may use this API at once, locking is needed to make sure only one core modifies the data at a time. The bug is that the spinlock protecting the OSDriver list (driver_ctxt_lock) is dropped before the actual copy from userspace starts. Once you actually reach the copy_in() call, no locks are held, allowing the driver to get deleted while the copy is taking place. Then whatever gets allocated on the heap in its place may get overwritten by the copy in progress. This attack is easier to pull off on the Wii U because it has three cores, so one core can run the copy while another frees the driver.
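To make the flawed pattern concrete, here is a toy, single-threaded model of it. This is only a sketch, not the kernel's code: every name in it (toy_driver, lock_held, the callback standing in for "another core") is made up for illustration, and the "free" is just a flag so that the example stays well-defined C.

```c
#include <assert.h>
#include <string.h>

#define AREA_SIZE 16

/* Toy driver with an inline save area. All names are hypothetical. */
struct toy_driver {
    int registered;                      /* 0 once the driver is "freed" */
    unsigned char save_area[AREA_SIZE];
};

static int lock_held;                    /* stand-in for driver_ctxt_lock */

/* Called once mid-copy to model another core running concurrently. */
typedef void (*other_core_fn)(struct toy_driver *);

static void deregister(struct toy_driver *d)
{
    assert(!lock_held);   /* only possible because the lock was dropped */
    d->registered = 0;    /* the real kernel would free the memory here */
}

/* Flawed pattern: look the driver up under the lock, drop the lock,
   then copy. Returns how many bytes were written after the driver
   had already been removed. */
static int copy_to_save_area(struct toy_driver *d, const unsigned char *src,
                             other_core_fn other_core)
{
    int stale_writes = 0;

    lock_held = 1;
    /* ... find the driver in the list ... */
    lock_held = 0;                       /* lock dropped before the copy */

    for (int i = 0; i < AREA_SIZE; i++) {
        if (i == AREA_SIZE / 2 && other_core)
            other_core(d);               /* the other core wins the race */
        if (!d->registered)
            stale_writes++;              /* write into "freed" memory */
        d->save_area[i] = src[i];
    }
    return stale_writes;
}
```

Calling copy_to_save_area() with deregister as the callback makes the second half of the copy land in memory the driver no longer owns. Holding the lock for the whole copy would make the assert in deregister() fire instead, which is the behaviour the kernel was missing.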

Given how all of the OSDriver functions work, we can construct an attack as follows.

Let's say we allocate two drivers, DRVA and DRVB. Then the heap will look like this:
[DRVA - 0x4c] [DRVB - 0x4c] [ Unallocated - 0x1000 ]

Now we want to give DRVB a save area. It doesn't have one, but CopyToSaveArea() will allocate one if we copy in data of any size. Let's just copy in 4 bytes. Now the heap layout is:
[DRVA - 0x4c] [DRVB - 0x4c] [DRVB save - 0x1000]

Time to start copying in our payload, a full 0x1000 bytes. This will go into DRVB save. Meanwhile, on another CPU, start freeing DRVB. When the free completes, the heap looks like this:
[DRVA - 0x4c] [ Unallocated - 0x104c ]

But the copy is still going on inside unallocated space. Now we have to put something there. Let's give DRVA a save area by copying 4 bytes into it. The heap will then be:
[DRVA - 0x4c] [DRVA save - 0x1000] [ Unallocated - 0x4c ]

If we get the timing right, the copy will still be inside DRVA save at this point. Now we allocate a third driver, DRVC. It will make the heap look like:
[DRVA - 0x4c] [DRVA save - 0x1000] [DRVC - 0x4c]

And with any luck, the copy will reach DRVC after it's allocated. We can overwrite the contents of the kernel's OSDriver structure, which includes the save area pointer. This means that we can set the save area of DRVC to a kernel address, and then copy kernel data in and out of userspace. We decided to pick the syscall table, and install the kern_read()/kern_write() calls.
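The offsets in the walkthrough above can be checked with a little arithmetic. The sketch below is a simplified model, not the kernel's allocator: it assumes chunks are packed back to back with no headers (as in the diagrams), and the names DRV_SIZE, SAVE_SIZE and the helper functions are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the heap diagrams; chunk headers are ignored,
   which is an assumption of this simplified model. */
#define DRV_SIZE  0x4cu    /* one OSDriver structure */
#define SAVE_SIZE 0x1000u  /* one save area */

/* Heap offset of DRVB's save area when the in-flight copy starts:
   [DRVA][DRVB][DRVB save] */
static size_t drvb_save_offset(void)
{
    return DRV_SIZE + DRV_SIZE;
}

/* Heap offset of DRVC after the reallocations:
   [DRVA][DRVA save][DRVC] */
static size_t drvc_offset(void)
{
    return DRV_SIZE + SAVE_SIZE;
}

/* Offset inside the 0x1000-byte payload whose bytes land on DRVC. */
static size_t payload_offset_hitting_drvc(void)
{
    size_t off = drvc_offset() - drvb_save_offset();
    /* The overlapping region must fit inside the payload. */
    assert(off + DRV_SIZE <= SAVE_SIZE);
    return off;
}
```

In this model the copy starts at heap offset 0x98 and DRVC sits at 0x104c, so only the last 0x4c bytes of the payload (from offset 0xfb4 onward) land on DRVC. Real heap metadata may shift these numbers.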

What can kern_read() and kern_write() do?
kern_read() and kern_write() force the kernel to read or set the value at a given 32-bit pointer. They let you bypass kernel-only page protections and write kernel data, but if you try to break other protections (like read-only), it will crash. That brings us to the next question.

What is KERN_ADDRESS_TBL?
The kernel address table is a list of memory mappings, which the Cafe OS kernel feeds into its MMU at process startup. It's a bunch of 0x10-byte entries, each a 4-tuple of (virtual address, range length, physical address, flags). By doing the math, you'll see that the kernel exploit is modifying entry 4 of the address table. This is a mapping of 0xA0000000, which spans 0x40000000 bytes. Originally, it looks like:
Code:
RAM:FFEAAA50                 .long 0xA0000000
RAM:FFEAAA54                 .long 0x40000000
RAM:FFEAAA58                 .long 0
RAM:FFEAAA5C                 .long 0x2000

As it is, accessing that range would crash. Those kern_write() calls set the physical address to 0x31000000, and the flags to 0x28305800, which are the same as the 0x10000000 area's. So now we have 0xA0000000 mapping to 0x31000000 physical as RW memory. Since the loader and system libraries start at 0x32000000, and 0x01000000 maps to 0x32000000, you now have a writable version of 0x01000000 mirrored at 0xA1000000.
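The arithmetic behind that mirror can be written out as a small sketch. The struct layout follows the 4-tuple described above, but the field names, the helper functions, and the simple linear translation are assumptions for illustration. The real kernel feeds these entries to the MMU rather than translating addresses like this.

```c
#include <assert.h>
#include <stdint.h>

/* One 0x10-byte address table entry. Field names are illustrative,
   not taken from the kernel. */
typedef struct {
    uint32_t virt;   /* virtual base address */
    uint32_t size;   /* length of the range */
    uint32_t phys;   /* physical base address */
    uint32_t flags;  /* mapping flags */
} addr_tbl_entry;

/* Address of entry `index`, assuming 0-based indexing from table_base. */
static uint32_t entry_address(uint32_t table_base, uint32_t index)
{
    assert(sizeof(addr_tbl_entry) == 0x10);
    return table_base + index * (uint32_t)sizeof(addr_tbl_entry);
}

/* Translate a virtual address through one entry; returns 1 on success
   and 0 if the address is outside the entry's range. */
static int translate(const addr_tbl_entry *e, uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr < e->virt || vaddr - e->virt >= e->size)
        return 0;
    *paddr = e->phys + (vaddr - e->virt);
    return 1;
}

/* Entry 4 after the two kern_write() calls described above. */
static const addr_tbl_entry patched_entry4 = {
    0xA0000000u, 0x40000000u, 0x31000000u, 0x28305800u
};
```

With this entry, translate() maps 0xA1000000 to physical 0x32000000, the same physical address the post gives for 0x01000000. If entries are counted from 0 and entry 4 sits at 0xFFEAAA50 as in the listing, entry_address() implies the table would start at 0xFFEAAA10. That base is derived here, not stated in the post.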

About the syscall table, how did you find that the right one (for 5.3.2) was at 0xFFEAA0E0 (and not the other ones at FFE84C70, FFE85070, FFE85470, FFEA9CE0) ?
I see that the pointer to this table is set in 0x64(FFEx4000) by the function at 0xFFF0248 (more exactly at FFF024D4) but I don't really understand the differences between each table and their usage.
The first step in porting the exploit to 5.3.2 was to get the race attack to succeed. Once the race attack worked, and we could write arbitrary kernel memory, we just tried all of the syscall table candidates until kern_read()/kern_write() worked. :P As for why there are multiple tables, I know that there's one for main apps and one for the loader, but I don't know what the others are for.
 

EclipseSin

Awesome post. I've only read a bit so far, but from what I have read, it will be a big help and will hopefully answer a lot of questions I've been seeing around. Thanks NWP, MN1, and crew for all your help, and MN1 for this writeup!! :) If I forgot you, my bad. I don't know who all is helping on this stuff at the moment.
 

thexyz

Interesting, thanks for the writeup. Why did you use ROP for the created threads (thread0, thread2) instead of writing functions for them?
 

Marionumber1

I dare to ask. Is it improvable in any ways? For stability and reliability?

Very interesting post though :)

It is possible to improve it in various ways. Matt had some timing fixes which he hasn't yet put in libwiiu (I should ask him to), which will make things better. I did spend a fair amount of June and July trying to make it better, and eventually found why it was so unpredictable: the cache misses would slow things down a lot, and it was hard to control them all. That was about the point at which I started looking for a new kernel exploit, and now I have one.

There is another potential option, which I intended to look into until comex dissuaded me. If we bombarded CPU1 with ICIs (inter-core interrupts), we might be able to slow it down by a significant amount, enough that cache misses wouldn't matter. However, comex told me that ICIs would be treated as IRQs, which are disabled in kernel mode. I trusted him at that point, but I recently started looking into the IRQ system and found no mention of ICIs. Then I found the likely ICI vector...
 

VinsCool

Verrry interesting! I wish you good luck at it :P
 

Marionumber1

You used a 3DS for it?

This is the Wii U section, and I assume gudenaurock saying "FIRM" was a mistake.

--------------------- MERGED ---------------------------

For the kernel address table, did you find out what are the flag bits?

I've seen a few of them used in code, but haven't really figured out most of them. I've assumed it's the same as the BAT flags, which should be in the PowerPC manual.
 

Rusb

I suppose not, then. I thought that user execution could be achieved on the Wii U by sending modified data from a 3DS through a program that communicates with the Wii U, like in Smash Bros.; that's why I asked.
 

Marionumber1

No, I'm not aware of anyone doing that, but it may very well be possible. This was an exploit in the web browser.
 

gudenau

Sorry, I have been doing a lot more 3DS stuff. I did not intend to cause confusion. A Smash attack with a 3DS would be cool, though very unlikely.
 

golden45

I was not able to use the read/write syscalls while executing a game (via pygecko).
It turns out that the syscall table modified by the kernel exploit is not the one used by games (at least on 5.3.2).

So I added the read/write syscalls into the other syscall tables, and it worked.
On 5.3.2 the syscall table used by games is the one located at 0xFFE85070.
 
