I'm quite surprised that I couldn't find any relevant information about this on the internet. I know it sounds crazy, but I think it's an interesting subject to analyze and discuss.
I had been absurdly curious about this for some time and eventually came up with a possible approach to running position-independent code (PIC) on the DS. I'd like to share my thoughts so that anyone who wonders about this in the future can find some specific information on the topic.
My idea consists of loading the contents of a precompiled binary from a filesystem and calling it through a C function pointer. Once you no longer need that piece of code, you can simply free the memory it was loaded into. I'll walk through a minimal working example below.
Consider the following code, which prints a message to the console screen:
main.pic.c
C:
// Note that in this position-independent code we don't have
// access to functions that exist at runtime (like printf).
// Because of that, I hardcoded the process of printing
// characters to the default console for the purpose of
// this demo.
#define BG0_SUB ((short*)0x0620B000)

void print_chars(const char* message)
{
    short* map_ptr = BG0_SUB;
    // Each character with ASCII code 0xXY is converted
    // to tile data 0xF0XY.
    // 0xF is the color white on the default console's palette.
    // There is a one-to-one correspondence between the ASCII code 0xXY
    // and the glyph of the character stored as tile id 0xXY.
    while(*message)
        *map_ptr++ = 0xF000|*message++;
}

// this will be the function that is actually called
__attribute__((section(".text.entrypoint")))
void _start()
{
    print_chars("Hello world from PIC!");
}
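The character-to-tile mapping used by print_chars can be checked off-device. The following is only a host-side sketch and not part of the project: `make_map_entry` and `fill_map` are hypothetical names, and an ordinary array stands in for the console's map memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds one 16-bit map entry from an ASCII code.
   0xF goes into the top four bits, the tile id (the ASCII code itself)
   into the low bits. */
uint16_t make_map_entry(char c)
{
    return (uint16_t)(0xF000u | (uint8_t)c);
}

/* Writes one entry per character of `message` into `map`, which is a
   plain array here (an assumption standing in for BG map memory).
   Stops at `capacity` entries and returns how many were written. */
size_t fill_map(uint16_t *map, size_t capacity, const char *message)
{
    size_t n = 0;
    while (*message && n < capacity)
        map[n++] = make_map_entry(*message++);
    return n;
}
```

For example, the letter 'H' (ASCII 0x48) becomes the entry 0xF048.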
Let's convert this code into a binary file, "main.pic.bin", that contains the position-independent instructions and data segments.
To do this, we use the gcc and objcopy tools (I did this in a batch script on Windows; it is easily adaptable to Linux):
Step 0: env setup
I'm using full paths to make clear exactly which tools I used (I'm not an expert, so I feel the need to spell out every detail).
Code:
:: your devkitARM path here
@set devkitarm=D:\Software\Developer\devkitPro\devkitARM
@set gcc=%devkitarm%\bin\arm-none-eabi-gcc.exe
@set objcopy=%devkitarm%\arm-none-eabi\bin\objcopy.exe
@set objdump=%devkitarm%\arm-none-eabi\bin\objdump.exe
Step 1: compile
Code:
@set flags=-mthumb -march=armv5te -mtune=arm946e-s -fpic -ffreestanding -nostdlib -nostartfiles -O2
@set linker_flags=-Wl,--section-start=.text=0x00000000,-Tpic_linker.ld
%gcc% main.pic.c %flags% %linker_flags% -o main.pic.o
Note the -fpic flag, which enables position-independent code generation. The -ffreestanding -nostdlib -nostartfiles flags leave out the standard library and startup files and create an "unbloated" build. Despite its .o name, main.pic.o is a fully linked ELF file, because gcc is invoked without -c.
For this idea to work, the linker has to ensure that the generated "main.pic.bin" starts with the very first instruction we want to execute. Therefore, we force the code segment (.text) to be placed before anything else. Moreover, we want a specific part of the code to be placed first (the one marked with __attribute__((section(".text.entrypoint")))). Along with the code we include the constants (.rodata), initialized (.data) and uninitialized (.bss) memory segments. The DS is permissive about putting everything in EWRAM (which is the very reason we _do_ need a dynamic code loader).
The linker script that achieves this is listed below:
pic_linker.ld
Code:
SECTIONS
{
    . = 0x0000;
    .text : { *(.text.entrypoint) *(.text*) }
    .rodata : { *(.rodata) }
    .data : { *(.data) }
    .bss : { *(.bss) }
}
We can disassemble main.pic.o to check that the linker did what we wanted:
Code:
cmd> %objdump% -D main.pic.o
main.pic.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: b510 push {r4, lr}
2: 4802 ldr r0, [pc, #8] ; (c <_start+0xc>)
4: 4478 add r0, pc
6: f000 f803 bl 10 <print_chars>
a: bd10 pop {r4, pc}
c: 0000002c andeq r0, r0, ip, lsr #32
00000010 <print_chars>:
10: 7803 ldrb r3, [r0, #0]
12: b510 push {r4, lr}
14: 2b00 cmp r3, #0
16: d008 beq.n 2a <print_chars+0x1a>
18: 4904 ldr r1, [pc, #16] ; (2c <print_chars+0x1c>)
1a: 4c05 ldr r4, [pc, #20] ; (30 <print_chars+0x20>)
1c: 4323 orrs r3, r4
1e: 800b strh r3, [r1, #0]
20: 7843 ldrb r3, [r0, #1]
22: 3001 adds r0, #1
24: 3102 adds r1, #2
26: 2b00 cmp r3, #0
28: d1f8 bne.n 1c <print_chars+0xc>
2a: bd10 pop {r4, pc}
2c: 0620b000 strteq fp, [r0], -r0
30: fffff000 ; <UNDEFINED> instruction: 0xfffff000
Disassembly of section .rodata.str1.4:
00000034 <.rodata.str1.4>:
34: 6c6c6548 cfstr64vs mvdx6, [ip], #-288 ; 0xfffffee0
38: 6f77206f svcvs 0x0077206f
3c: 20646c72 rsbcs r6, r4, r2, ror ip
40: 6d6f7266 sfmvs f7, 2, [pc, #-408]! ; fffffeb0 <print_chars+0xfffffea0>
44: 43495020 movtmi r5, #36896 ; 0x9020
48: Address 0x0000000000000048 is out of bounds.
[... skipped sections .ARM.attributes and .comment]
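We observe that the _start symbol is placed at offset 0 (which is good) and that the code and data segments are adjacent to each other. The string ends up in a section named .rodata.str1.4, which the *(.rodata) pattern in the linker script does not match; the linker still places it directly after the code, at offset 0x34.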
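The disassembly also shows why the code does not care where it is loaded: _start never uses the absolute address of the string. It loads the constant 0x2c from the literal pool and adds the program counter to it. The arithmetic can be sketched as below; `thumb_pc` and `resolve_pc_relative` are hypothetical names, and the load address 0x02100000 in the usage note is just an example value.

```c
#include <stdint.h>

/* In Thumb state, reading PC yields the address of the current
   instruction plus 4. */
uint32_t thumb_pc(uint32_t insn_addr)
{
    return insn_addr + 4u;
}

/* Address computed by the instruction pair
       ldr r0, [pc, #8]   ; r0 = literal (0x2c in the listing)
       add r0, pc         ; located at load_base + add_offset
   `load_base` is wherever the binary ended up in memory. */
uint32_t resolve_pc_relative(uint32_t load_base,
                             uint32_t add_offset,
                             uint32_t literal)
{
    return literal + thumb_pc(load_base + add_offset);
}
```

With a load base of 0, `resolve_pc_relative(0, 4, 0x2c)` gives 0x34, which is exactly where the string sits in the listing. With a load base of 0x02100000 the same call gives 0x02100034, so the string is found again without patching any instruction.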
Step 2: Obtaining the PIC "executable"
Code:
cmd > %objcopy% -O binary main.pic.o ../nitro/main.pic.bin
This command extracts the raw position-independent code and data from main.pic.o and places the result in the nitro/ folder of the DS project. We can dump this file to check that it was generated correctly:
Code:
cmd > hexdump -C ../nitro/main.pic.bin
00000000 10 b5 02 48 78 44 00 f0 03 f8 10 bd 2c 00 00 00 |...HxD......,...|
00000010 03 78 10 b5 00 2b 08 d0 04 49 05 4c 23 43 0b 80 |.x...+...I.L#C..|
00000020 43 78 01 30 02 31 00 2b f8 d1 10 bd 00 b0 20 06 |Cx.0.1.+...... .|
00000030 00 f0 ff ff 48 65 6c 6c 6f 20 77 6f 72 6c 64 20 |....Hello world |
00000040 66 72 6f 6d 20 50 49 43 21 00 |from PIC!.|
Note that the first two bytes (0x10 0xb5) are the little-endian encoding of the first instruction of the _start function: 0xb510 (push {r4, lr}). Also, the character sequence "Hello world from PIC!\0" follows immediately after the code.
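To make the byte order explicit, here is a small sketch that reassembles a 16-bit Thumb instruction from two consecutive bytes of the dump. `read_le16` is a hypothetical helper name, not something from the project.

```c
#include <stdint.h>

/* Combines two consecutive bytes into a 16-bit value, treating the
   first byte as the low half (little endian). */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Applied to the first four bytes of the dump (10 b5 02 48), it yields 0xb510 and then 0x4802, matching the first two instructions in the disassembly.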
Step 3: Time to test it
Let's create a simple dynamic code loader in the DS main.cpp file.
C:
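#include <nds.h>
#include <stdio.h>
#include <stdlib.h>
#include <fat.h>
#include <filesystem.h>

// we call our position independent code via
// this function pointer
typedef void(*pic_function)();

// create a buffer that will hold the instructions
// loaded from file (returns NULL on failure)
void* load_position_independent_code(const char* filename)
{
    FILE* f = fopen(filename, "rb");
    if(!f) return NULL;
    // load all contents of the file into memory
    fseek(f, 0, SEEK_END);
    long fileSize = ftell(f);
    fseek(f, 0, SEEK_SET);
    char* binary = fileSize > 0 ? (char*)malloc(fileSize) : NULL;
    if(binary && fread(binary, fileSize, 1, f) != 1)
    {
        free(binary);
        binary = NULL;
    }
    fclose(f);
    if(binary)
    {
        // write the loaded bytes back to RAM and drop stale
        // instruction cache lines before the code is executed
        DC_FlushRange(binary, fileSize);
        IC_InvalidateRange(binary, fileSize);
    }
    return binary;
}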
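For experimenting off-device, a host-side variant of the loader can be handy. The sketch below is an assumption-laden illustration rather than the project's code: `load_blob` is a hypothetical name, it uses only the C standard library, and it also hands back the file size, which a caller needs for anything that operates on the whole loaded range.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader variant: reads a whole file into a malloc'd
   buffer. Returns NULL on any failure (missing file, empty file,
   allocation or read error). On success the size is stored in
   *out_size when out_size is not NULL. The caller frees the buffer. */
void *load_blob(const char *filename, size_t *out_size)
{
    FILE *f = fopen(filename, "rb");
    if (!f)
        return NULL;

    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size <= 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    void *blob = malloc((size_t)size);
    if (blob && fread(blob, 1, (size_t)size, f) != (size_t)size) {
        free(blob);
        blob = NULL;
    }
    fclose(f);

    if (blob && out_size)
        *out_size = (size_t)size;
    return blob;
}
```

Checking every step means a missing or truncated file produces a NULL result that the caller can test, instead of a jump into garbage.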
And finally put it to work:
C:
int main(void) {
    consoleDemoInit();
    nitroFSInit(NULL);

    // load the compiled position-independent
    // binary main.pic.bin from nitroFS
    void* my_external_code
        = load_position_independent_code("nitro:/main.pic.bin");

    // the start offset of my_external_code is the address
    // of our position independent function.
    // We get its address and cast it to a pic_function
    // callable function pointer.
    // Note that bit 0 of the address is set ("| 1").
    // This tells the cpu to call this function in
    // THUMB mode (because main.pic.c was
    // compiled as thumb).
    pic_function my_external_function
        = (pic_function)((uintptr_t)my_external_code | 1);

    // call the function. It should print "Hello world from PIC!"
    // on the bottom screen. Skip the call if loading failed.
    if(my_external_code)
        my_external_function();

    // idle
    while(1) {swiWaitForVBlank();}
    return 0;
}
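The Thumb-bit adjustment can be isolated into a tiny helper. This is only a sketch with hypothetical names (`thumb_entry_address`, `is_thumb_entry`); it works on plain integers, so it can be tried on any machine without actually jumping anywhere.

```c
#include <stdint.h>

/* Returns the address of `code` with bit 0 set. On ARM, an
   interworking branch to an address with bit 0 set switches the CPU
   to Thumb state. */
uintptr_t thumb_entry_address(const void *code)
{
    return (uintptr_t)code | 1u;
}

/* Nonzero when an entry address selects Thumb state. */
int is_thumb_entry(uintptr_t entry)
{
    return (entry & 1u) != 0;
}
```

Setting bit 0 with a bitwise OR gives the same result as adding 1 to an even address, and it stays correct if the address already has the bit set.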
Limitations and improvement opportunities
- The PIC has literally no connection with the rest of the system (except through hardcoded addresses). There may be a way to give the PIC data or function pointers to "the outside world" that are resolved at load time (e.g. malloc). The compiler should be able to output a symbol map that we can use for that.
- There is no communication with other currently loaded PIC modules. That means there is currently no way to have shared-library-like dependencies (e.g. code for sprite position rendering that requires a math PIC module that may also be used by other PIC packages).
- The tradeoff for dynamic code loading is somewhere above 100KB of permanently resident code that handles the loader logic (this includes fopen, malloc, nitro/fat (yes, it works with SD cards too) etc.).
- Loading times and practical uses are still unknown. I plan to toy around with this method for the time being.
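One possible direction for the first limitation is to pass the module a table of function pointers when it is called, so it never needs to know any address in advance. The sketch below is purely hypothetical and has not been tried on hardware: `pic_api`, `pic_entry`, `host_print`, `host_api` and `example_module_entry` are made-up names, and the "module" is an ordinary function compiled into the same program so the idea can be run on a PC.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical table of host services handed to a module. */
struct pic_api {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
    void  (*print)(const char *message);
};

/* Hypothetical entry-point type: the module receives the table as an
   argument instead of hardcoding addresses. */
typedef int (*pic_entry)(const struct pic_api *api);

/* Host-side print: records the last message in a buffer, which stands
   in for the console in this sketch. */
char host_last_message[32];
void host_print(const char *message)
{
    snprintf(host_last_message, sizeof host_last_message, "%s", message);
}

/* The table the host would pass to every module it calls. */
const struct pic_api host_api = { malloc, free, host_print };

/* Stand-in for code that would live in a loaded module. It reaches
   the outside world only through `api`. */
int example_module_entry(const struct pic_api *api)
{
    char *buffer = (char *)api->alloc(3);
    if (!buffer)
        return -1;
    buffer[0] = 'O';
    buffer[1] = 'K';
    buffer[2] = '\0';
    api->print(buffer);
    api->release(buffer);
    return 0;
}
```

Calling `example_module_entry(&host_api)` leaves "OK" in `host_last_message`. The same table could, in principle, also carry pointers to functions exported by other loaded modules, which touches on the second limitation.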
Main benefit: overcoming the 4MB limit of the DS EWRAM, where code shares space with constants and the actual work memory. I am aware that code is not such an expensive resource. However, in the extreme case where your homebrew has lots of small features that are used one at a time, a large portion of the resident code is unused at any given moment of execution.
You can find the project and code for this experiment here (this is where I'll potentially update it): https://github.com/NotImplementedLife/hello_world_pic