On the potential of Position Independent Code to minimize unused code in EWRAM

NotImpLife

Active Member
OP
Newcomer
Joined
Mar 9, 2021
Messages
44
Trophies
0
Website
github.com
XP
520
Country
Romania
I'm quite surprised that I couldn't find any relevant info about this on the internet. I know it sounds crazy, but at the same time I think it's an interesting subject to analyze and discuss.
I've also been absurdly curious about this for some time, and I actually came up with a possible approach to position independent code (PIC) on the DS. I would like to share my thoughts so that anyone who looks into this in the future can find some specific information on the topic.

My idea consists of loading the contents of a precompiled binary from a filesystem and calling it through a C function pointer. The moment you no longer need that piece of code, you can just free the memory it was loaded into. I'll explain a minimum working example below.

Consider the following sequence of code that prints a message to the console screen:


main.pic.c
C:
// Note that in this position independent code we don't have
// access to functions that exist at runtime (like printf).
// Because of that, I hardcoded the process of printing
// characters to the default console for the purpose of
// this demo.

#define BG0_SUB ((short*)0x0620B000)
void print_chars(const char* message)
{
    short* map_ptr = BG0_SUB;
    while(*message)
        *map_ptr++ = 0xF000|*message++;
    // Each character with ASCII code 0xXY is
    // converted to the map entry 0xF0XY.
    // 0xF is the color white in the default console's palette.
    // There is a one-to-one correspondence between the ASCII code 0xXY and
    // the glyph stored as tile id 0xXY.
}

// this will be the function that is actually called
__attribute__((section(".text.entrypoint")))
void _start()
{
    print_chars("Hello world from PIC!");
}
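
As a quick sanity check of the tile encoding described in the comments above, here is a minimal host-side sketch (it runs on a PC, not on the DS). The names `tile_entry` and `PALETTE_WHITE` are illustrative and not part of the project; the only thing taken from the post is the rule "ASCII 0xXY becomes map entry 0xF0XY".

```c
#include <assert.h>
#include <stdint.h>

/* Palette index 0xF (white on the default console), placed in bits 12-15. */
#define PALETTE_WHITE 0xF000u

/* Build the 16-bit map entry for one 7-bit ASCII character
   (illustrative helper, mirrors `0xF000 | *message` in print_chars). */
static uint16_t tile_entry(char c)
{
    assert((unsigned char)c < 0x80);   /* plain ASCII only */
    return (uint16_t)(PALETTE_WHITE | (unsigned char)c);
}
```

For example, 'H' has ASCII code 0x48, so its map entry under this rule is 0xF048.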

Let's convert this code into a binary file, "main.pic.bin", that contains the position independent instructions and data segments.

To do this, we use the gcc and objcopy tools (I did this in a batch script on Windows, but it's easily adaptable to Linux):

Step 0: env setup
I'm using the full paths to disambiguate the exact tools I've used (I'm not an expert so I feel the need to write it to the last detail).
Code:
:: your devkitARM path here
@set devkitarm=D:\Software\Developer\devkitPro\devkitARM
@set gcc=%devkitarm%\bin\arm-none-eabi-gcc.exe
@set objcopy=%devkitarm%\arm-none-eabi\bin\objcopy.exe
@set objdump=%devkitarm%\arm-none-eabi\bin\objdump.exe

Step 1: compile
Code:
@set flags=-mthumb -march=armv5te -mtune=arm946e-s -fpic -ffreestanding -nostdlib -nostartfiles -O2
@set linker_flags=-Wl,--section-start=.text=0x00000000,-Tpic_linker.ld
%gcc% main.pic.c %flags% %linker_flags% -o main.pic.o

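Note the -fpic flag, which enables position independent code generation. The -ffreestanding -nostdlib -nostartfiles flags get rid of the standard library and startup code and produce an "unbloated" build. Despite its .o name, main.pic.o is a fully linked ELF file here, since gcc is invoked without -c.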

In order for this idea to work, the linker has to ensure that the generated binary "main.pic.bin" starts with the very first instruction we want to execute. Therefore, we force the code segment (.text) to be placed before anything else. Moreover, we want a specific part of the code to be placed first (the one marked with __attribute__((section(".text.entrypoint")))). Along with the code we include the constants (.rodata), initialized (.data) and uninitialized (.bss) memory segments. The DS is permissive about putting everything in EWRAM (the sole reason we _do_ need a dynamic code loader 😁).

The linker script that achieves this is listed below:

pic_linker.ld
Code:
SECTIONS
{
  . = 0x0000;
  .text : { *(.text.entrypoint) *(.text*) }  
  .rodata : { *(.rodata) }
  .data : { *(.data) }
  .bss : { *(.bss) }
}

We can visualize the contents of the object file main.pic.o to see that it did what we wanted:

Code:
cmd> %objdump% -D main.pic.o

main.pic.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <_start>:
   0:   b510            push    {r4, lr}
   2:   4802            ldr     r0, [pc, #8]    ; (c <_start+0xc>)
   4:   4478            add     r0, pc
   6:   f000 f803       bl      10 <print_chars>
   a:   bd10            pop     {r4, pc}
   c:   0000002c        andeq   r0, r0, ip, lsr #32

00000010 <print_chars>:
  10:   7803            ldrb    r3, [r0, #0]
  12:   b510            push    {r4, lr}
  14:   2b00            cmp     r3, #0
  16:   d008            beq.n   2a <print_chars+0x1a>
  18:   4904            ldr     r1, [pc, #16]   ; (2c <print_chars+0x1c>)
  1a:   4c05            ldr     r4, [pc, #20]   ; (30 <print_chars+0x20>)
  1c:   4323            orrs    r3, r4
  1e:   800b            strh    r3, [r1, #0]
  20:   7843            ldrb    r3, [r0, #1]
  22:   3001            adds    r0, #1
  24:   3102            adds    r1, #2
  26:   2b00            cmp     r3, #0
  28:   d1f8            bne.n   1c <print_chars+0xc>
  2a:   bd10            pop     {r4, pc}
  2c:   0620b000        strteq  fp, [r0], -r0
  30:   fffff000                        ; <UNDEFINED> instruction: 0xfffff000
  
  Disassembly of section .rodata.str1.4:

00000034 <.rodata.str1.4>:
  34:   6c6c6548        cfstr64vs       mvdx6, [ip], #-288      ; 0xfffffee0
  38:   6f77206f        svcvs   0x0077206f
  3c:   20646c72        rsbcs   r6, r4, r2, ror ip
  40:   6d6f7266        sfmvs   f7, 2, [pc, #-408]!     ; fffffeb0 <print_chars+0xfffffea0>
  44:   43495020        movtmi  r5, #36896      ; 0x9020
  48:   Address 0x0000000000000048 is out of bounds.
  
 [... skipped sections .ARM.attributes and .comment]

We observe that the _start symbol is placed at offset 0 (which is good) and code and data segments are adjacent to each other.

Step 2: Obtaining PIC "executable"

Code:
cmd > %objcopy% -O binary main.pic.o ../nitro/main.pic.bin

This command creates the position independent binary executable code and places it in the nitro/ folder of the DS project. We can visualize this file to check it generated correctly:

Code:
cmd > hexdump -C ../nitro/main.pic.bin

00000000  10 b5 02 48 78 44 00 f0  03 f8 10 bd 2c 00 00 00  |...HxD......,...|
00000010  03 78 10 b5 00 2b 08 d0  04 49 05 4c 23 43 0b 80  |.x...+...I.L#C..|
00000020  43 78 01 30 02 31 00 2b  f8 d1 10 bd 00 b0 20 06  |Cx.0.1.+...... .|
00000030  00 f0 ff ff 48 65 6c 6c  6f 20 77 6f 72 6c 64 20  |....Hello world |
00000040  66 72 6f 6d 20 50 49 43  21 00                    |from PIC!.|

Note that the first two bytes (0x10 0xb5) are the little-endian encoding of the first instruction of the _start function: 0xb510 (push {r4, lr}). Also, the character sequence "Hello world from PIC!\0" follows immediately after the code.
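
The same check can be done programmatically. Below is a small host-side sketch that decodes 16-bit little-endian values from the first bytes of the hexdump; `read_u16_le` and `pic_head` are illustrative names, and the byte values are copied from the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit little-endian value (i.e. one Thumb halfword)
   at a byte offset inside a buffer. */
static uint16_t read_u16_le(const uint8_t *buf, size_t len, size_t offset)
{
    assert(offset + 2 <= len);
    return (uint16_t)(buf[offset] | (buf[offset + 1] << 8));
}

/* First six bytes of main.pic.bin, copied from the hexdump. */
static const uint8_t pic_head[] = { 0x10, 0xb5, 0x02, 0x48, 0x78, 0x44 };
```

Reading at offsets 0, 2 and 4 gives 0xb510, 0x4802 and 0x4478, which are the first three halfwords of _start in the objdump listing.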

Step 3. Time to test it

Let's create a simple dynamic code loader in the DS main.cpp file.

C:
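#include <nds.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <fat.h>
#include <filesystem.h>

// we call our position independent code via
// this function pointer
typedef void(*pic_function)();

// create a buffer that will hold the instructions
// loaded from file; returns NULL on failure
void* load_position_independent_code(const char* filename)
{
    FILE* f = fopen(filename, "rb");
    if (f == NULL)
        return NULL;
    // load all contents of the file into memory
    fseek(f, 0, SEEK_END);
    long fileSize = ftell(f);
    fseek(f, 0, SEEK_SET);
    char* binary = (fileSize > 0) ? (char*)malloc(fileSize) : NULL;
    if (binary != NULL && fread(binary, fileSize, 1, f) != 1)
    {
        free(binary);
        binary = NULL;
    }
    fclose(f);
    if (binary != NULL)
    {
        // the bytes were written as data: write them back to RAM and
        // drop stale instruction cache lines before executing them
        // (not in my first version; likely needed on real hardware)
        DC_FlushRange(binary, fileSize);
        IC_InvalidateAll();
    }
    return binary;
}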

And finally put it to work:

C:
int main(void) {        
    consoleDemoInit();
    nitroFSInit(NULL);
    // load the compiled position-independent 
    // binary main.pic.bin from nitroFS
    void* my_external_code 
        = load_position_independent_code("nitro:/main.pic.bin");
    // the start offset of my_external_code is the address
    // of our position independent function.
    
    // We get its address and cast it to pic_function
    // callable function pointer.
    // Note the "+ 1" added to the offset. This tells 
    // the cpu to call this function in
    // THUMB mode (because main.pic.c was 
    // compiled as thumb).
    pic_function my_external_function
        = (pic_function)((int)my_external_code + 1);

    // call the function (only if loading succeeded). It should print
    // "Hello world from PIC!" on the bottom screen.
    if (my_external_code != NULL)
        my_external_function();
    
    // idle
    while(1) {swiWaitForVBlank();}
    return 0;
}

1726071512788.png
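
To make the "+ 1" trick more explicit: on ARM, bit 0 of a branch-exchange target selects the instruction set (1 = Thumb, 0 = ARM), and the CPU ignores that bit when fetching. The sketch below only shows the address arithmetic and runs on a PC; `thumb_entry` and `is_thumb_entry` are illustrative helper names, and 0x02100000 is just an example EWRAM address.

```c
#include <assert.h>
#include <stdint.h>

/* Turn the address of Thumb code into a callable entry address
   by setting bit 0 (the interworking bit). */
static uintptr_t thumb_entry(uintptr_t code_addr)
{
    assert((code_addr & 1u) == 0);   /* Thumb code is halfword-aligned */
    return code_addr | 1u;
}

/* Nonzero if an entry address requests Thumb state. */
static int is_thumb_entry(uintptr_t entry)
{
    return (int)(entry & 1u);
}
```

Since the buffer returned by malloc is aligned, adding 1 and OR-ing with 1 give the same result; the OR form just makes the intent clearer.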


Limitations and improvement opportunities

  • The PIC has literally no connection with the rest of the system (except for hardcoding). There may be a way to make the PIC hold some data or function pointers to "the outside world" which are inferred at load time (e.g. malloc). The compiler should be able to output a symbols map that we can use for that.
  • There is no communication with other currently loaded PIC modules. That means, there is currently no way to have shared-like dependencies (e.g. code for sprite position rendering that requires a math PIC module that may also be used by other PIC packages).
  • The tradeoff for dynamic code loading is more than 100 KB of permanently resident code that handles the loader logic (this includes fopen, malloc, nitro/fat (yes, it works with SD cards too) etc.).
  • Loading times and practical usages are still unknown. I plan to toy around with this method for the time being.
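
One possible way to address the first limitation, sketched under my own assumptions: instead of resolving symbols, the loader could hand the module a table of function pointers when it calls the entry point. Everything here (`host_api`, `module_entry`, `example_module`, `default_api`) is hypothetical and not taken from the project. To keep the sketch runnable on a PC, the "module" is compiled into the same program rather than loaded from a file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of host services passed to a module at call time. */
typedef struct {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
} host_api;

/* Hypothetical entry point signature: the module reaches the outside
   world only through the table it receives. */
typedef int (*module_entry)(const host_api *api);

/* Stand-in for loaded PIC code: uses the host allocator via the table. */
static int example_module(const host_api *api)
{
    assert(api != NULL && api->alloc != NULL && api->release != NULL);
    int *scratch = api->alloc(sizeof *scratch);
    if (scratch == NULL)
        return -1;
    *scratch = 21 * 2;
    int result = *scratch;
    api->release(scratch);
    return result;
}

/* The host fills the table with its own functions. */
static const host_api default_api = { malloc, free };
```

On the DS, the loader would cast the loaded buffer to `module_entry` (with the Thumb bit set) and pass `&default_api`. The module then needs no addresses hardcoded at build time, at the cost of one indirect call per host function.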

Main benefit: Overcoming the 4MB limit of the DS EWRAM, where code shares space with constants and the actual work memory. I am aware that code is not such an expensive resource; however, in the extreme case where your homebrew has lots of small features that are used only one at a time, a large portion of the code is dead weight at any given moment of execution.

You can find the project and code for this experiment here (this is where I'll potentially update it): https://github.com/NotImplementedLife/hello_world_pic
 

elhobbs


NotImpLife

elhobbs said:
You may want to take a look at how dldi solves this problem

Thank you for providing such an instructive reference!

By studying the code I was able to identify some holes in my previous thinking (I forgot to fix the global offset table after loading). I used the readelf tool to identify the key regions in my code. Now a PI module makes an exports table visible to the main program. That way, I managed to freely pass function pointers back and forth.
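
For readers wondering what "fixing the global offset table" means in practice, here is a minimal sketch of the idea, not the actual code from the repository. It assumes the module was linked at base address 0 (as in the first post), that the GOT's offset and entry count inside the binary are already known (e.g. read off with readelf), and that every GOT slot refers to something inside the module. `relocate_got` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the load address to every GOT slot. Each slot holds an address
   computed at link time relative to base 0, so after loading the module
   at `load_base` the real address is slot + load_base.
   Assumption: all slots point inside the module itself. */
static void relocate_got(uint32_t *got, size_t entries, uint32_t load_base)
{
    assert(got != NULL || entries == 0);
    for (size_t i = 0; i < entries; i++)
        got[i] += load_base;
}
```

For instance, a slot holding 0x34 in a module loaded at the example address 0x02100000 would become 0x02100034. Slots that refer to anything outside the module would need different handling, which this sketch does not cover.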

I also put everything into some makefile rules so that the build process would look less scary than in the first post.

Now, I guess I'll keep working on managing shared dependencies.
 
