Anyone know C ? Quick question...

Cyan

GBATemp's lurking knight
Former Staff
Joined
Oct 27, 2002
Messages
23,746
Trophies
4
Age
45
Location
Engine room, learning
XP
15,563
Country
France
I'm not sure I understand what you wanted to show by placing a \0 in the middle of your string.
What about the terminating character? The 8th one, which you didn't initialize.
When allocating the buffer, what if the 8th position already holds a non-zero value? Is it set to 0 automatically when memcpy-ing a shorter string into the buffer? I've been told to always set it manually, but maybe the compiler takes care of that.
 
Last edited by Cyan,

877

Well-Known Member
OP
Member
Joined
Mar 8, 2017
Messages
178
Trophies
0
XP
407
Country
United Kingdom
@Cyan thank you very much for the detailed posts. It's late in the evening and it goes over my head at the moment, but I will keep reading tomorrow and try to understand as much as possible.

Appreciate everyone's comments :)
 

Ryccardo

watching Thames TV from London
Member
Joined
Feb 13, 2015
Messages
7,403
Trophies
0
Age
27
Location
Imola
XP
6,392
Country
Italy
is it set to 0 automatically when memcopying shorter string into the buffer?
Manually writing individual characters doesn't set a terminator (unless of course you write character number 0)
Mem* functions don't set a terminator (unless of course they copy a 0)
Str* functions do set a terminator (they even generally require one on inputs), but there are exceptions and variations (for example, strncpy: you set the maximum length of the destination string, if the source is smaller than the limit ALL remaining characters become terminators, if it's longer it copies until the maximum and doesn't place a terminator)

If you set both an initial string and a manual size (larger than required)... I don't know if the extra characters automatically become 0

As a rule of thumb non-global variables do not have a default value, so probably not

Yes, "lol\0" should result in 2 terminators (if they fit)

If you screw these things up too much you invent the trucha bug :P
 
Last edited by Ryccardo,

Coto

-
Member
Joined
Jun 4, 2010
Messages
2,934
Trophies
2
XP
2,408
Country
Chile
This is incorrect.

char var[] = "anything";
The compiler will insert instructions to reserve enough space (9 bytes) on the stack to hold the whole array and copy values from a possibly read only memory location, but in the end **every addressable element from "var" is guaranteed to be read/writable**.

char *var = "anything";
This, on the other hand, allocates space on the stack for a single pointer (4/8/other bytes depending on the target architecture) and assigns the address of the string to that pointer.
Typically, in a Linux binary, this string ends up in a read-only section (.rodata, or alongside the compiled instructions), so it is readable but not writable.
Actually, that behaviour depends on a mixture of things: the linker setup (which differs between Linux and an embedded device, with or without an MMU), the sections used, and the compiler's optimization settings, which also have to work around instruction scheduling.

https://en.wikipedia.org/wiki/Instruction_scheduling

This test was done in the ToolchainGenericDS environment and then disassembled, so it's exactly what the Nintendo DS is running.

srctest1.c
Code:
#include <stdio.h>

void testfn(){
    char var[] = "anything";    // 9-byte array copied onto the stack
    printf("test var:%s",var);
}

Code:
disassembled ARM binary:

srctest1.o:     file format elf32-littlearm

Disassembly of section .text.testfn:

00000000 <testfn>:
   0:    e92d401f     push    {r0, r1, r2, r3, r4, lr}
   4:    e3a02009     mov    r2, #9
   8:    e59f1018     ldr    r1, [pc, #24]    ; 28 <testfn+0x28>
   c:    e28d0004     add    r0, sp, #4
  10:    ebfffffe     bl    0 <memcpy>
  14:    e28d1004     add    r1, sp, #4
  18:    e59f000c     ldr    r0, [pc, #12]    ; 2c <testfn+0x2c>
  1c:    ebfffffe     bl    0 <printf>
  20:    e28dd014     add    sp, sp, #20
  24:    e49df004     pop    {pc}        ; (ldr pc, [sp], #4)
  28:    0000000c     .word    0x0000000c
  2c:    00000000     .word    0x00000000

The linker settings here make the .text section R/W/E (this is an MPU, not an MMU). Nowadays embedded devices usually just have an MMU, so the linker will place this data in either .rodata or an executable section (.text), and the MMU is set up to allow code to execute from that section.

What happens here: the push saves the return address and, by also pushing r0-r4, reserves stack space for the array. r2 holds the byte count (9), r1 is loaded with the address of the source string, and r0 is set to sp+4, the destination on the stack; then memcpy is called. The .word entries at 0x28 and 0x2c only hold relocation offsets for now, because this is an unlinked object file; the linker fills in the final addresses according to the linker map. Because the copy lives on the stack, every byte of it is guaranteed to be readable and writable. After printf returns, the stack pointer is restored and the function returns by popping the saved lr into pc.


Code:
void testfn2(){
    char *var2 = "anything";
    printf("test var:%s",var2);
}
Code:
Disassembly of section .text.testfn2:

00000000 <testfn2>:
   0:    e59f1004     ldr    r1, [pc, #4]    ; c <testfn2+0xc>
   4:    e59f0004     ldr    r0, [pc, #4]    ; 10 <testfn2+0x10>
   8:    eafffffe     b    0 <printf>
   c:    0000000c     .word    0x0000000c
  10:    00000000     .word    0x00000000

Here it is the same, except that nothing is copied: r1 is loaded directly with the static address of the string, and the function branches straight to printf. This is tricky, as some linkers will discard some sections. If the toolchain fails to place the string behind char *var2 in a mapped data section, accessing it will cause an exception due to an invalid mapping.


If you compare both cases, where the string ends up relies entirely on the linker map and linker sections, regardless of which of your two examples is used.


I'm not sure I understand what you wanted to show by placing a \0 in the middle of your string.
What about the terminating character? The 8th one, which you didn't initialize.
When allocating the buffer, what if the 8th position already holds a non-zero value? Is it set to 0 automatically when memcpy-ing a shorter string into the buffer? I've been told to always set it manually, but maybe the compiler takes care of that.
If you are manually copying strings around, yes. But the standard string functions that write a string (such as strcpy, strcat and sprintf) will always append a \0 terminator.
 

Cyan

When I talked about the "memcopying" I replied to kuwanger's explanation.

but I was referring specifically to
char test[5] = "test";

Is test[5] initialized to 0 by the compiler, or is the value random with no default? I've been taught it was random, but maybe that changed with newer compiler versions.
Is doing this good, bad, or unneeded?
char test[5] = "test\0";

I usually do this instead
char test[5] = "test";
test[5] = 0;

Just out of curiosity, would you have an ARM comparison between "test" and "test\0" too?
 
Last edited by Cyan,

kuwanger

Well-Known Member
Member
Joined
Jul 26, 2006
Messages
1,510
Trophies
0
XP
1,781
Country
United States
All string literals when defined end with a nul terminator. Hence when you do char test[5] = "test";, you're saying to copy 't', 'e', 's', 't', '\0' into a five char buffer named test. The point of my example is precisely that you can have multiple strings concatenated together and string functions, like printf, just keep going until they find a '\0'. In my example, that means run into the nul at the end of the copied string instead of the middle '\0' that is overwritten.

And of course, the original post had the equivalent of char test[4] = "test";, with a copy of 't', 'e', 's', 't' to a four char buffer named test, which, when used with a string function, keeps going until it happens upon a '\0' (or segfaults). It's one reason I (and, I imagine, most people) just leave out the number and let the compiler determine the exact size unless there's good reason to spell out the buffer size. Or usually I do char *test ... because I don't normally modify copies of string literals like that and would rather avoid the needless copying.
 

Coto

When I talked about the "memcopying" I replied to kuwanger's explanation.

I just passed by to actually give good insight into how it's done, because I have been dealing with this kind of thing for YEARS. So why hide the knowledge instead of documenting these findings? I thought it was a good idea. And I like good ideas. I guess you do too?


but I was referring specifically to
char test[5] = "test";

Is test[5] initialized to 0 by the compiler, or is the value random with no default? I've been taught it was random, but maybe that changed with newer compiler versions.
Is doing this good, bad, or unneeded?
char test[5] = "test\0";

I usually do this instead
char test[5] = "test";
test[5] = 0;

Just out of curiosity, would you have an ARM comparison between "test" and "test\0" too?
In fact, that would require running toolchaingenericds-gdbstub-example, tracing the linear location of the function through the linker map (arm9.map), and reading the stack contents from a GDB session. I could do that.
 

Cyan

Of course, it's nice to share the knowledge. Someone will always search for that information and be happy to find it.
That's also the purpose of seeing how ARM initializes a fixed-size array filled with a shorter string: instead of making assumptions, let's just verify what it really does :)
Sorry if it gives you some work to do, and thank you for taking the time.
 

Coto

well:

2ecn5oj.png

char test1[5] = "test\0"; does not fit with this compiler (the initializer is too long for the array), so the array size must be +1 for this small test to continue. That said:

test1 and test2:
Code:
char test1[6] = "test\0";
char test2[5] = "test";


Both are compiled and linked into the .data section; you can see it in the arm9.map output the linker generated here:
2vi56vr.png


So I will step over that address in the ToolchainGenericDS GDB Debugger + the client debugger running in Windows:
2m5m69h.png


Results:
test1: The compiler will add a null character right after the null character you added. That makes yours redundant, and it can cause bugs in string-like functions that do parsing if you rely on the position of the null characters.

test2: The compiler will add a null character right after the last character of the string.

Do note, if you use memcpy and you copy the exact amount of data the char array holds, it leads to two scenarios:

1) The char source you are copying memory from is a proper string like test1 or test2 from the above example: memcpy will copy the null character too, making the copy a standard string that is compatible with string-like functions (no need to add a '\0' at the end).

2) The copied range does not include a null character: there it is mandatory to add the '\0' at the end:
Code:
char test[11] = "hello test\0";   // explicit \0 fills the 11th byte; valid C, but C++ also wants room for the implicit one (test[12])
char test2[6] = { 0 };

memcpy(test2, test, 5);   // copies "hello" only, no terminating character is copied
test2[5] = '\0';          // terminate explicitly; never rely on the buffer already being zeroed

printf("%s", test2);      // now we are ok (:
 
Last edited by Coto,

Cyan

thank you for taking the time to look into it :)
I see the compiler checks the array size and attempts to fit the terminating character automatically. It throws an error if the array is not big enough to fit the data.

But I suppose not all compilers check the size of the initializing string, or else 877 (the thread's OP) shouldn't have been able to compile his source.
char name[8] = "Shopping"; // shouldn't the compiler throw the same error as for test1?


I'll edit my post on the previous page to remove the wrong info. Sorry I got it wrong.

Edit: based on your compiler error screenshot, it should be test[12], or the \0 should be removed from the initializer. But it's fine if it wasn't meant to be compiled and was just there to show the null character at the end.
 
Last edited by Cyan,

Coto

Yeah, when doing tests. There are at least three (GNU) compilers: CC, C++ and G++, and G++ takes the cake for all C++ standards (C++98, C++11 syntax). I mean, if I were to generate objects that are at least ARM EABI compatible, then G++ is the compiler that does much more in-depth analysis, preprocessor evaluation and syntax checking.

In ARM stuff, the none EABI is a "generic" standard ARM ABI convention for handling standards into machine specific ARM context.


Do note, you can always use an online C compiler (which I do for testing stuff way before it goes to NintendoDS):
https://www.onlinegdb.com/online_c_compiler

and test your code in there


also I think it's a great addition to have GDB environment in DS. Now I can just debug the hell out of an exception instantly
 