Hacking libds2a unofficial

BassAceGold

Testicles
OP
Member
Joined
Aug 14, 2006
Messages
496
Trophies
1
XP
441
Country
Canada
While trying to port the newest zlib to the DSTwo, I noticed that the official DSTwo libraries were missing a few file-handling functions.

This unofficial release adds basic unistd support (ds2_unistd.h), which provides close, lseek, write and read. Very basic fcntl support has been added as well (ds2_fcntl.h); for now it only provides the open function (the fcntl function itself has yet to be completed).
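
A minimal sketch of how those five calls fit together, written against the standard POSIX headers so it can be tried on a PC. It assumes the DSTwo versions keep the POSIX signatures, in which case only the two includes would change to ds2_fcntl.h and ds2_unistd.h; roundtrip_file is a made-up helper name for illustration.

```c
#include <fcntl.h>   /* open; assumed to map to ds2_fcntl.h on the DSTwo */
#include <unistd.h>  /* read, write, lseek, close; assumed ds2_unistd.h */
#include <string.h>

/* Writes msg to path, seeks back to the start, and reads it into out.
 * Returns the number of bytes read back, or -1 on any failure. */
long roundtrip_file(const char *path, const char *msg, char *out, size_t out_size)
{
    size_t len = strlen(msg);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    long result = -1;
    if (out_size > len &&
        write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0) {
        ssize_t got = read(fd, out, len);
        if (got >= 0) {
            out[got] = '\0';   /* terminate so the caller can treat it as a string */
            result = (long)got;
        }
    }
    close(fd);
    return result;
}
```

Something like this is roughly what zlib's gz* file layer expects to be able to do underneath, which is why the missing functions got in the way of the port.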

zlib 1.5.2 has also been included and modified for use with the SDK. So far I have only tested gzip compression/decompression, and it seems to work fine.

The updated library can be found here.


Newest library update (R2-2.2)
 

BassAceGold

Release 2!

Includes various fixes, such as:
-faster fopen times
-mkdir now works (may not have been a previous issue)
-makes the DS2's DMA copy features available

and a big one:
-More CPU clock levels!


The DMA and CPU features have been added to the folder libsrc/core and are just modifications of the files found in the
Supercard SDK 1.2 sources. This library builds on the Supercard SDK 0.13 beta release, which has more audio buffers
and a working ds2_plug_exit() function (it is broken in the official 1.2 sources).

**CPU CLOCK NOTES**
For the CPU clock levels, I've managed to get my Supercard to 456 MHz (level 18). I haven't tested whether it's stable, and overclocking
results may differ from card to card. There is no guarantee that your card can achieve such results, but there is probably no
harm in trying. I also have not tested heat output at such levels, or whether it could damage the surrounding
components.

New clock levels can be added, and current clock levels changed, by manipulating the pll_m_n array in the libsrc/core/ds2_cpuclock.c file
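
To give a rough idea of what editing such a table involves, here is a sketch. Everything in it is an assumption made for illustration: the real layout of pll_m_n is not shown in this thread, the 24 MHz crystal is a guess, and the formula is the simplest possible model (output = crystal * m / n). The actual hardware may apply offsets or extra dividers, so check libsrc/core/ds2_cpuclock.c before copying any numbers.

```c
#include <stddef.h>

#define CRYSTAL_HZ 24000000u  /* assumed crystal frequency, not taken from the SDK */

/* Hypothetical table entry: a multiplier m and a divider n. */
struct pll_entry { unsigned m, n; };

/* Hypothetical stand-in for pll_m_n. The indices here are demo indices,
 * not the real CPU level numbers. */
static const struct pll_entry demo_pll_table[] = {
    {  5, 2 },  /*  60 MHz */
    {  5, 1 },  /* 120 MHz */
    { 35, 2 },  /* 420 MHz */
    { 19, 1 },  /* 456 MHz */
};

/* Output frequency under the simplified model crystal * m / n. */
unsigned pll_output_hz(const struct pll_entry *e)
{
    return (unsigned)((unsigned long long)CRYSTAL_HZ * e->m / e->n);
}
```

Under that model, adding a level means appending another { m, n } pair, and changing a level means editing an existing pair.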

Regardless, use these levels at your own risk!

Perhaps I'll throw a program together for CPU testing so we can generate an average overclock that people can obtain on their cards.
http://filetrip.net/dl?JCqV6zF4qm
Download Here
 

Deleted member 319809

MAH BOI/GURL
Member
Joined
Dec 22, 2012
Messages
900
Trophies
0
XP
461
Country
Canada
Were there any meaningful changes between 0.13 and 1.2, and is your library based more on 0.13 or on 1.2?

Would you be open to adding C compilation flags from CATSFC into your program library if they make zlib go faster at level 9 when the CPU clock is constant?

I'll try out your modifications to the CPU and DMA in CATSFC and DS2Compress experimental branches after releasing CATSFC 1.25, and I'll report results here! :) In the meantime, if an overclocker becomes available, I'll test it out.
 

BassAceGold

The library is based on 0.13. As for changes, I think 0.13 was an afterthought: it fixed some things from a previous release of the SDK, but then they only released the source for that previous version and labeled it 1.2 (probably based on the 0.12 sources). The DMA and CPU clock code was taken from the libds2b sources provided in the 1.2 release of the SDK.

What C flags are there to add?
 

Deleted member 319809

From CATSFC:
Code:
CFLAGS := -mips32 -mno-abicalls -fno-pic -fno-builtin \
          -fno-exceptions -ffunction-sections -mno-long-calls \
          -msoft-float -G 4 \
          -O3 -fomit-frame-pointer -fgcse-sm -fgcse-las -fgcse-after-reload \
          -fweb -fpeel-loops
You could probably add everything that's after -O3, as well as -mno-long-calls. -mno-long-calls makes every function call into a jal instruction, instead of loading the address in 2 instructions then jumping to it with jalr.

-mno-long-calls:
Code:
jal  <address of SomeZlibInternalFunction / 4>

-mlong-calls:
Code:
lui  t5, <address of SomeZlibInternalFunction, upper 16 bits>
addiu t5, t5, <address of SomeZlibInternalFunction, lower 16 bits>
jalr t5

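The -fgcse-* flags enable extra global common subexpression elimination passes that remove redundant loads and move stores out of loops, and -fpeel-loops peels the first iterations off loops that are known not to run many times.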

(456 MHz...) Hahahaha, wow.

Man, this is tempting, but I don't have any replacements in case mine gets damaged (neither DSTwo or DS unit).
Indeed... that is also my case. However, 456 MHz should be pretty safe as far as overclocking standards go. I wouldn't trust anything over 550 MHz, though! Plus, BAG's DSTwo is still probably in good shape after the 456 MHz test, or he wouldn't be testing anymore and posting a big red warning sign instead ;)
 

BassAceGold

The worst I find with overclocking is that the CPU will just stall if it doesn't work; it seems pretty resilient. I don't think the NDS slot can output enough power to overclock the CPU to a state where it could do harm.


Adding em now.
 

Deleted member 319809

Please pull from my CATSFC experimental branch: https://github.com/ShadauxCat/CATSFC/tree/experimental

and compare with master: https://github.com/ShadauxCat/CATSFC/tree/master

The experimental branch goes to /EXPSFC on your card, and it compiles to expsfc.plg, so you don't need to worry about overwriting constantly.

It looks like the ds2_setCPULevel function delays for an entire second before working, and the video is way choppier than usual at "CPU speed 5" (denoted as 396 MHz in /Options). Could it be that mdelay, udelay and getSysTime are all way slower?
 

BassAceGold

I had made a build of CATSFC with the new cpu level function, and a new build of BAGSFC with it. BAGSFC seemed to accept it much better than CATSFC does. In my own personal programs, I haven't had any issues with getSysTime being slower--if that were the case, I wouldn't be measuring better performance with the overclock.

Maybe mdelay and udelay are slower. I generally try to avoid using them since they waste power, but if they are necessary for stability (because the SDK sucks like that), you could try calling __dcache_writeback_all(); instead. This makes sure all changes in the DS2's CPU cache are actually written back to memory and not dropped, which has caused me quite a few problems before.

OK, here is what's happening:
The hardware-related parts of the SDK (CPU clocks and DMA) aren't accessible by default; that is, nobody was supposed to access them, since that code was closed source to begin with. Any function from this portion can still be called if you know its name and declare an extern prototype for it. That way you would be using the native functions already provided in the library (libds2b). However, since I have made changes and have not compiled them into the original library (libds2b; they are "redefined" in libds2a instead), there are actually two versions of some code that can be accessed (native vs. my changes).

The cpu functions happen to store some values in certain variables, which could not be accessed through the extern method since they are defined locally in the file. So these had to be copied/redefined into the new code. The udelay function depends on one of these variables, but since we are manipulating the copied version with ds2_setCPULevel, udelay operates incorrectly using the native version of said variable.

So to fix udelay/mdelay, we need to make our own variants of the functions that use the variable that is updated by my copy of the detect_clock() function. So basically, we just need to re-add them to the ds2_cpuclock file, and call the functions something else to avoid conflicts with the native code.

Code:
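/* _iclk is the copy of the CPU clock value (in Hz) kept in ds2_cpuclock.c
   and updated by the local detect_clock(), not the native libds2b one. */
void udelayX(unsigned int usec)
{
    /* The division by 2000000 assumes two cycles per loop iteration. */
    unsigned int i = usec * (_iclk / 2000000);

    __asm__ __volatile__ (
        "\t.set noreorder\n"
        "1:\n\t"
        "bne\t%0, $0, 1b\n\t"
        "addi\t%0, %0, -1\n\t"   /* executes in the branch delay slot */
        ".set reorder\n"
        : "=r" (i)
        : "0" (i)
    );
}

void mdelayX(unsigned int msec)
{
    unsigned int i;
    for (i = 0; i < msec; i++)
    {
        udelayX(1000);   /* use the fixed variant, not the native udelay */
    }
}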

Then declare them in a header where programmers can find them and use them in their programs.
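
The stale-variable problem described above can be reproduced on a PC with a small sketch. All names here are made up for the demonstration, and the 360 MHz starting clock is an assumption; the point is only that the "native" delay keeps computing its loop count from a file-local variable that the rebuilt clock code never touches.

```c
/* --- stands in for the precompiled libds2b ("native") side --- */
static unsigned int native_iclk = 360000000u;  /* assumed boot clock, file-local */

/* Loop count the native udelay would spin for. */
unsigned int native_udelay_loops(unsigned int usec)
{
    return usec * (native_iclk / 2000000u);
}

/* --- stands in for the copy rebuilt inside libds2a --- */
static unsigned int copied_iclk = 360000000u;

/* Plays the role of ds2_setCPULevel: only the copy is updated. */
void copied_set_clock(unsigned int hz)
{
    copied_iclk = hz;
}

/* Loop count a delay function reading the updated copy would spin for. */
unsigned int fixed_udelay_loops(unsigned int usec)
{
    return usec * (copied_iclk / 2000000u);
}
```

After copied_set_clock(120000000u), native_udelay_loops(1000) still asks for 180000 iterations while fixed_udelay_loops(1000) asks for 60000, so in this model the native delay would run three times too long at the lower clock.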


I shall have a fixed version of the library up in a bit with these changes.




Alright, here is the update:
http://filetrip.net/dl?CCv0QSb7k8

The functions are now:
ds2_udelay(...) and
ds2_mdelay(...)
 

Deleted member 319809

Thanks; I'll deal with those later today or tomorrow. I'll also modify any remaining references to ds2_setCPUclocklevel, if any.

Can I assume that getSysTime() is correct, by the way?

edit: As for mdelay and udelay, I use them for ensuring that ds2_setBacklight doesn't crash (100 milliseconds does it) and formerly for synchronisation of frame times, but I believe some speed-syncing code in Snes9x itself was retrofitted to use mdelay somewhere.

(in 1.25 I call S9xProcessSound instead of udelay just in case I'm missing 11 milliseconds already, to avoid crackling)

However, I think the internal communication functions use udelay too. I'll need to look at that.
 

Deleted member 319809

Please pull again from my experimental branch: https://github.com/ShadauxCat/CATSFC/tree/experimental

Commit c6f98980230ace9efc950525eddf5efce417c4f3 isolates a crash/freeze in function fat_getDiskSpaceInfo.
* Compile, run, go into Options and observe that the emulator has frozen.
* Add -DDISABLE_FREE_SPACE to the Makefile's DEFS variable.
* Compile, run, go into Options and observe that the emulator continues working, with a placeholder ??? for card capacity.
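
For readers who haven't looked at the commit, the switch described in those steps amounts to a compile-time guard along these lines. This is a sketch, not the CATSFC code: demo_query_free_mb and format_capacity are hypothetical names, and the real fat_getDiskSpaceInfo signature is not shown in this thread.

```c
#include <stdio.h>

#ifndef DISABLE_FREE_SPACE
/* Hypothetical stand-in for the real free-space query. */
static int demo_query_free_mb(unsigned *free_mb)
{
    *free_mb = 1024;   /* fixed value so the sketch runs anywhere */
    return 0;
}
#endif

/* Fills buf with the text shown for card capacity. */
void format_capacity(char *buf, size_t size)
{
#ifdef DISABLE_FREE_SPACE
    snprintf(buf, size, "???");            /* skip the call that freezes */
#else
    unsigned free_mb;
    if (demo_query_free_mb(&free_mb) == 0)
        snprintf(buf, size, "%u MB free", free_mb);
    else
        snprintf(buf, size, "???");
#endif
}
```

Building with -DDISABLE_FREE_SPACE removes the query from the binary entirely, which is what makes it useful for isolating the freeze.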

The ds2_mdelay function works well, though! It's not uber-slow anymore.
 

Deleted member 319809

Addendum: Nope, it doesn't work so well. Starting at 420 MHz, and maybe lower, I touch the lower screen to return to the menu and it doesn't even get to setting the backlight, so there must be some kind of freeze, crash or infinite loop caused by overflow in ds2_setCPULevel(0) from level 15+.
 

BassAceGold

Honestly, that menu code is a mess, and the DSTwo is such an unstable system anyway, so I wouldn't put all these problems as simply a result of overclocking. The overclocking might just be expressing some problems you may eventually run into in the future.

However, it does suck that these problems even exist at all. For now I shall experiment with overclocking in my own programs for I have had some success with it myself.
 

Deleted member 319809

I fixed a load of crash bugs in the menu that were expressed in CATSFC 1.19 at 394 MHz (= 13); the previous menu was sometimes not set to NULL before calling choose_menu(&main_menu), so a nonexistent end_function was called for it. Now those crashes shouldn't happen anymore.

But what I do is set the CPU clock back to 60 MHz (= 0) in menus. My rationale is that the user doesn't need a higher clock in the menu; it would waste nearly every cycle and thus battery life. At 394 MHz this is stable, at 408 MHz it appears to be stable so far, but at 420 MHz it runs up to 5 minutes before crashing.

Do you lower the CPU to 60 MHz sometimes in your programs?
 

BassAceGold

There are a few cpu levels on the lower frequencies that don't even work for me no matter what, so I try to keep the cpu on levels that do work. Those stable clocks usually don't cause any problems when switching between them. Can't say I've ever had luck with levels 0, 2, 5, and 7.
 

Deleted member 319809

In commit 0568880af3ce49ba60d699f3ceeb89a510823040 I made the emulator go to 120 MHz (= 1) instead of 60 MHz, and made it easier to change the "low" level used by the menu to save on power usage: everything is funneled through LowFrequencyCPU().

With that commit, 420 MHz (= 15) is stable on my cart, whereas it was not before.

438 MHz (= 16) crashes when touching the lower screen after a few minutes, and seems slower than 420 MHz;
444 MHz (= 17) crashes when touching the lower screen after a minute, or on its own at some point during emulation;
456 MHz (= 18) crashes before the first frame is emitted after exiting the menu.
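
A sketch of the funnel idea from that commit, for anyone wanting to do the same in their own program. Only the name LowFrequencyCPU comes from the commit description; HighFrequencyCPU, CurrentCPULevel and the stub demo_set_cpu_level are hypothetical, and the real ds2_setCPULevel signature is assumed rather than known.

```c
static int current_level = 0;

/* Stub standing in for ds2_setCPULevel so the sketch runs on a PC. */
static void demo_set_cpu_level(int level)
{
    current_level = level;
}

/* Level numbers as quoted in this thread: 1 = 120 MHz, 15 = 420 MHz. */
enum { CPU_LEVEL_LOW = 1, CPU_LEVEL_HIGH = 15 };

/* Every menu path calls this, so the "low" level lives in one place. */
void LowFrequencyCPU(void)  { demo_set_cpu_level(CPU_LEVEL_LOW); }

/* Hypothetical counterpart used when emulation resumes. */
void HighFrequencyCPU(void) { demo_set_cpu_level(CPU_LEVEL_HIGH); }

int CurrentCPULevel(void)   { return current_level; }
```

With this shape, switching the menu from 60 MHz to 120 MHz is a one-line change to CPU_LEVEL_LOW instead of a hunt through the menu code.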
 

Deleted member 319809

Alright, I'm still encountering roadblocks and glitches with release 2 of this libds2a.a, such as waiting 1 entire second before clearing the menu screen (it's not after disabling the lower backlight this time) and odd glitches in DMA near the end of a rendered screen. I can't continue using this; if you want some code to chew on, though, feel free to make pull requests on the experimental branch.

Pull from experimental again and load Yoshi's Cookie or Chrono Trigger, and you might see partial text being rendered.

EDIT: DMA was bad because the last bytes to be written to the screen were still only in the data cache. Calling __dcache_writeback_all() before DMA fixed it.
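
The effect can be modeled on a PC with a toy one-line write-back cache. This is a simulation only, with made-up sim_* names and sizes; it does not call the real __dcache_writeback_all() or any DSTwo DMA function, it just shows why the tail of the buffer is what goes missing.

```c
#include <string.h>

#define SCREEN_BYTES 16
#define CACHE_LINE    4

static unsigned char ram[SCREEN_BYTES];       /* what DMA can see */
static unsigned char cache_line[CACHE_LINE];  /* one dirty line held by the CPU */
static int dirty_base = -1;                   /* RAM offset of that line, -1 if clean */

/* Models the writeback call: flush the dirty line to RAM. */
void sim_writeback_all(void)
{
    if (dirty_base >= 0) {
        memcpy(ram + dirty_base, cache_line, CACHE_LINE);
        dirty_base = -1;
    }
}

/* CPU store through the one-line write-back cache. */
void sim_cpu_store(int offset, unsigned char value)
{
    int base = offset - offset % CACHE_LINE;
    if (base != dirty_base) {
        sim_writeback_all();                  /* evict the previous line */
        memcpy(cache_line, ram + base, CACHE_LINE);
        dirty_base = base;
    }
    cache_line[offset - base] = value;
}

/* DMA copies straight from RAM and never looks at the cache. */
void sim_dma_copy(unsigned char *dst)
{
    memcpy(dst, ram, SCREEN_BYTES);
}
```

Filling all 16 bytes with sim_cpu_store and then calling sim_dma_copy leaves the last four bytes stale, because earlier lines were evicted to RAM along the way but the final line never was. Calling sim_writeback_all() before the copy fixes it, which mirrors the fix described in the edit above.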

Here's how the Yoshi's Cookie title screen looked without it:
yoshis-cookie-with-dma.png
 

BassAceGold

I'm curious now, do you still have issues with the newest release of this lib now that you have your sdk problems fixed? I just compiled your newest experimental build of CATSFC with it and haven't noticed any issues yet (apart from unstable clock speeds in the very upper end). I'll keep testing things and we'll see what happens.
 
