I forgot to mention it in my last post, but I scanned through the ARM9.bin and the overlays as well. There is a bunch of text in there too (ASCII again, mainly wifi related but not all of it).
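For anyone wanting to repeat that scan, a minimal sketch of a `strings`-style pass over a binary is below. It assumes plain printable ASCII (0x20 to 0x7E) and a minimum run length of 4, both of which are just my picks; it will not find text in any custom encoding.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for every run of printable ASCII at least min_len long."""
    # Printable ASCII is 0x20-0x7E; shorter runs are mostly coincidental noise.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Usage: open arm9.bin (or an overlay) in binary mode, pass `f.read()` in, and print each offset as `f"{offset:08X}"` next to its text.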
Anyhow, nice. I have not used much corruption lately.
Yeah, type 40 is a relatively new type of LZ (Golden Sun apparently had it first), although this does not look like it: for one there are loads of 6-byte runs of 00, and there is nothing to indicate LZ, no flags or anything beyond the initial flag as it were.
DSdecmp and http://gbatemp.net/t274472-codec-lzss-ds-released should support it.
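A quick sanity check on the "type 40" reading, before reaching for a decompressor, is to treat the first four bytes as a DS-style compression header (type byte, then a 24-bit little-endian decompressed size) and to count long zero runs, which genuine LZ output should rarely contain since repeats get turned into back-references. This is a minimal sketch: the type table is the commonly documented BIOS set plus 0x40, and it only sniffs the header, it does not decompress anything.

```python
# Type bytes as used by the DS BIOS (and, for 0x40, by tools such as DSdecmp).
KNOWN_TYPES = {
    0x10: "LZ10",
    0x11: "LZ11",
    0x24: "Huffman (4-bit)",
    0x28: "Huffman (8-bit)",
    0x30: "RLE",
    0x40: "LZ40",
}


def sniff_header(data: bytes):
    """Return (compression type name or None, claimed decompressed size)."""
    kind = KNOWN_TYPES.get(data[0])
    size = int.from_bytes(data[1:4], "little")  # 24-bit little-endian size field
    return kind, size


def count_zero_runs(data: bytes, run: int = 6) -> int:
    """Count non-overlapping runs of `run` zero bytes."""
    return data.count(bytes(run))
```

A plausible size plus few zero runs points towards real compression; a known type byte followed by lots of 6-byte runs of 00 points towards the first byte being a coincidence.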
Anyhow, I started to pull apart the alpha file after rastsan pointed me at it, although I fear I am only going to half confirm some of the things you mentioned.
The first value, as you say, is the length, and that appears to hold for both files.
The second value, if taken to be a pointer, does appear to coincide with a shift in the patterns, but if you lose the initial header and count from 20 it lines up even better (pointers in DS ROMs have a habit of doing this).
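Both header observations can be checked mechanically. The sketch below assumes the first two values are 32-bit little-endian (the usual DS byte order, but not something confirmed here) and takes the 20 header size from the observation above; the names are my own.

```python
import struct

HEADER_SIZE = 0x20  # from the observation that pointers line up better counting from 20


def read_header(data: bytes):
    """Return the first two header values (assumed 32-bit little-endian)."""
    return struct.unpack_from("<II", data, 0)


def check_header(data: bytes):
    """Compare the first value with the file size and resolve the second both ways."""
    length, second = read_header(data)
    return {
        "length_matches_file": length == len(data),
        "second_from_file_start": second,
        "second_after_header": HEADER_SIZE + second,
    }
```

Looking at what sits at each of the two resolved offsets in alpha and talk is then a quick way to see which reading gives the cleaner pattern shift.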
alpha is rather clean: it has a bunch of values that count upwards C at a time (nothing unusual for something as regular as fonts/glyphs/characters) and then padding.
talk has a bunch of values (again counting up C every time), then random stuff dropped around and no clear shift to anything.
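Finding where the "counts up by C" part stops is easy to automate, which helps with talk where the shift is not obvious by eye. A minimal sketch, assuming the values are 32-bit little-endian (their width is not established yet, so adjust as needed):

```python
import struct


def read_u32s(data: bytes, start: int, count: int):
    """Read `count` values starting at `start` (assumed 32-bit little-endian)."""
    return [v for (v,) in struct.iter_unpack("<I", data[start:start + 4 * count])]


def leading_stride_run(values, step=0x0C):
    """Number of values at the start that each go up by exactly `step`."""
    if not values:
        return 0
    n = 1
    while n < len(values) and values[n] - values[n - 1] == step:
        n += 1
    return n
```

The returned count times the value width gives the offset where the regular run ends, which is where to start looking for the "random stuff".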
Anyhow, getting back to business.
alpha
At 320 the file stops having 00s and has something nice for 2F4 bytes.
This section has 12-byte entries (or more likely 2 bytes, 2 bytes and 8 bytes).
First 2 bytes appear to be counting upwards in hex.
No idea what the second 2 bytes do yet, but you say width, which works for me (the values more or less fit that at a glance).
Punctuation makes this tricky, so I think a kind of relative search is in order to try to confirm it.
The usual suspects (ijl, and on the flip side wmo and the like) are the ones to look at: for the order ijkl I would expect two small, one normal, one small, although a serif font can throw that off. I might also try a comparison between alpha and talk, but I will stow that for a minute.
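Splitting the section into entries makes it easier to eyeball the fields. A minimal sketch, assuming the 2/2/8 split above and big-endian 16-bit fields (the hex readouts further down show the bytes 00 48 for H, so reading them big-endian gives 0x48). Calling the second field `width` is the guess from this thread, not something confirmed.

```python
import struct
from typing import NamedTuple

ENTRY_SIZE = 12


class Glyph(NamedTuple):
    code: int    # first 2 bytes: counts upwards, looks like the character code
    width: int   # second 2 bytes: "width" is a guess
    rest: bytes  # remaining 8 bytes, not yet understood


def parse_entries(table: bytes):
    """Split a table into 12-byte entries (big-endian 16-bit fields assumed)."""
    glyphs = []
    for off in range(0, len(table) - ENTRY_SIZE + 1, ENTRY_SIZE):
        code, width = struct.unpack_from(">HH", table, off)
        glyphs.append(Glyph(code, width, table[off + 4:off + ENTRY_SIZE]))
    return glyphs
```

If the 12-byte stride is right, the 2F4 bytes starting at 320 should divide into exactly 63 entries (2F4 is 756, and 756 / 12 = 63).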
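That relative search can be written as a simple ordering test: every letter expected to be wide should have a bigger value than every letter expected to be narrow. A minimal sketch; the letter groups are the ones suggested above, and as noted a serif font may legitimately fail it.

```python
def relative_order_ok(widths: dict, narrow: str = "ijl", wider: str = "kmw") -> bool:
    """True if every letter in `wider` is wider than every letter in `narrow`.

    Letters missing from `widths` are skipped; returns False if a group is empty.
    """
    narrow_w = [widths[c] for c in narrow if c in widths]
    wider_w = [widths[c] for c in wider if c in widths]
    if not narrow_w or not wider_w:
        return False
    return min(wider_w) > max(narrow_w)
```

Usage: fill `widths` from the second field of each entry, e.g. `{"i": 0x05, "j": 0x05, "k": 0x0B, "l": 0x05, "m": 0x11}` from the lowercase readouts in this post, which passes.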
The smallest width value is 4, at 2D2 (sorry, I should have stated that I punted the second table, as it were, to another file).
It appears to be one odd character and then maybe the alphabet.
Readouts from my hex editor follow. The first character on each line is my own, added for clarity; next comes the location/offset (in the new file, remember), then the data, then the ASCII readout.
One case:
H 00000060 0048 000F 0F46 0000 0000 0000 .H...F......
I 0000006C 0049 0008 1274 0000 0000 0000 .I...t......
J 00000078 004A 0009 14BF 0000 0000 0000 .J..........
K 00000084 004B 000F 175B 0000 0000 0000 .K...[......
L 00000090 004C 000C 10D5 0000 0000 0000 .L..........
M 0000009C 004D 0012 16C1 0000 0000 0000 .M..........
The other case:
H 000001E0 0068 000B 15CD 0000 0000 0000 .h..........
I 000001EC 0069 0005 13E5 0000 0000 0000 .i..........
J 000001F8 006A 0005 1389 0000 0000 0000 .j..........
K 00000204 006B 000B 11A4 0000 0000 0000 .k..........
L 00000210 006C 0005 131E 0000 0000 0000 .l..........
M 0000021C 006D 0011 154A 0000 0000 0000 .m...J......
N 00000228 006E 000B 13AE 0000 0000 0000 .n..........
The file then has enough left over to get to Z, with the W entry also being large.
Something of a coincidence with the ASCII readout there; it might still be swapped cases, but even so...
It passes the first-blush test. The proof is obviously in the doing (as a quick test I would just change the widths to be far wider and hopefully see it in game), but I am far too lazy for that right now.
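The offsets in the readouts also fit a fixed mapping: H sits at 60 (entry 8) and h at 1E0 (entry 40), and both give code = 40 + entry index, which would make the first entry 40 and the last of the 63 entries 7E. That is my inference from these two tables only, so treat `FIRST_CODE` below as a hypothesis; the sketch just turns it into lookups in both directions.

```python
ENTRY_SIZE = 12
FIRST_CODE = 0x40  # hypothetical: inferred from H at 0x60 and h at 0x1E0 in the readouts


def offset_for(char: str) -> int:
    """Offset of a character's entry in the new file, if the mapping holds."""
    return (ord(char) - FIRST_CODE) * ENTRY_SIZE


def char_at(offset: int) -> str:
    """Character whose entry starts at `offset`, if the mapping holds."""
    index, rem = divmod(offset, ENTRY_SIZE)
    if rem:
        raise ValueError(f"{offset:#x} is not on a {ENTRY_SIZE}-byte entry boundary")
    return chr(FIRST_CODE + index)
```

Any readout line where the first field disagrees with `char_at(offset)` would break the hypothesis straight away.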
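For whoever is less lazy than me, the quick test could look like this: copy the table, set every guessed width field to something silly, put it back and see whether the in-game spacing changes. A minimal sketch, assuming the width is the second 2-byte field stored big-endian as in the readouts; 0x20 is an arbitrary "far wider" value.

```python
ENTRY_SIZE = 12
WIDTH_OFFSET = 2  # second 2-byte field, guessed to be the width


def widen_all(table: bytes, new_width: int = 0x20) -> bytes:
    """Return a copy of the table with every guessed width field set to new_width."""
    patched = bytearray(table)
    for off in range(0, len(patched) - ENTRY_SIZE + 1, ENTRY_SIZE):
        # Big-endian, matching the readout (00 0F for a width of 0F).
        start = off + WIDTH_OFFSET
        patched[start:start + 2] = new_width.to_bytes(2, "big")
    return bytes(patched)
```

If the text spreads out after reinserting the patched table, the width guess is confirmed; if nothing changes, the field is something else.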
Also, I have never done that before (using linguistics/relative letter order to feed fuzzy width calculations). I shall have to remember it.
Having had my fun, back to work.
Pointers: a very odd order, but we are dealing with a truly custom font format (well, I have not checked it against NFTR and other known custom formats yet, so "truly custom" might be a bit premature) on a game with at least 5 fonts at this point, so I am not inclined to doubt it.
Still no collisions that I can see, and nothing immediately jumps out (one entry is 13F0 and another 13F2, or something like that).
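Checking for collisions by eye gets old fast. This sketch collects one field per entry, reports duplicates, and reports the smallest gap between distinct values (a gap of 2, as with 13F0 and 13F2, would leave very little room if these point at glyph data). Which field holds the pointer is my assumption here: I have used the third 2-byte field, read big-endian like the rest of the readouts.

```python
import struct

ENTRY_SIZE = 12
POINTER_OFFSET = 4  # assumption: the third 2-byte field holds the pointer


def pointers(table: bytes):
    """Read the assumed pointer field from every 12-byte entry (big-endian)."""
    return [struct.unpack_from(">H", table, off + POINTER_OFFSET)[0]
            for off in range(0, len(table) - ENTRY_SIZE + 1, ENTRY_SIZE)]


def pointer_report(table: bytes):
    """Return (sorted duplicate pointers, smallest gap between distinct pointers or None)."""
    seen, dupes = set(), set()
    for p in pointers(table):
        if p in seen:
            dupes.add(p)
        seen.add(p)
    ordered = sorted(seen)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sorted(dupes), (min(gaps) if gaps else None)
```

Sorting the values also makes the odd order easier to study, since you can see which characters end up next to each other.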
The entry 005F 000A 1982 has the largest value in alpha (1982), and it seems to be in punctuation or numbers country rather than a Roman character, but I will come back to that later.
If I drop the as-yet-unknown section into a new file, 1982 is way outside it, so I am moving back to the normal file, where I have to decide whether to keep or lose the header.
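The keep-or-lose-the-header decision can be tested against every pointer at once: for each candidate base (say 0 for counting from the start of the file and 20 for counting from just after the header), list which pointers would land at or past the end of the file. A minimal sketch; the bases and file size are whatever you are testing, nothing here is confirmed.

```python
def out_of_range(pointers, file_size: int, bases=(0, 0x20)):
    """Map each candidate base to the pointers that would land at or past end of file."""
    return {base: [p for p in pointers if base + p >= file_size] for base in bases}
```

A base that leaves the list empty for every pointer (including 1982) is a candidate; one that pushes any pointer out of the file can be ruled out.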
Also, the second table the header points at appears to count up in 4s, with 4-byte entries starting at 00.
alpha makes it to 05EC where talk makes it to 0AB0; talk is the bigger file, so that is nothing drastic.
I should probably check it against the initial table for the number of values (doing the new-file thing again, alpha has its last entry at 00FA where talk has its last entry finish at 017C), but my stomach is rumbling so I am out for an hour or two (just as it gets interesting, I know).
All I will say is that there are a lot of F characters (presumably a blank colour) in the unknown section. Padding, or maybe something like a palette and then data?
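When that check does get done, it boils down to two things: confirming the second table really is 0, 4, 8, ... and comparing how many whole entries each table holds. A minimal sketch; the byte order of the 4-byte entries is an assumption (little-endian by default), and the sizes you feed in are whatever you measure.

```python
def entry_count(table_size: int, entry_size: int) -> int:
    """Number of whole entries; complain if the size does not divide evenly."""
    count, rem = divmod(table_size, entry_size)
    if rem:
        raise ValueError(f"{table_size:#x} is not a multiple of {entry_size}")
    return count


def is_counting_table(table: bytes, step: int = 4, byteorder: str = "little") -> bool:
    """True if the 4-byte entries read 0, step, 2*step, ... (byte order is an assumption)."""
    values = [int.from_bytes(table[i:i + 4], byteorder) for i in range(0, len(table) - 3, 4)]
    return values == list(range(0, step * len(values), step))
```

Comparing `entry_count(second_table_size, 4)` with `entry_count(glyph_table_size, 12)` for alpha and for talk shows whether the two tables describe the same number of things.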
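One cheap way to look at the F question: if the unknown section is 4bpp pixel data (an assumption, nothing confirms the depth yet), every nibble is a colour index, and a histogram should show F dominating if it really is the blank colour. A palette block would not look like that, since it is a run of 2-byte colour values rather than indices.

```python
from collections import Counter


def nibble_histogram(data: bytes) -> Counter:
    """Count every 4-bit value in `data` (both nibbles of each byte)."""
    counts = Counter()
    for b in data:
        counts[b & 0x0F] += 1
        counts[b >> 4] += 1
    return counts
```

Running it over sliding windows of the section rather than the whole thing could also show where a palette-like part would end and pixel-like data would begin.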