nsZip - NSP compressor/decompressor to reduce storage

nsZip is an open source PC tool to losslessly compress/decompress NSP files in order to save a lot of storage while all NCA files keep their exact same hash.
But it's more than just compression. A lot of data like NDV0 partitions or fragment NCA3 files is removed while compressing and exactly recreated when decompressing, which saves even more space, especially on updates higher than v65536.
In addition, the NSZ format was designed with emulators in mind, so adding NSZ support to Yuzu will be possible in the future, and because NSZ contains decrypted NCAs, no keys would be needed to only extract game files. Zstandard is used as the compression algorithm, compressing 256 KB chunks multithreaded, while incompressible chunks are stored uncompressed. That way NSPZ/XCIZ allows random read access. Zstandard reaches 43 MB/s compression and 7032 MB/s decompression speed on an 8-threaded CPU at level 18 while having one of the best compression ratios compared to other compression algorithms.

How compressing works: NSP => extract NSP => decrypt NCAs => trim fragments => compress to NSZ => verify correctness => repack as NSPZ file
How decompressing works: NSPZ file => extract NSPZ => decompress NSZ => untrim fragments => encrypt NCAs => verify correctness => repack as NSP
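The chunked compression described above can be sketched in a few lines of Python. This is only an illustration: zlib stands in for the Zstandard library the real tool uses, and the function names and tuple layout are made up for this sketch.

```python
import zlib

CHUNK_SIZE = 256 * 1024  # nsZip compresses in 256 KB chunks

def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress fixed-size chunks independently; a chunk that would not
    shrink is stored uncompressed, flagged by the first tuple element."""
    chunks = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        packed = zlib.compress(chunk, 9)
        if len(packed) >= len(chunk):
            chunks.append((False, chunk))   # incompressible: store raw
        else:
            chunks.append((True, packed))
    return chunks

def decompress_chunks(chunks) -> bytes:
    # Any single chunk can be decoded on its own -> random read access.
    return b"".join(zlib.decompress(c) if z else c for z, c in chunks)
```

Because every chunk is an independent stream, a reader only needs to decompress the one chunk covering the byte range it wants, which is what makes random read access possible.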

Check out my GitHub page to report bugs, follow this project, post your suggestions or contribute.
GitHub link: https://github.com/nicoboss/nsZip/releases

For the new homebrew compatible NSZ format please see https://github.com/nicoboss/nsz
NSZ thread: https://gbatemp.net/threads/nsz-homebrew-compatible-nsp-xci-compressor-decompressor.550556/
Until NSZ support is implemented in nsZip, I don't see why you would want to use it instead of NSZ.

Block compressed NSZ is very similar to the beautiful NSPZ format, just without all the unnecessary complexity which made NSPZ unfeasible for other software to implement. I might remove NSPZ/XCIZ support in the future if I see that there's no point in keeping it, so as not to confuse people.

Differences between NSZ and NSPZ:

NSPZ/XCIZ:
- GitHub Project: https://github.com/nicoboss/nsZip
- Always uses Block compression allowing random read access to play compressed games in the future
- Decrypts the whole NCA
- Trims NDV0 fragments to their header and reconstructs them
- Only supported by nsZip and unfortunately doesn't really have a future

NSZ/XCZ:
- GitHub Project: https://github.com/nicoboss/nsz
- Uses solid compression by default. Block compression can be enabled using the -B option. Block compression will be the default for XCZ
- Decrypts all sections while keeping the first 0x4000 bytes encrypted. Puts the information needed to re-encrypt inside the header.
- Deletes NDV0 fragments, as they have no use for end users and only exist to save CDN bandwidth
- Already widely used. Supported by Tinfoil, SX Installer v3.0.0 and probably a lot of other software in the future


nsZip v2.0.0 preview 2:
  • Fixed XCI to XCIZ compression
nsZip v2.0.0 preview 1:
  • New UI made using WPF instead of WinForms
  • Huge performance improvements
    • Directly decrypt the NSP without extracting it first
    • Directly decompress NSPZ files without extracting them first
    • Huge encrypting speed increase
    • Huge SHA-256 verification speed increase
    • Enhanced compressing multithreading
    • Support for up to 400 CPU Cores
  • Improved game compatibility
  • Command Line support
  • More settings
Version 1.1.0 Changelog:
  • Added support for trimming NDV0 fragments split across multiple NCAs
Version 1.0.1 Changelog:
  • Let the user continue when a not-yet-implemented multifile NDV0 fragment is detected. I'll add proper support for that multifile NDV0 fragment format in the following days.
  • The compression level can now be changed (tradeoff between speed and compression ratio)
  • The block size can now be changed (tradeoff between random access time and compression ratio)
Future plans:
  • NSZ implementation
FAQ:
Q: It says prod.keys not found!
A: Dump them using Lockpick and copy them to %userprofile%/.switch/prod.keys
Q: How much storage does this save?
A: It depends on the game. The compressed size is usually around 40-80% of the original size.
Q: What about XCI?
A: Compression to XCIZ is supported, but XCIZ to XCI recreation is still in development.
Q: The program throws an error or doesn't behave as intended!
A: Open an issue on my GitHub page with exact steps on how to reproduce it.
Q: Does this compress installed games?
A: Not yet, but it's planned for the far future.

Screenshot:

nsZip_Screenshot.png
 
Last edited by nicoboss,

gabest

Member
Newcomer
Joined
Sep 19, 2013
Messages
10
Trophies
0
Age
124
XP
107
Country
Hungary
I don't really use Discord, but created a pull request. Visual Studio upgraded a few version numbers in the project file, hope it's not an issue. I saw LZMA was planned, is it for the whole file? That would be similar to 7zipping it with non-compressed ncas inside, which I'm doing right now.
 
  • Like
Reactions: popy

gabest

Member
Newcomer
Joined
Sep 19, 2013
Messages
10
Trophies
0
Age
124
XP
107
Country
Hungary
Just an update. 3549 to 2114 GB after finishing. It was one of the collections from /r/roms on reddit if anyone is curious.
 
  • Like
Reactions: popy

popy

Well-Known Member
Member
Joined
Jul 31, 2018
Messages
255
Trophies
0
Age
39
XP
1,287
Country
Austria
That sounds really promising!
Thx a lot.

Is it possible to get a test version?

Thx

Sent from my ONEPLUS A6013 using Tapatalk
 

jkjj

Well-Known Member
Newcomer
Joined
Jan 16, 2019
Messages
76
Trophies
0
Age
44
XP
277
Country
United States
full xci compatibility will be great, very impatient with this.
i cannot wait to see where this goes and if it will indeed be incorporated into major projects. really really nice.
thank you
 

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
full xci compatibility will be great, very impatient with this.
i cannot wait to see where this goes and if it will indeed be incorporated into major projects. really really nice.
thank you
I implemented this idea into Tinfoil 5.00, though I changed the format:

https://github.com/blawar/nsz
 

Halo69

Well-Known Member
Newcomer
Joined
Aug 6, 2019
Messages
58
Trophies
0
Age
46
XP
118
Country
United States
I compressed an NSP file, and I see the format has now changed to NSPZ. So can that format be installed? If so, how?
 

Taorn

Well-Known Member
Member
Joined
May 27, 2017
Messages
257
Trophies
0
Age
53
XP
1,841
Country
United States
@nicoboss Have you seen https://github.com/blawar/nsz ? Do you feel there is still a use-case for nsZip now that this exists? I usually don't use blawar apps, but this one is actually open source and seems to be pretty complete.

I think this is quite an odd way of looking at it. Why not have multiple projects that share a similar purpose? It's not good for users when they become dependent on one dev alone.
 

Halo69

Well-Known Member
Newcomer
Joined
Aug 6, 2019
Messages
58
Trophies
0
Age
46
XP
118
Country
United States
Can someone answer me: is NSPZ the same as NSZ? I see nut server added support for NSZ compressed formats, but I'm using the latest nut server and Tinfoil isn't recognizing the NSPZ file I have.
 
Last edited by Halo69,

nicoboss

Well-Known Member
OP
Member
Joined
Feb 1, 2019
Messages
132
Trophies
0
Age
26
XP
1,196
Country
Switzerland
I implemented this idea into Tinfoil 5.00, though I changed the format:

https://github.com/blawar/nsz
I highly appreciate all the work and effort that went into @blawar 's project. I really like that you open sourced the compressor/decompressor and even gave credit for the idea. Like nsZip, blawar's project is also based on the idea of Zstandard-compressing the decrypted NCAs and packing them back into a PFS0 container. What he implemented is the highly requested but never implemented NSZ type 0 solid compression format. To reduce the complexity and work required to implement such a file format, he decided not to follow my proposed specifications and instead designed his own format. It fits the purpose intended for NSZ type 0 very well, which is basically saving as much storage/bandwidth as possible by making the compressed size as small as possible while still being able to decompress very fast.

With an already working homebrew implementation, he definitely convinced me to adopt it into nsZip. I will even delay the v2.0 release in order for this to make it in. Now you might ask what's up with NSPZ and XCIZ and whether I will continue developing them. I will for sure continue with nsZip! NSZ type 1 uses a completely different approach to compressing a game. Instead of compressing everything as a single piece, it compresses it in small chunks, trading off compression ratio for random read access. This will allow emulators and maybe even real hardware to play compressed games in the future without having to decompress them first. Blawar's file format will never be able to achieve this, as by design it will never be able to offer the required random read access. While NSPZ/XCIZ support in emulators will probably become reality in the foreseeable future once nsZip allows virtual mounting, nobody can say when and if playing compressed games on real hardware will be implemented, while with blawar's file format you already have something that can be used on real hardware. For example, you could just store all the games you ever want to play compressed on your SD card and install them to system storage as soon as you want to play a certain game. If you don't care about emulators and regularly need to install games on real hardware, go with blawar's file format; however, if you want to store games to preserve them for the future or are more interested in playing your dumps on an emulator, NSPZ/XCIZ is probably the way to go.

I was testing nsZip in the last couple of days. The block based compression is nice, but honestly, if you want to save the most space, you have to keep the plain, decompressed ncas (forked the project on github) and use winrar or 7zip with the highest compression settings. Went through a terabyte of my collection (A-H) and currently at 60% compression ratio. The source is a mix of zipped xci and nsp.
I guess you will love blawar's file format. It's exactly what you wanted.

full xci compatibility will be great, very impatient with this.
i cannot wait to see where this goes and if it will indeed be incorporated into major projects. really really nice.
thank you
This still has the highest priority and will be implemented as soon as time allows me to, most likely in the next 2 weeks. I will post a preview version here as soon as it works. Sorry, university fills up so much of my time.

I compressed an nsp file, i see now the format changed to nspz. So can that format be installed? If so, how?
Are NSZ files installable now like an NSP, since SX Installer & blawar's Tinfoil support them now?
Can someone answer me if NSPZ is the same as NSZ? I see nut server added support for NSZ compressed formats. Cause im using latest nut server and tinfoil not recognizing the NSPZ file i have.
Sorry, the current version of nsZip hasn't implemented blawar's format yet. I first heard of it like a day ago, so please give me some time to implement it. Until then, please use blawar's open source Python script provided at https://github.com/blawar/nsz. I know it's currently all very confusing. I was absolutely not prepared for this, and the similar names and file endings also add to the confusion. I will do my best to make things clearer in the next version of nsZip. I might even change some file endings to further reduce confusion.
 
Last edited by nicoboss,

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
I really wanted to use nsZip @nicoboss I did not want to waste time developing a toolchain around a custom format, but I had 3 mandates that made it not an option for me:

1) The container must be a normal NSP file, so it can be read using existing tools and stream seeked easily (my installer is mostly a streamer, it has random IO all over the NSP).

2) cnmt and control NCA's must not be compressed, to not break existing metadata tools (and there is no need to).

3) It had to be as simple as possible to convince other homebrew devs to implement it. That is why I put redundant information in the header to save other developers from having to parse the various nintendo formats. The reason homebrew devs did not adopt nsZip is simply because it was too complex and there were no libraries to use it provided in c++ (homebrew devs do not use c#).

If you switch the compression to the NCA level, do not modify cnmt's or NSP, then I can likely implement your 2.0 spec as well. I highly recommend that you do not mess with NCA headers.

edit: since you are concerned about random access for yuzu and possibly booting, I recommend you ditch supporting multiple compression algos and just support zstandard. Then at the beginning of the stream, you can have an optional index structure with entries that map decompressed offsets to frame start offsets. Then I could just skip that index while installing, and you would have yuzu support.

even without the index, the emulators can build it dynamically by reading the first frame's header, and doing a linear scan (jumping from frame header to frame header) until it can build its own index. yuzu could even cache this index themselves, saving the spec the complexity involved in maintaining it. though it is trivial for me to skip index sections in the files.

you would likely want an optional parameter (specified at compress time) for maximum decompressed chunk size, to avoid having to decompress large amounts of data to seek in certain high compression ratio situations.
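As a rough illustration of the linear scan described above, the Python sketch below assumes a hypothetical 8-byte per-frame header holding the frame's compressed and decompressed sizes. Real zstd frame headers are more involved, so this shows only the jump-from-header-to-header indexing idea, not the actual wire format.

```python
import struct

def build_index(stream: bytes):
    """Walk hypothetical frame headers once, recording for each frame:
    (compressed payload offset, decompressed offset,
     compressed size, decompressed size).
    Header layout assumed here: two little-endian uint32s,
    compressed payload size then decompressed size."""
    index, pos, dpos = [], 0, 0
    while pos < len(stream):
        csize, dsize = struct.unpack_from("<II", stream, pos)
        index.append((pos + 8, dpos, csize, dsize))
        pos += 8 + csize   # jump straight to the next frame header
        dpos += dsize
    return index
```

An emulator could run this scan once, cache the resulting index, and from then on seek without any index section stored in the file itself.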
 
Last edited by blawar,

nicoboss

Well-Known Member
OP
Member
Joined
Feb 1, 2019
Messages
132
Trophies
0
Age
26
XP
1,196
Country
Switzerland
I really wanted to use nsZip @nicoboss I did not want to waste time developing a toolchain around a custom format, but I had 3 mandates that made it not an option for me:

1) The container must be a normal NSP file, so it can be read using existing tools and stream seeked easily (my installer is mostly a streamer, it has random IO all over the NSP).

2) cnmt and control NCA's must not be compressed, to not break existing metadata tools (and there is no need to).

3) It had to be as simple as possible to convince other homebrew devs to implement it. That is why I put redundant information in the header to save other developers from having to parse the various nintendo formats. The reason homebrew devs did not adopt nsZip is simply because it was too complex and there were no libraries to use it provided in c++ (homebrew devs do not use c#).

If you switch the compression to the NCA level, do not modify cnmt's or NSP, then I can likely implement your 2.0 spec as well. I highly recommend that you do not mess with NCA headers.

edit: since you are concerned about random access for yuzu and possibly booting, I recommend you ditch supporting multiple compression algos and just support zstandard. Then at the beginning of the stream, you can have an optional index structure with entries that map decompressed offsets to frame start offsets. Then I could just skip that index while installing, and you would have yuzu support.

even without the index, the emulators can build it dynamically by reading the first frame's header, and doing a linear scan (jumping from frame header to frame header) until it can build its own index. yuzu could even cache this index themselves, saving the spec the complexity involved in maintaining it. though it is trivial for me to skip index sections in the files.

you would likely want an optional parameter (specified at compress time) for maximum decompressed chunk size, to avoid having to decompress large amounts of data to seek in certain high compression ratio situations.

I fully agree with your decision to simplify the file format. I really like your file format. It would be awesome if we could implement optional random read access into NCZ files. That way we would have a single standardized file format for all compressed Nintendo switch games.

Would optional metadata specifying the layout of compressed chunks like this be fine for you:
[uint32] Decompressed Block Size
[uint32] Amount of Blocks
[uint32]*[Amount of Blocks] Compressed Block Size
Concatenated compressed blocks after header

You wouldn't even have to care much about this metadata if you just want to stream the whole file. Just start streaming with the first block and, when it ends, stream the next one.
Only Zstandard will be used as the compression algorithm. Is it fine for you if I store blocks whose compressed size would be equal to or larger than their uncompressed size uncompressed? This would be indicated by a compressed block size equal to the decompressed block size inside the header. Does your homebrew need NCAs containing fragment files? Currently nsZip trims them to their header so they can be reconstructed, but I could leave them untouched if required, which however would significantly increase the size of updates containing fragments. Regarding your points: nsZip already produces PFS0 files. I don't care about not compressing control NCAs. I also like your file format, so I have no problem making nsZip fit your file format's requirements. Please let me know what you think about my suggestions and feel free to suggest any changes.
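The header layout proposed above could be serialized with `struct`, roughly like this. This is a hedged Python sketch: the little-endian byte order and the function names are my assumptions, not part of the proposal.

```python
import struct

def build_block_header(decompressed_block_size: int, compressed_sizes: list) -> bytes:
    """Serialize the proposed layout:
       [uint32] decompressed block size
       [uint32] number of blocks
       [uint32] * n  compressed block sizes
    The concatenated compressed blocks would follow this header."""
    return struct.pack("<II", decompressed_block_size, len(compressed_sizes)) \
         + struct.pack("<%dI" % len(compressed_sizes), *compressed_sizes)

def parse_block_header(buf: bytes):
    """Read the header back: returns (block size, list of compressed sizes)."""
    block_size, n = struct.unpack_from("<II", buf, 0)
    sizes = list(struct.unpack_from("<%dI" % n, buf, 8))
    return block_size, sizes
```

A compressed size equal to `decompressed_block_size` would mark that block as stored uncompressed, as suggested above.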
 

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
I fully agree with your decision to simplify the file format. I really like your file format. It would be awesome if we could implement optional random read access into NCZ files. That way we would have a single standardized file format for all compressed Nintendo switch games.

Would optional metadata specifying the layout of compressed chunks like this be fine for you:
[uint32] Decompressed Block Size
[uint32] Amount of Blocks
[uint32]*[Amount of Blocks] Compressed Block Size
Concatenated compressed blocks after header

You wouldn't even have to care much about this metadata if you just want to stream the whole file. Just start streaming with the first block and, when it ends, stream the next one.
Only Zstandard will be used as the compression algorithm. Is it fine for you if I store blocks whose compressed size would be equal to or larger than their uncompressed size uncompressed? This would be indicated by a compressed block size equal to the decompressed block size inside the header. Does your homebrew need NCAs containing fragment files? Currently nsZip trims them to their header so they can be reconstructed, but I could leave them untouched if required, which however would significantly increase the size of updates containing fragments. Regarding your points: nsZip already produces PFS0 files. I don't care about not compressing control NCAs. I also like your file format, so I have no problem making nsZip fit your file format's requirements. Please let me know what you think about my suggestions and feel free to suggest any changes.

How does this look?


Code:
class NszIndexes
{
public:
    class Index
    {
    public:
        u64 compressedOffset; // relative to beginning of NCA
        u64 decompressedOffset; // relative to beginning of NCA
        u32 compressedSize; // size of compressed chunk
        u32 decompressedSize; // size of decompressed chunk
    };

    u64 magic;
    u64 size; // total size of the entire structure: sizeof(magic) + sizeof(size) + sizeof(zstandard compressed stream of indexes)

    /*
    Indexes must be sorted ascending, but they do not need to account for every zstandard frame.
    */
    Index indexes[]; // zstandard compressed buffer.  number of entries = sizeof(decompressed stream) / sizeof(Index)
};
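To illustrate how a reader could use such an index, here is a Python sketch of the lookup: given entries sorted ascending by `decompressedOffset` (as the struct's comment requires), a binary search finds the chunk covering a target decompressed offset. The tuple layout mirrors the `Index` fields above; the helper name is hypothetical.

```python
from bisect import bisect_right

# Each entry mirrors the fields of the proposed Index struct:
# (compressedOffset, decompressedOffset, compressedSize, decompressedSize)
def find_chunk(indexes, target):
    """Binary-search entries sorted by decompressedOffset for the chunk
    whose decompressed range contains `target`; None if out of range."""
    starts = [entry[1] for entry in indexes]
    i = bisect_right(starts, target) - 1
    if i < 0:
        return None
    entry = indexes[i]
    # Check that the target actually falls inside this chunk's span.
    if target < entry[1] + entry[3]:
        return entry
    return None
```

Because the struct allows entries that do not cover every frame, a real reader would start decompressing at the found entry and scan forward to the exact offset.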
 
Last edited by blawar,

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
How does this look?


Code:
class NszIndexes
{
public:
    class Index
    {
    public:
        u64 compressedOffset; // relative to beginning of NCA
        u64 decompressedOffset; // relative to beginning of NCA
        u32 compressedSize; // size of compressed chunk
        u32 decompressedSize; // size of decompressed chunk
    };

    u64 magic;
    u64 size; // total size of the entire structure: sizeof(magic) + sizeof(size) + sizeof(zstandard compressed stream of indexes)

    /*
    Indexes must be sorted ascending, but they do not need to account for every zstandard frame.
    */
    Index indexes[]; // zstandard compressed buffer.  number of entries = sizeof(decompressed stream) / sizeof(Index)
};

What do you think about storing the index outside of the ncz, but inside of the nsp? A lot of people are converting lots of stuff right now, and if we build the index file outside of it, it would be trivial to add the index files to the nsp afterwards at any time to existing nsz, especially since I have 4 KB padded nsp headers in the nsz so files can be added without repacking the NSP. This would maintain backwards compatibility.

so 55555.ncz would have a corresponding 55555.ncz.index. Could even throw some optional metadata in there.
 

nicoboss

Well-Known Member
OP
Member
Joined
Feb 1, 2019
Messages
132
Trophies
0
Age
26
XP
1,196
Country
Switzerland
How does this look?


Code:
class NszIndexes
{
public:
    class Index
    {
    public:
        u64 compressedOffset; // relative to beginning of NCA
        u64 decompressedOffset; // relative to beginning of NCA
        u32 compressedSize; // size of compressed chunk
        u32 decompressedSize; // size of decompressed chunk
    };

    u64 magic;
    u64 size; // total size of the entire structure: sizeof(magic) + sizeof(size) + sizeof(zstandard compressed stream of indexes)

    /*
    Indexes must be sorted ascending, but they do not need to account for every zstandard frame.
    */
    Index indexes[]; // zstandard compressed buffer.  number of entries = sizeof(decompressed stream) / sizeof(Index)
};
Just storing the decompressed block size, the number of blocks and an array of compressedSize would be enough. We need to keep the decompressed block size constant, as otherwise random read access would require a binary search. The decompressedOffset can just be calculated as "startIndex + decompressedBlockSize * blockNr" and the compressedOffset is "startIndex + sum(compressedSize)". You can store such data, it just seems quite useless as it can be calculated so easily.
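The arithmetic above can be shown with a tiny Python sketch (the `start_index` value and function names are illustrative, not part of any spec):

```python
def decompressed_offset(start_index: int, block_size: int, block_nr: int) -> int:
    # decompressedOffset = startIndex + decompressedBlockSize * blockNr
    return start_index + block_size * block_nr

def compressed_offset(start_index: int, compressed_sizes: list, block_nr: int) -> int:
    # compressedOffset = startIndex + sum of the earlier compressed block sizes
    return start_index + sum(compressed_sizes[:block_nr])

def block_for_offset(start_index: int, block_size: int, offset: int) -> int:
    # With a constant decompressed block size, the block number for any
    # decompressed offset falls out in O(1) -- no binary search needed.
    return (offset - start_index) // block_size
```

This is why a constant block size matters: the decompressed side becomes pure arithmetic, and only the compressed side needs the (small, cacheable) prefix sums.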

What do you think about storing the index outside of the ncz, but inside of the nsp? A lot of people are converting lots of stuff right now, and if we build the index file outside of it, it would be trivial to add the index files to the nsp afterwards at any time to existing nsz, especially since I have 4 KB padded nsp headers in the nsz so files can be added without repacking the NSP. This would maintain backwards compatibility.

so 55555.ncz would have a corresponding 55555.ncz.index. Could even throw some optional metadata in there.
NSZ files built with the current version of your tool all use solid compression and so don't need this metadata. If you want random read access, you have to recompress the whole file anyway. So we should just implement this metadata optionally to keep backwards compatibility: if it isn't there, it's solid compression. Putting metadata into a separate file inside the PFS0 container would add additional complexity, going against our goal of making the format as simple as possible.
 
Last edited by nicoboss,

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
Just storing the decompressed block size, the number of blocks and an array of compressedSize would be enough. We need to keep the decompressed block size constant, as otherwise random read access would require a binary search. The decompressedOffset can just be calculated as "startIndex + decompressedBlockSize * blockNr" and the compressedOffset is "startIndex + sum(compressedSize)". You can store such data, it just seems quite useless as it can be calculated so easily.


NSZ files built with the current version of your tool all use solid compression and so don't need this metadata. If you want random read access, you have to recompress the whole file anyway. So we should just implement this metadata optionally to keep backwards compatibility: if it isn't there, it's solid compression. Putting metadata into a separate file inside the PFS0 container would add additional complexity, going against our goal of making the format as simple as possible.

The entire index will fit into memory, so there isn't a reason to seek it on disk, so it can be compressed.

You are going to want to do a binary search of the index whether it’s decompressed or not.

I will do some more research on zstandard, but I thought you could still seek with some work if you knew where the frames were.

edit:

You can also split a file in small chunks, train a dictionary on the chunks and compress/decompress them independently. You get random-ish access while keeping relatively good compression ratio.


Also, I meant backwards compatibility with not breaking existing clients. I do not think it's more complex to store the index in another file; however, if seeking cannot be done with the current files, then you are correct there is no additional harm in breaking backwards compatibility.
 
Last edited by blawar,

nicoboss

Well-Known Member
OP
Member
Joined
Feb 1, 2019
Messages
132
Trophies
0
Age
26
XP
1,196
Country
Switzerland
The entire index will fit into memory, so there isn't a reason to seek it on disk, so it can be compressed.

You are going to want to do a binary search of the index whether it’s decompressed or not.

I will do some more research on zstandard, but I thought you could still seek with some work if you knew where the frames were.

edit:

You can also split a file in small chunks, train a dictionary on the chunks and compress/decompress them independently. You get random-ish access while keeping relatively good compression ratio.


Also, I meant backwards compatibility with not breaking existing clients. I do not think it's more complex to store the index in another file; however, if seeking cannot be done with the current files, then you are correct there is no additional harm in breaking backwards compatibility.
What do you think of an implementation like this: https://github.com/nicoboss/nsz/commit/46acba679c37bb72276bf2af12b48883ede23c4c
nsZip and the above linked commit follow exactly the approach you mentioned above. The file will be split into small chunks with a constant decompressed size, so random read access is possible in constant time. Give me feedback on what you think about my implementation, and once we agree I will also implement it for decompression and open a pull request. The current implementation is more to show you the concept and might contain minor mistakes, like me forgetting to flush.
 
Last edited by nicoboss,

Halo69

Well-Known Member
Newcomer
Joined
Aug 6, 2019
Messages
58
Trophies
0
Age
46
XP
118
Country
United States
I highly appreciate all the work and effort that went into @blawar 's project. I really like that you open sourced the compressor/decompressor and even gave credit for the idea. Like nsZip, blawar's project is also based on the idea of Zstandard-compressing the decrypted NCAs and packing them back into a PFS0 container. What he implemented is the highly requested but never implemented NSZ type 0 solid compression format. To reduce the complexity and work required to implement such a file format, he decided not to follow my proposed specifications and instead designed his own format. It fits the purpose intended for NSZ type 0 very well, which is basically saving as much storage/bandwidth as possible by making the compressed size as small as possible while still being able to decompress very fast.

With an already working homebrew implementation, he definitely convinced me to adopt it into nsZip. I will even delay the v2.0 release in order for this to make it in. Now you might ask what's up with NSPZ and XCIZ and whether I will continue developing them. I will for sure continue with nsZip! NSZ type 1 uses a completely different approach to compressing a game. Instead of compressing everything as a single piece, it compresses it in small chunks, trading off compression ratio for random read access. This will allow emulators and maybe even real hardware to play compressed games in the future without having to decompress them first. Blawar's file format will never be able to achieve this, as by design it will never be able to offer the required random read access. While NSPZ/XCIZ support in emulators will probably become reality in the foreseeable future once nsZip allows virtual mounting, nobody can say when and if playing compressed games on real hardware will be implemented, while with blawar's file format you already have something that can be used on real hardware. For example, you could just store all the games you ever want to play compressed on your SD card and install them to system storage as soon as you want to play a certain game. If you don't care about emulators and regularly need to install games on real hardware, go with blawar's file format; however, if you want to store games to preserve them for the future or are more interested in playing your dumps on an emulator, NSPZ/XCIZ is probably the way to go.


I guess you will love blawar's file format. It's exactly what you wanted.


This still has the highest priority and will be implemented as soon as time allows me to, most likely in the next 2 weeks. I will post a preview version here as soon as it works. Sorry, university fills up so much of my time.




Sorry, the current version of nsZip hasn't implemented blawar's format yet. I first heard of it like a day ago, so please give me some time to implement it. Until then, please use blawar's open source Python script provided at https://github.com/blawar/nsz. I know it's currently all very confusing. I was absolutely not prepared for this, and the similar names and file endings also add to the confusion. I will do my best to make things clearer in the next version of nsZip. I might even change some file endings to further reduce confusion.
Jaja, I went to that link and to be honest I have no idea what I'm looking at. I read the readme and I see "requirements" but can't find any instructions on how exactly it's done. Unless I'm looking at the wrong thing.
 

blawar

Developer
Developer
Joined
Nov 21, 2016
Messages
1,708
Trophies
1
Age
40
XP
4,311
Country
United States
What do you think of an implementation like this: https://github.com/nicoboss/nsz/commit/46acba679c37bb72276bf2af12b48883ede23c4c
nsZip and the above linked commit follow exactly the approach you mentioned above. The file will be split into small chunks with a constant decompressed size, so random read access is possible in constant time. Give me feedback on what you think about my implementation, and once we agree I will also implement it for decompression and open a pull request. The current implementation is more to show you the concept and might contain minor mistakes, like me forgetting to flush.

I'm going to go through my code after work to see if extra data at the very end will break clients. I know it will issue warnings, but it may still be successful. If so, we should move it to the very end of the file and have a footer rather than a header, so it can be read easily and existing clients can use the new format.

I'm digging through the zstandard protocol; it is possible to seek within a single flat stream, but it is indeed a pain in the butt. You are correct that it is probably best from a complexity standpoint to only support seeking with chunked content.

Are you sure you only want to support fixed-size chunks? I do not care about seeking, so I will defer to your judgement call on that. I just think variable-sized chunks would improve compression ratios at the expense of only a little added complexity. A binary search would be fast.
 
Last edited by blawar,
