File Sequence Hash Compression

SoraK05 · Apr 5, 2012

http://gbatemp.net/t...le-compression/

This is what I have got so far: a system that can represent every possible file using the smallest number of bytes, with more and more repetition the larger the file size.

Every file on a computer has a place in the file sequence: its number is its position in the sequence of all possible files.

The 7 bytes "SoraK05" are this number in the file sequence: 16348895550003485771
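One common way to give a file a position in such a sequence is to read its bytes as a single unsigned integer. The sketch below does that with a big-endian reading. That convention is an assumption and may not be the one used to get the number above, and `file_number` / `file_from_number` are hypothetical names.

```python
def file_number(data: bytes) -> int:
    """Read the file's bytes as one big-endian unsigned integer (assumed convention)."""
    return int.from_bytes(data, "big")


def file_from_number(number: int, length: int) -> bytes:
    """Inverse of file_number; the length is needed to restore leading 00 bytes."""
    return number.to_bytes(length, "big")


name = b"SoraK05"
n = file_number(name)
print(hex(n))
print(file_from_number(n, len(name)))
```

Note that the length has to be kept alongside the number, otherwise files that differ only in leading 00 bytes would share one number.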

What I have done is this:
I imagine this process as though the computer is working through the file sequence number one step after another.

Cycle through the file sequence using the first byte, counting from 00 to FF. Every time it gets to FF and resets back to 00, the next byte increments by one. As each increment represents one full cycle of the first byte, each increment is worth 256. The cycle goes on again, and the second byte increments again.
After the second byte reaches FF, it stays at FF and the next byte starts incrementing. This will inevitably give you a lot of repetition of FFs as you go along the file sequence.
A full second byte, at 256 per increment, would equal 65536.

If the file is 82 34 (hex 82 = 130, hex 34 = 52)
52 * 256 = 13312
+ 130
= 13442

It has passed the cycle in the first byte 52 times and has reached 130 when it reached the file sequence number.
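The counting scheme described so far can be sketched as below. This is one assumed reading of the description, not a confirmed implementation: the first byte counts 00 to FF, and every later byte adds 256 per increment and is filled up to FF before the next one starts. `counter_value` and `counter_for` are hypothetical names.

```python
def counter_value(counter: bytes) -> int:
    """Sequence number of a counter: the first byte counts 0..255 and
    every later byte adds 256 per increment (assumed reading)."""
    if not counter:
        return 0
    return counter[0] + 256 * sum(counter[1:])


def counter_for(number: int) -> bytes:
    """Build the counter for a sequence number, filling each later byte
    up to FF before moving on to the next one."""
    cycles, first = divmod(number, 256)
    out = [first]
    while cycles > 0:
        step = min(cycles, 255)
        out.append(step)
        cycles -= step
    return bytes(out)


print(counter_value(bytes.fromhex("8234")))  # 0x82 + 256 * 0x34
print(counter_for(65535).hex())              # first and second byte both full
```

Under this reading the counter gains one byte for every 255 * 256 = 65280 steps of the sequence number.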

FF FF would mean that it represents the maximum file sequence number of 65536.
This would be translated as this file size:
65536 / 8 (bits) / 256 (for a byte) = 32.
FF FF represents exactly 32 bytes.
The next 256 values will be counted in the first byte, and then the third byte increments by one (representing 256).
FF FF FF represents (65536 + 65536) / 8 / 256: the file sequence number is 131072, and 131072 / 8 / 256 = 64.

FF FF FF represents 64 bytes itself in the file sequence.
This means that each byte following the first byte can represent up to 32 bytes worth of the file sequence number.
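The conversion used in these figures can be written out as a small helper. It only reproduces the arithmetic as stated in the post (divide by 8, then by 256); `bytes_represented` is a hypothetical name, and the sequence numbers passed in are the ones the post assigns to FF FF and FF FF FF.

```python
def bytes_represented(sequence_number: int) -> int:
    """Apply the post's stated conversion: divide by 8 (bits), then by 256."""
    return sequence_number // 8 // 256


print(bytes_represented(65536))   # sequence number the post gives for FF FF
print(bytes_represented(131072))  # sequence number the post gives for FF FF FF
```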
Continuing with this method, the more files increase, the more repetitions of FF are going to be encountered.

So as the file size increases with this representation system here of the file sequence (like an infinite hash representation of the file sequence), there will be more and more and more FFs, which can be compressed.

Thus, there is a very large number of FFs the higher up the file sequence you go.

The repetitions of the FFs can be organized using "to the power of" (^) to represent them all.

256 FFs is the equivalent of 256 * 65536 = 16777216, and 16777216 / 8 / 256 / 1024 = 8KB.
8KB worth will be 256 FFs, which can be put into one hex byte of 01.
So in the file 00 01 (counter reset, excluding additional FFs for more values, as it's simply a base, with the 01 representing one lot of 256 * 65536), those 2 bytes represent 8KB.
In another instance, there will be many FFs, which something like winrar can compress well. The majority of the file will be FF anyway, if the program doesn't already do something to compress it as much as possible.

A full byte of the above example, i.e. 00 FF, would be 256 * (256 * 65536) = 4294967296, and 4294967296 / 8 / 256 = 2097152 bytes, which is 2MB.
00 FF represents 2MB.

The data can work with an unlimited number of sets of 256 FFs next to each other. When there are 256 of the markers that manage the first set, a new hex byte represents the '3rd set'. It keeps repeating, and the number of sets is infinite.

This guarantees a generally low number of bytes, mostly FFs, managed internally, which can be further compressed with winrar, depending on the number of FFs in the file.
With enough of these sets and markers for them, a few bytes can represent a very large amount of data.
For maximum efficiency, the markers can also be compressed into sets of their own.

The first 256 FFs, representing 32 bytes each, represent 8192 bytes, or 8KB.
That means that 256 FFs' worth, which can be represented by a one-byte marker of 01, represents 8KB.
Either 256 bytes or 1 byte, plus a few header bytes and such, is the equivalent of 8KB. On top of that, depending on the file structure, a lot of values in the file will be FF.
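Replacing a long run of FFs with a short marker is what ordinary run-length encoding does. The sketch below uses plain (count, value) pairs as a stand-in for the marker idea; it is not the exact marker format described above, and `rle_encode` / `rle_decode` are hypothetical names.

```python
def rle_encode(data: bytes) -> bytes:
    """Run-length encode as (count, value) pairs, with count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the same byte repeats, up to the 255 cap.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


counter = bytes([0x82]) + b"\xff" * 255 + bytes([0x34])
packed = rle_encode(counter)
print(len(counter), len(packed))
assert rle_decode(packed) == counter
```

This only shrinks the counter file itself. Data without long runs gets larger under this encoding, since every byte then costs two.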

Can someone help program this? I know I can figure it out with the basic programming skills I have, but I believe the pseudocode above is enough to create a working program.

qwertymodo · Apr 5, 2012

Hate to burst your bubble, but hashing does not work for compression. It never will. There is not enough information retained in a hash to recreate the original file. End of story.
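To illustrate the point about lost information, here is a deliberately tiny hash. `toy_hash` is a made-up 8-bit checksum used only for this example, not any real hashing algorithm. Two different files give the same value, so the value alone cannot tell you which file it came from.

```python
def toy_hash(data: bytes) -> int:
    """A deliberately tiny 8-bit hash: the sum of all bytes modulo 256."""
    return sum(data) % 256


a = b"ab"
b = b"ba"
print(toy_hash(a), toy_hash(b))  # same value for two different files
```

Real hashes are far larger, but they still map an unlimited number of possible files onto a fixed number of values, so the same thing must happen.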

Blood Fetish · Apr 6, 2012

As I said in the previous thread: It is mathematically impossible to losslessly compress random data. There is a reason why every single compression algorithm relies on patterns and repetition. This will not work.
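The counting argument behind this can be checked directly. The sketch below compares how many distinct files of exactly n bytes exist with how many distinct shorter files exist; the function names are hypothetical.

```python
def files_of_length(n: int) -> int:
    """Number of distinct files that are exactly n bytes long."""
    return 256 ** n


def files_shorter_than(n: int) -> int:
    """Number of distinct files with 0 to n - 1 bytes."""
    return sum(256 ** k for k in range(n))


n = 4
print(files_of_length(n))     # candidate inputs
print(files_shorter_than(n))  # available shorter outputs
```

There are always fewer shorter files than files of length n, so no lossless scheme can map every n-byte file to a shorter one without two inputs sharing an output.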
