File Sequence Hash Compression

Discussion in 'General Off-Topic Chat' started by SoraK05, Apr 5, 2012.

Apr 5, 2012
  1. SoraK05
    OP

    Member SoraK05 GBAtemp Regular

    Joined:
    Aug 24, 2007
    Messages:
    148
    Country:
    Kenya
    http://gbatemp.net/t...le-compression/

    This is what I have so far: a system meant to represent any possible file in the smallest number of bytes, with more and more repetition as the file size grows.

    Every file on a computer has a place in the file sequence, the ordered list of all possible files. A file can be identified by its number in that sequence.

    The 7 bytes "SoraK05"
    have this file sequence number: 16348895550003485771
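
    A minimal sketch of the "file as a number" idea in Python, assuming big-endian byte order. The post does not say which byte order or encoding it uses, so the printed value is not expected to match the figure quoted above. `file_to_number` and `number_to_file` are hypothetical helper names.

```python
def file_to_number(data: bytes) -> int:
    """Read the bytes of a file as one unsigned integer (big-endian, an assumption)."""
    return int.from_bytes(data, "big")


def number_to_file(number: int, length: int) -> bytes:
    """Turn the integer back into bytes; the length is needed to restore leading zeros."""
    return number.to_bytes(length, "big")


data = b"SoraK05"
n = file_to_number(data)
restored = number_to_file(n, len(data))

print(n)               # the file's number under this byte order
print(n.bit_length())  # how many bits it takes to write that number down
print(restored == data)
```

    Writing the number down takes about as many bits as the file itself, so the number alone is not a shorter form of the file.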

    What I have done is this:
    I imagine the process as though the computer is working through the file sequence number bit by bit.

    Step through the file sequence, using the first byte to count from 00 to FF. Every time it reaches FF and resets to 00, the next byte increments by one. Since that increment stands for one full pass through the first byte's 256 values, each increment is worth 256. The cycle runs again, and the second byte increments again.
    After the second byte reaches FF, it stays at FF and the next byte starts incrementing. This inevitably produces a lot of repeated FFs as you move along the file sequence.
    With one such byte, at 256 per increment, that comes to 256 * 256 = 65536.

    If the file is 82 34
    35 * 256 = 8960
    + 130
    = 9090

    The first byte has completed its cycle 35 times and stands at 130 when the file sequence number is reached.
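
    The counting described above can be sketched as plain base-256 positional counting, which is an assumption: it matches the description for the first two bytes only, not the later "stays at FF" rule. The sketch also assumes "82 34" is written in hex with the fast-cycling byte first. Under that reading 0x34 is 52 in decimal, so the total differs from the worked figures above, which use 35. `sequence_number` is a hypothetical name.

```python
def sequence_number(data: bytes) -> int:
    """Base-256 value of the bytes, first byte least significant (an assumption)."""
    total = 0
    for position, value in enumerate(data):
        total += value * 256 ** position
    return total


example = bytes.fromhex("8234")
number = sequence_number(example)

# Full passes through the first byte, and where the first byte ends up.
cycles, remainder = divmod(number, 256)
print(number, cycles, remainder)
```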

    FF FF therefore represents the maximum file sequence number of 65536.
    Translated into a file size, this is:
    65536 / 8 (bits) / 256 (for a byte) = 32.
    FF FF represents exactly 32 bytes.
    The next 256 bytes are counted in the first byte, and then the third byte increments by one (representing 256).
    FF FF FF represents (65536 + 65536) / 8 / 256: the file sequence number is 131072, and 131072 / 8 / 256 = 64.

    FF FF FF represents 64 bytes itself in the file sequence.
    This means that each byte following the first byte can represent up to 32 bytes worth of the file sequence number.
    Continuing with this method, the larger the files get, the more repetitions of FF are going to be encountered.

    So as the file size increases with this representation system here of the file sequence (like an infinite hash representation of the file sequence), there will be more and more and more FFs, which can be compressed.

    Thus, the higher you go in the file sequence, the more FFs there are.

    The runs of FFs can be organized by powers (^) so that every level of repetition is represented.

    256 FFs are equivalent to 256 * 65536 = 16777216, and 16777216 / 8 / 256 / 1024 = 8, i.e. 8KB.
    8KB worth will be 256 FFs, which can be put into one hex byte of 01.
    So in the file 00 01 (counter reset, excluding additional FFs for more values, as it's simply a base, with the 01 standing for one unit of 256 * 65536), those 2 bytes represent 8KB.
    Otherwise, there will be many FFs, which something like WinRAR can compress well. The majority of the file will be FF unless the program already compresses it as far as possible.
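
    Replacing runs of FF with a short marker is a form of run-length encoding. The sketch below is a generic (count, value) run-length coder, not the marker format proposed in this post; `rle_encode` and `rle_decode` are hypothetical names.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (run length, byte value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


mostly_ff = b"\xff" * 256 + b"\x01"  # long run of FF: shrinks
no_runs = bytes(range(256))          # no repeated neighbours: grows

print(len(mostly_ff), len(rle_encode(mostly_ff)))
print(len(no_runs), len(rle_encode(no_runs)))
```

    The gain comes entirely from the runs already present in the input; data without runs gets larger under the same scheme.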

    A full byte of the above example, i.e. 00 FF, would be 256 * (256 * 65536) = 4294967296, which, divided by 8 and by 256, is 2097152 bytes, or 2MB.
    00 FF represents 2MB.

    The data can work with an infinite amount of sets of 256 FFs next to each other. When there are 256 of the set to manage the first set, a new hex represents the '3rd set'. It keeps repeating and the number of sets is infinite.

    This guarantees a generally low number of bytes, mostly FFs, managed internally, which can be further compressed with WinRAR, depending on the number of FFs in the file.
    With enough of these sets and markers for them, a few bytes can represent a very large amount of data.
    For maximum efficiency, the markers can also be compressed into sets as well. The few bytes can represent an enormous amount of data.

    The first 256 FFs, representing 32 bytes each, represent 8192 bytes, or 8KB.
    That means 256 FFs, which can be replaced by a one-byte marker 01, represent 8KB.
    Either 256 bytes or 1 byte, plus a few header bytes and such, is the equivalent of 8KB. On top of that, depending on the file structure, a lot of values in the file will be FF.


    Can someone help program this? I think I could figure it out with the basic programming skills I have, and I believe the pseudocode above is enough to create a working program.
     
  2. qwertymodo

    Member qwertymodo GBAtemp Advanced Fan

    Joined:
    Feb 1, 2010
    Messages:
    769
    Country:
    United States
    Hate to burst your bubble, but hashing does not work for compression. It never will. There is not enough information retained in a hash to recreate the original file. End of story.
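
    A small sketch of why a hash cannot be reversed into the file. It uses a deliberately tiny 8-bit hash (the first byte of SHA-256, chosen purely for illustration) so that a collision shows up quickly; `tiny_hash` and `find_collision` are hypothetical names.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """First byte of SHA-256: an 8-bit hash used only to make collisions easy to find."""
    return hashlib.sha256(data).digest()[0]


def find_collision():
    """Hash 257 distinct inputs; with only 256 possible outputs, two must match."""
    seen = {}
    for i in range(257):
        data = i.to_bytes(2, "big")
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
    raise AssertionError("unreachable: 257 inputs cannot fit into 256 hash values")


a, b = find_collision()
print(a.hex(), b.hex(), tiny_hash(a), tiny_hash(b))
```

    Both inputs map to the same hash value, so the hash alone cannot say which one was the original. A longer hash only makes collisions harder to find; it does not remove them.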
     
  3. Blood Fetish

    Member Blood Fetish Quis custodiet ipsos custodes?

    Joined:
    Nov 3, 2002
    Messages:
    980
    Country:
    United States
    As I said in the previous thread: It is mathematically impossible to losslessly compress random data. There is a reason why every single compression algorithm relies on patterns and repetition. This will not work.
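
    The counting argument behind this can be sketched in a few lines, followed by a quick check with Python's standard `zlib` module. The seeded `random.Random(0)` is only a repeatable stand-in for random data, and the sizes chosen are arbitrary assumptions.

```python
import random
import zlib


def shorter_strings(n_bits: int) -> int:
    """Number of bit strings strictly shorter than n_bits (lengths 0 to n_bits - 1)."""
    return sum(2 ** k for k in range(n_bits))


n_bits = 16
files = 2 ** n_bits                   # distinct files of exactly n_bits bits
candidates = shorter_strings(n_bits)  # distinct shorter outputs available
print(files, candidates)              # there is always one output too few

rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(4096))
repetitive_data = b"\xff" * 4096

random_packed = zlib.compress(random_data, 9)
repetitive_packed = zlib.compress(repetitive_data, 9)
print(len(random_data), len(random_packed))
print(len(repetitive_data), len(repetitive_packed))
```

    Since there are fewer shorter strings than files, no lossless scheme can shrink every file. The compressor gains a lot on the repetitive input and essentially nothing on the random one.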
     
