Another attempt at file compression :S

Discussion in 'General Off-Topic Chat' started by SoraK05, May 15, 2012.

May 15, 2012
  1. SoraK05
    OP

    Member SoraK05 GBAtemp Regular

    Joined:
    Aug 24, 2007
    Messages:
    148
    Country:
    Kenya
    An idea came to me about a file having what I would call a 'magic number'.

    Let's take these 5 bytes:
    10 30 54 62 81

    To calculate the magic number, you add the first byte, subtract the second, add the third, subtract the fourth, and so on.
    It keeps adding/subtracting until it gets to the last byte, and the result is the magic number.

    +10 - 30 +54 -62 +81
    =53
    53 is the magic number for those 5 bytes.

    I believe no combination of those 5 bytes except that one will produce that magic number.
    [EDIT was wrong about that.. :)]

    To take this idea further, perhaps the amount of each byte can be recorded:
    i.e. 10 00s, 12 01s, 5 02s, 4 03s could look like this:

    10 12 05 04

    These 4 bytes represent the 31 bytes here.

    The idea is to shuffle those known bytes, calculating each combination, until it reaches the magic number.

    I may be wrong, but I believe the likelihood of having the magic number repeating in those combinations is low.

    On top of that, a hash can be saved of the file, so when a magic number is matching, it verifies the contents with the hash.


    I presume the likelihood of getting a matching magic number is generally low, and having a string that matches the magic number and hash is even lower.

    It may take a really long time though :S



    EDIT
    Just realized that there are at least 11 other matching magic numbers to those 5 bytes :S
    10 - 30 + 81 - 62 + 54 = 53
    10 - 62 + 54 - 30 + 81 = 53
    10 - 62 + 81 - 30 + 54 = 53
    54 - 30 + 10 - 62 + 81 = 53
    54 - 30 + 81 - 62 + 10 = 53
    54 - 62 + 10 - 30 + 81 = 53
    54 - 62 + 81 - 30 + 10 = 53
    81 - 30 + 10 - 62 + 54 = 53
    81 - 30 + 54 - 62 + 10 = 53
    81 - 62 + 10 - 30 + 54 = 53
    81 - 62 + 54 - 30 + 10 = 53

    It may not be as few as I thought.. :S

    Trying to confirm if the number of matching magic number and hashes at the same time is low enough to record as its own 'combination value' to make it worth it as a file smaller than the original one..


    EDIT
    A friend of mine gave a mathematical calculation to show that every 81 bytes + has more than one matching magic number and SHA1 at the same time, and the amount of matches grows exponentially the more bytes after that.
    So, it seems the magic number + SHA1 hash combination (i.e. 20 bytes) may or may not have a collision for every 80 bytes.
     
    1 person likes this.
  2. FAST6191

    Reporter FAST6191 Techromancer

    pip
    Joined:
    Nov 21, 2005
    Messages:
    21,743
    Country:
    United Kingdom
    What you described is a minor tweak on bytesum aka second least advanced method of making a hash/checksum (the easiest being parity).

    It is slightly better than your last idea of using a cryptographic grade hash (which by definition should not be able to be used for searching and data reconstruction purposes owing to the cascade effect beyond actually checking you got it right which as was mentioned in your previous thread is still not ideal) and brute force but you still seem to be heading towards/slowly reinventing the idea of data recovery parity (not to be confused with the basic concept of parity) as seen in the likes of the higher levels of RAID, PAR2 files and to a lesser extent in ECC ram.
    Here each part of the file would be sampled more than necessary (oversampled) to recreate the file (you can define a straight line with two points but get four points and if one is wrong you still have the line) usually via something like a Fourier transform and then you can stick your processing power on recreating the file.

    I am pretty much echoing the last thread though so I will stop there.
     
  3. Blood Fetish

    Member Blood Fetish Quis custodiet ipsos custodes?

    Joined:
    Nov 3, 2002
    Messages:
    980
    Country:
    United States
    I appreciate your enthusiasm on compression algorithms, but you cannot losslessly compress random data. The only way to achieve lossless compression is through pattern matching, which is in and of itself not random.

    edit: There is a reason why hashing is often referred to as "one-way encryption." It is not possible to derive with certainty the original message upon which the hash was based.
     
  4. SoraK05
    OP

    Member SoraK05 GBAtemp Regular

    Joined:
    Aug 24, 2007
    Messages:
    148
    Country:
    Kenya
    I've been trying something.

    Let's use a 1 byte hash and 1 byte magic number.
    Let's say:

    Hash = FF
    Magic number = FF (+ or - 256, it doesn't matter whether it is + or -, as long as it is 256).

    I've been trying to go through the 'sequence' of this combination by hand
    So far I got
    1. FF
    2. FF 00
    3. 01 00 FE

    I believe that is the 'sequence' so far..

    I kinda stopped there because it will take a while to figure out all the 3 byte matches.

    I may be wrong, but this match seems to be seldom..

    Unless I'm mistaken, '3' in the sequence == 01 00 FE

    3 bytes so far
    FF FF 03
    That would be hash FF magic FF and sequence 03

    I know this still represents 3 bytes, but it may take me a while to figure out how many matches there are for the 3 byte combinations and more that match the hash and sequence, to have the 'hash/magic match "sequence"'. So far it appears to be seldom.

    So far the magic number has seldom matches in proportion to the hash - at least up to 2 bytes.

    Perhaps when the sequence is FF it represents a file >= 4 bytes (at least I'm hoping), potentially breaking even.

    I'll try figure out how many matches there are with 3 bytes.
    I'm also taking into account that this works for files starting with 01 or more.
     

Share This Page