An idea came to me about a file having what I would call a 'magic number'.
Let's take these 5 bytes:
10 30 54 62 81
To calculate the magic number, you add the first byte, subtract the second, add the third, subtract the fourth, and so on.
It keeps adding/subtracting until it gets to the last byte, and the result is the magic number.
+10 - 30 + 54 - 62 + 81 = 53
53 is the magic number for those 5 bytes.
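As a minimal sketch of the calculation above (the function name `magic_number` is just something made up for illustration, and the bytes are treated as plain decimal values like in the example):

```python
def magic_number(data):
    """Alternating sum: add the 1st byte, subtract the 2nd, add the 3rd, ..."""
    total = 0
    for i, b in enumerate(data):
        # Even index (0, 2, 4, ...) is added, odd index is subtracted.
        total += b if i % 2 == 0 else -b
    return total


print(magic_number([10, 30, 54, 62, 81]))  # 53
```

Note that the result can be negative or larger than 255, so it would not always fit in a single byte.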
I believe no other arrangement of those 5 bytes will produce that magic number.
[EDIT: I was wrong about that, see the edits further down.]
To take this idea further, perhaps the number of occurrences of each byte value can be recorded:
e.g. 10 00s, 12 01s, 5 02s and 4 03s could look like this:
10 12 05 04
These 4 bytes represent the 31 bytes of the original data.
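Recording the counts could look roughly like this. It is only a sketch: `byte_counts` and its `num_values` parameter are hypothetical names, and it assumes the count of every byte value from 0 upward is stored in order.

```python
from collections import Counter


def byte_counts(data, num_values=256):
    """Return how many times each byte value 0..num_values-1 occurs in data."""
    counts = Counter(data)
    return [counts[value] for value in range(num_values)]


# 10 00s, 12 01s, 5 02s and 4 03s, as in the example above.
data = bytes([0] * 10 + [1] * 12 + [2] * 5 + [3] * 4)
print(len(data))                        # 31
print(byte_counts(data, num_values=4))  # [10, 12, 5, 4]
```

A count above 255 would not fit in one byte, so a real format would need wider count fields for larger files.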
The idea is to shuffle those known bytes, calculating the magic number of each arrangement, until one matches the stored magic number.
I may be wrong, but I believe the likelihood of the magic number repeating among those arrangements is low.
On top of that, a hash of the file can be saved, so when an arrangement matches the magic number, its contents can be verified against the hash.
I presume the likelihood of getting a matching magic number is generally low, and the likelihood of an arrangement matching both the magic number and the hash is even lower.
It may take a really long time though :S
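Here is a rough brute-force sketch of that search, only usable on tiny inputs. The names `recover`, `counts`, `target_magic` and `target_sha1` are made up for illustration, and SHA-1 from Python's `hashlib` is assumed as the hash.

```python
import hashlib
from itertools import permutations


def magic_number(data):
    """Alternating sum: add the 1st byte, subtract the 2nd, and so on."""
    return sum(b if i % 2 == 0 else -b for i, b in enumerate(data))


def recover(counts, target_magic, target_sha1):
    """Try every arrangement of the counted bytes until magic number and hash match."""
    # Rebuild the pool of bytes from the counts, e.g. [2, 1] -> [0, 0, 1].
    pool = [value for value, n in enumerate(counts) for _ in range(n)]
    seen = set()
    for candidate in permutations(pool):
        if candidate in seen:  # repeated bytes produce duplicate arrangements
            continue
        seen.add(candidate)
        if magic_number(candidate) != target_magic:
            continue  # cheap check first, hash only the survivors
        if hashlib.sha1(bytes(candidate)).digest() == target_sha1:
            return bytes(candidate)
    return None


original = bytes([1, 0, 2, 1, 0, 3])
counts = [2, 2, 1, 1]  # two 00s, two 01s, one 02, one 03
found = recover(counts, magic_number(original), hashlib.sha1(original).digest())
print(found == original)
```

The number of arrangements grows factorially with the length, which is why this gets slow so quickly.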
EDIT
Just realized that at least 11 other arrangements of those 5 bytes give the same magic number :S
10 - 30 + 81 - 62 + 54 = 53
10 - 62 + 54 - 30 + 81 = 53
10 - 62 + 81 - 30 + 54 = 53
54 - 30 + 10 - 62 + 81 = 53
54 - 30 + 81 - 62 + 10 = 53
54 - 62 + 10 - 30 + 81 = 53
54 - 62 + 81 - 30 + 10 = 53
81 - 30 + 10 - 62 + 54 = 53
81 - 30 + 54 - 62 + 10 = 53
81 - 62 + 10 - 30 + 54 = 53
81 - 62 + 54 - 30 + 10 = 53
It may not be as few as I thought.. :S
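A quick way to check the list above is to try all 120 arrangements of the 5 bytes. This is just a sketch, with `magic_number` being a made-up helper name:

```python
from itertools import permutations


def magic_number(data):
    """Alternating sum: add the 1st byte, subtract the 2nd, and so on."""
    return sum(b if i % 2 == 0 else -b for i, b in enumerate(data))


data = [10, 30, 54, 62, 81]
target = magic_number(data)

# Keep every arrangement whose magic number equals the original's.
matches = [p for p in permutations(data) if magic_number(p) == target]
print(len(matches))  # 12: the original plus 11 others
```

The reason is that the result only depends on which bytes land in the "add" positions and which in the "subtract" positions, not on their order within each group. Here that gives 3! x 2! = 12 arrangements with the same magic number.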
I'm trying to confirm whether the number of arrangements that match both the magic number and the hash is low enough that the right one could be recorded as its own 'combination value', and still end up with a file smaller than the original one..
EDIT
A friend of mine gave a mathematical calculation showing that any data of 81 bytes or more has more than one arrangement matching both the magic number and the SHA1 at the same time, and the number of matches grows exponentially with every byte after that.
So, it seems the magic number + SHA1 hash combination (i.e. 20 bytes) may or may not have a collision for data of up to 80 bytes, and has one for anything longer.