Another attempt at file compression :S

SoraK05

Well-Known Member
OP
Member
Joined
Aug 24, 2007
Messages
155
Trophies
0
XP
286
Country
Kenya
An idea came to me about a file having what I would call a 'magic number'.

Let's take these 5 bytes:
10 30 54 62 81

To calculate the magic number, you add the first byte, subtract the second, add the third, subtract the fourth, and so on.
It keeps adding/subtracting until it gets to the last byte, and the result is the magic number.

+10 - 30 +54 -62 +81
=53
53 is the magic number for those 5 bytes.

I believe no combination of those 5 bytes except that one will produce that magic number.
[EDIT was wrong about that.. :)]

To take this idea further, perhaps the amount of each byte can be recorded:
i.e. 10 00s, 12 01s, 5 02s, 4 03s could look like this:

10 12 05 04

These 4 bytes represent the 31 bytes here.

The idea is to shuffle those known bytes, calculating each combination, until it reaches the magic number.

I may be wrong, but I believe the likelihood of having the magic number repeating in those combinations is low.

On top of that, a hash can be saved of the file, so when a magic number is matching, it verifies the contents with the hash.


I presume the likelihood of getting a matching magic number is generally low, and having a string that matches the magic number and hash is even lower.

It may take a really long time though :S



EDIT
Just realized that there are at least 11 other matching magic numbers to those 5 bytes :S
10 - 30 + 81 - 62 + 54 = 53
10 - 62 + 54 - 30 + 81 = 53
10 - 62 + 81 - 30 + 54 = 53
54 - 30 + 10 - 62 + 81 = 53
54 - 30 + 81 - 62 + 10 = 53
54 - 62 + 10 - 30 + 81 = 53
54 - 62 + 81 - 30 + 10 = 53
81 - 30 + 10 - 62 + 54 = 53
81 - 30 + 54 - 62 + 10 = 53
81 - 62 + 10 - 30 + 54 = 53
81 - 62 + 54 - 30 + 10 = 53

It may not be as few as I thought.. :S

Trying to confirm if the number of matching magic number and hashes at the same time is low enough to record as its own 'combination value' to make it worth it as a file smaller than the original one..


EDIT
A friend of mine gave a mathematical calculation to show that every 81 bytes + has more than one matching magic number and SHA1 at the same time, and the amount of matches grows exponentially the more bytes after that.
So, it seems the magic number + SHA1 hash combination (i.e. 20 bytes) may or may not have a collision for every 80 bytes.
 
  • Like
Reactions: 1 person

FAST6191

Techromancer
Editorial Team
Joined
Nov 21, 2005
Messages
35,956
Trophies
3
Website
trastindustries.com
XP
26,550
Country
United Kingdom
What you described is a minor tweak on bytesum aka second least advanced method of making a hash/checksum (the easiest being parity).

It is slightly better than your last idea of using a cryptographic grade hash (which by definition should not be able to be used for searching and data reconstruction purposes owing to the cascade effect beyond actually checking you got it right which as was mentioned in your previous thread is still not ideal) and brute force but you still seem to be heading towards/slowly reinventing the idea of data recovery parity (not to be confused with the basic concept of parity) as seen in the likes of the higher levels of RAID, PAR2 files and to a lesser extent in ECC ram.
Here each part of the file would be sampled more than necessary (oversampled) to recreate the file (you can define a straight line with two points but get four points and if one is wrong you still have the line) usually via something like a Fourier transform and then you can stick your processing power on recreating the file.

I am pretty much echoing the last thread though so I will stop there.
 

Blood Fetish

Quis custodiet ipsos custodes?
Member
Joined
Nov 3, 2002
Messages
1,101
Trophies
2
Age
43
Website
Visit site
XP
1,157
Country
United States
I appreciate your enthusiasm on compression algorithms, but you cannot losslessly compress random data. The only way to achieve lossless compression is through pattern matching, which is in and of itself not random.

edit: There is a reason why hashing is often referred to as "one-way encryption." It is not possible to derive with certainty the original message upon which the hash was based.
 

SoraK05

Well-Known Member
OP
Member
Joined
Aug 24, 2007
Messages
155
Trophies
0
XP
286
Country
Kenya
I've been trying something.

Let's use a 1 byte hash and 1 byte magic number.
Let's say:

Hash = FF
Magic number = FF (+ or - 256, it doesn't matter whether it is + or -, as long as it is 256).

I've been trying to go through the 'sequence' of this combination by hand
So far I got
1. FF
2. FF 00
3. 01 00 FE

I believe that is the 'sequence' so far..

I kinda stopped there because it will take a while to figure out all the 3 byte matches.

I may be wrong, but this match seems to be seldom..

Unless I'm mistaken, '3' in the sequence == 01 00 FE

3 bytes so far
FF FF 03
That would be hash FF magic FF and sequence 03

I know this still represents 3 bytes, but it may take me a while to figure out how many matches there are for the 3 byte combinations and more that match the hash and sequence, to have the 'hash/magic match "sequence"'. So far it appears to be seldom.

So far the magic number has seldom matches in proportion to the hash - at least up to 2 bytes.

Perhaps when the sequence is FF it represents a file >= 4 bytes (at least I'm hoping), potentially breaking even.

I'll try figure out how many matches there are with 3 bytes.
I'm also taking into account that this works for files starting with 01 or more.
 

You may also like...

General chit-chat
Help Users
  • Psionic Roshambo @ Psionic Roshambo:
    Hulk turns back into banner and Shrek is disappointed
  • K3N1 @ K3N1:
    Wasn't that the entire flick of the movie about the girl turning into an oger I'm sure Shrek 6 will gay shock the world
  • K3N1 @ K3N1:
    Omg he's brown and gay and headlines!
  • Psionic Roshambo @ Psionic Roshambo:
    Shrek finds a new swamp to wallow in.
  • AncientBoi @ AncientBoi:
    Brown? Omg, I think I just flushed him down the tiolet
  • K3N1 @ K3N1:
    Shrek's child comes out of the closet Shrek must learn how to not be cancelled
  • Wish13 @ Wish13:
    wth have I stumbled into
  • Wish13 @ Wish13:
    Oh whatever, I'm not even gonna try to comprehend what this is
  • K3N1 @ K3N1:
    It's a chatbox congrats
  • Wish13 @ Wish13:
    Well, I mean what was happening up there
  • x65943 @ x65943:
    shrek jr making tiktoks about xyr bigoted neutral gender parent
  • K3N1 @ K3N1:
    1mill subs overnight
  • x65943 @ x65943:
    sub is a heteronormative term used as a pejorative upon their nonheteronormative brethren. It shall be replaced by power bottom
    +1
  • K3N1 @ K3N1:
    I thought it meant subsandwhich before the internet took over
  • Psionic Roshambo @ Psionic Roshambo:
    Power Bottom sounds like a weight lifter hooker
  • K3N1 @ K3N1:
    Press my power button bby
  • Psionic Roshambo @ Psionic Roshambo:
    Go go power bottoms mighty furry power bottoms!!!!
  • Psionic Roshambo @ Psionic Roshambo:
    It's furry time!!!
  • Psionic Roshambo @ Psionic Roshambo:
    Mega whore power activated!!!!
  • K3N1 @ K3N1:
    Don't those hang out in the politic section
  • Psionic Roshambo @ Psionic Roshambo:
    Go go prostitute power, use crab lice attack, use herpes outbreak!!!
    Psionic Roshambo @ Psionic Roshambo: https://youtu.be/Vdfqo4EZjVM