Ultra File Compression

Discussion in 'General Off-Topic Chat' started by JFTS, Jun 8, 2013.

  1. JFTS
    OP

    JFTS GBAtemp Regular

    Member
    166
    7
    May 10, 2011
    Fiji
     So, does anyone know an effective method to ultra-compress files? You might have seen some PC games compressed to incredibly small sizes here and there (like GTA San Andreas, which was supposedly compressed to a 1 MB file!). I tried many high-compression programs like 7zip, UHARC and KGB Archiver, but I can't get the same results on other types of files, like a music album for example. Why can't some files be compressed? I don't get why games can be reduced to 70% of their size, while a 60 MB album or a 5 MB PDF stays pretty much the same size after compression.
     
  2. FAST6191

    FAST6191 Techromancer

    pip Reporter
    23,361
    9,154
    Nov 21, 2005
    If San Andreas came as a 1 meg file it is because the 1 meg was either
    1) Fake
    2) A virus
    3) A downloader for the rest of the game/frontend for a service like onlive.
    4) Possibly a combination of all three previous options.

     There are ultra small games that do things like procedural generation ( http://pouet.net/prod.php?which=12036 being one of the more noted examples of it happening in the modern world) but GTA is not one of them, nor could it be reduced to one.

     Compression is based on repetition of patterns in files, and many files (multimedia ( http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/mpeg1/ ) and things like PDF being great examples) are compressed as part of how they are made. There are some area-specific compression methods, and there are some half compression/half file recovery methods more suited to use on a supercomputer, but let us not get into that right now.
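
     To make the repetition point concrete, here is a small sketch (illustrative only; Python's standard zlib module stands in for whatever archiver you use). Repetitive data shrinks dramatically, while random-looking data -- which is exactly what an already-compressed MP3 or PDF stream looks like -- barely shrinks at all.

```python
import os
import zlib

# Highly repetitive data: the compressor can replace repeats with short
# back-references, so it shrinks to a tiny fraction of its size.
repetitive = b"the quick brown fox jumps over the lazy dog " * 500

# Random bytes stand in for an already-compressed file (MP3, PDF stream, ...):
# there is no repetition left for a second compressor to exploit.
random_like = os.urandom(len(repetitive))

print(len(repetitive), len(zlib.compress(repetitive)))    # huge saving
print(len(random_like), len(zlib.compress(random_like)))  # essentially none
```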
     
    DinohScene and Satangel like this.
  3. JFTS
    OP

    JFTS GBAtemp Regular

  4. FAST6191

    FAST6191 Techromancer

     Command & Conquer: Renegade is a 2002 textured 3D game -- it would not have used much compression on anything and is thus a prime candidate for this. The link you gave does not compare it to other methods, but I reckon that, done right, you would see similar savings from a basic zip file. If you just tried to compress the directory, though, it would likely not work very well: basic zip works on a file-by-file basis, where things like 7zip take all the files and treat them as a single data pool to compress.
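
     A quick sketch of that difference (hypothetical file contents; Python's zlib stands in for the real formats): compressing three similar files one at a time, zip-style, versus pooling them first, 7zip-style.

```python
import os
import zlib

# Three "files" sharing a big incompressible chunk plus a small unique part,
# roughly the shape of near-duplicate game data files.
shared = os.urandom(5000)
files = [shared + bytes([i]) * 50 for i in range(3)]

# Zip-style: each file is compressed from scratch, so the shared chunk is
# paid for three times over.
per_file_total = sum(len(zlib.compress(f)) for f in files)

# 7zip-style "solid" pool: one continuous data set, so files 2 and 3 can
# refer back to file 1's copy of the shared chunk.
solid_total = len(zlib.compress(b"".join(files)))

print(per_file_total, solid_total)  # the solid pool is far smaller
```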

     PDF
     The PDF is halfway between JavaScript, HTML, a word document and Lisp (it is a horrible standard as far as such things go... which is half the reason PDF readers are one of the three main avenues for malware, along with Flash and Java). However, being largely composed of text and markup, and being billed as a portable format, the programs that make PDFs tend to compress them at creation time, for basic text and markup compress quite well. As compression eliminates repetition, a second pass over an already compressed file will gain you very little.

     Music and video... I should have mentioned this first time around. It is called lossy compression and, as the name implies, it is compression that loses data.
     If your document does not decompress exactly as it went in then bad things happen (your contract for 10 million now reads 10, and such like) -- for this you need lossless compression.
     If your video happens to not have as deep a black for a few frames as it might originally have had, or somewhere the actor is not happens to be a bit blurry, or it jerks a tiny bit between frames, it is no big deal. It gets very, very complex (I linked MPEG1 already -- that is some 20 years old at this point; now consider what has happened in the last 20 years of computing and how multimedia still pushes computers to the limit) and draws on everything from high-end physics to psychology, but those sorts of things are what is done.
     Music is kind of similar (in fact very similar if you take it down to the lowest levels and underlying maths) in that people are not good at telling closely related pitches apart (though recent studies have shown it is better than some assumed... if you are a trained musician), so at points the encoder will treat sounds that nobody should be able to tell apart as the same and gain a space saving.
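
     The lossless/lossy distinction in miniature (a toy Python sketch; the rounding step is a made-up stand-in for what real audio codecs do far more cleverly):

```python
import zlib

document = b"This contract is for 10,000,000 dollars." * 10

# Lossless: the round trip must give back exactly what went in.
assert zlib.decompress(zlib.compress(document)) == document

# Lossy (toy version): round each "audio sample" to the nearest multiple
# of 8. The small differences are discarded for good -- acceptable for
# sound, catastrophic for a contract.
samples = list(range(-100, 101, 3))
quantised = [8 * round(s / 8) for s in samples]
assert quantised != samples                                      # data lost
assert max(abs(a - b) for a, b in zip(samples, quantised)) <= 4  # bounded error
```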
     
  5. JFTS
    OP

    JFTS GBAtemp Regular

    Sorry for not responding sooner.

     Thanks for your explanation! The only thing I don't get is the seemingly "random" compression ratio of the files. I have a 23 MB folder containing many 10-20 second MP3s and a PDF of about 1.5 MB. When I compressed this folder with 7-zip, it went from 23 MB to 5 MB! How can it be compressed so much when other files/folders can't?
     
  6. FAST6191

    FAST6191 Techromancer

    The MP3 thing is odd unless there is a lot of silence.

     Generally when you compare something like 7zip to something else you have to consider that 7zip treats all the files as one continuous/contiguous data set, where basic zip and some implementations of rar do not. If you have a lot of files of the same type (again, MP3 is a bit odd) then it may see similarities between the files where lesser methods might not. It is also why extracting just one file from such a 7zip archive takes ages.

    That might be a bit overly simplistic as there is also the dictionary size to deal with (short version is it only considers so much of the file before it stops looking) but for small files that will not come into play.
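
     The dictionary-size effect can be sketched with zlib's window-size parameter (wbits: 9 gives a 512-byte window, 15 a 32 KB one -- a tiny stand-in for 7zip's far larger dictionaries). A repeat that sits further back than the window simply cannot be found.

```python
import os
import zlib

# Two copies of the same 20 KB blob: the repeat sits 20 KB back in the stream.
blob = os.urandom(20 * 1024)
data = blob + blob

def deflate(payload, wbits):
    # wbits sets the sliding window the compressor searches for earlier
    # matches: 2**wbits bytes.
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()

small_window = deflate(data, 9)   # 512-byte window: cannot see the repeat
large_window = deflate(data, 15)  # 32 KB window: finds the second copy

print(len(small_window), len(large_window))
```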
     
  7. JFTS
    OP

    JFTS GBAtemp Regular

     Thanks for your advice! I guess it's kind of a trial-and-error method. Some files can be ultra compressed and some cannot.
     
  8. Xuphor

    Xuphor I have lied to all of you. I am deeply sorry.

    Banned
    1,681
    957
    Jul 14, 2007
    United States
    USA
    Two methods come to my mind for extreme file compression:

     1 - Bethesda's compression software. They managed to fit the entirety of Skyrim into just a little over 5 GB on the PC version. That's insane. It's also not available to the public without some very fancy (and illegal) third-party software.

     2 - UnARC, some sort of program I discovered from, uh... less than legal places (torrent websites). In short, it once managed to get a 7 GB game down to ~2 GB somehow, without losing/ripping anything. However, the program itself is NOT illegal, so here is a link: http://membled.com/work/apps/unarc/
     I do not know how to use UnARC myself, but I've seen the wonders it can do. Beware, however: when I used the most extreme compression method in UnARC (like the 7 GB down to 2 GB thing), it took my computer a good hour and a half to extract, on an AMD Phenom II X4 965 processor. While it's not the best CPU, it's certainly not terrible, so decompression times are very long. I can't imagine how long it would take to compress something at that setting.
     
  9. DinohScene

    DinohScene Capture the Dino

    Member
    GBAtemp Patron
    15,836
    12,291
    Oct 11, 2011
    Antarctica
     Into the sky
     Why compress files when the technology these days allows for large hard drives and large flash drives ._.
     Not only that, but the internet gets faster daily.

     Besides, do you really need 6+ TB of data?
     
  10. JFTS
    OP

    JFTS GBAtemp Regular

    I think that for big games or professional programs, all companies must have "internal" compressors/extractors in order to fit them on discs.

    Thanks for the reference about UnARC. I'll try it later and see if the results are any different.


     It's not so much about compression itself, but when you archive raw uncompressed files of any kind, you always have to think about the space they need. Furthermore, when working with professional audio/video programs, 6+ TB of HDD space is not so uncommon.