How do sound files (stuff like mp3s) work?

May 2, 2018
How does the computer know what sound to play and what it sounds like? How does the file make the computer generate virtually any sound imaginable?

» 3 Answers

  • FAST6191 Best answer
    May 3, 2018
    Two main approaches

    1) Wave replication.

    2) Sequenced/tracker. Classically this is things like MIDI, MOD, S3M and IT, and it is what many games used and what their audio is known for. Indeed games are one of the main users of this style of audio production today.

    1) Human ears only hear between about 20 Hz and 20,000 to 22,000 Hz; it varies a bit with age and damage but let us not go there. Owing to a quirk of information storage you need to sample at double the highest frequency you want to keep (it is called the Nyquist-Shannon sampling theorem if you want to go into that). This is why you see 44.1 kHz (often rounded to 44 kHz) in audio editing programs. If you are mastering audio then it can pay to go to 96 kHz (mainly if you are speeding up and slowing down things), and there is another small quirk that is lessened by going to higher sample rates, but introducing you to the "loudness wars" is not going to be useful at this point.
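    To see why the sample rate has to be double the highest frequency, here is a minimal sketch. It assumes the CD rate of 44,100 samples a second; the function name `sample_cosine` and the two test tones are made up for illustration. A 30 kHz tone is above half the sample rate, and its samples come out the same as those of a 14.1 kHz tone, so the recording cannot tell them apart.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed CD rate)


def sample_cosine(freq_hz, count, rate=SAMPLE_RATE):
    """Return `count` samples of a cosine at `freq_hz`, taken `rate` times a second."""
    return [math.cos(2 * math.pi * freq_hz * n / rate) for n in range(count)]


# 30 kHz is above the limit of rate / 2 = 22,050 Hz ...
too_high = sample_cosine(30_000, 200)
# ... so its samples coincide with those of a 14.1 kHz tone (44,100 - 30,000).
alias = sample_cosine(14_100, 200)

largest_gap = max(abs(a - b) for a, b in zip(too_high, alias))
print(f"largest difference between the two sample sets: {largest_gap:.2e}")
```

    The difference is only floating point noise, which is the "quirk": anything above half the sample rate folds back down onto a lower frequency.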
    Anyway, stick a decent microphone in front of your sound source and its diaphragm will vibrate along with the sound, deflecting further the louder the sound is. Measure that deflection 44 thousand times a second and you have your sound captured. I will spare you the specifics of microphone design right now as it gets hideously complicated. I will however say do make sure to look up turning a speaker into a microphone, as it helps some people grasp some of the concepts involved.
    44 thousand times a second sounds like a lot to some, however consider that we have had computers that measure in the megahertz (millions of times a second) for decades now.
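    As a rough sketch of what "measuring the deflection" produces, the snippet below fakes a microphone with a pure 440 Hz sine wave and stores each measurement as a 16-bit integer. The name `record_tone`, the 440 Hz pitch and the `loudness` parameter are assumptions for illustration, not part of any real recording API.

```python
import math

SAMPLE_RATE = 44_100   # measurements per second (assumed CD rate)
MAX_INT16 = 32_767     # largest value a signed 16-bit sample can hold


def record_tone(freq_hz, seconds, loudness=0.5):
    """Pretend to record a pure tone: one integer per measurement of the deflection."""
    count = int(SAMPLE_RATE * seconds)
    return [
        round(loudness * MAX_INT16 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]


samples = record_tone(440, 1.0)
print(len(samples), "samples; first few:", samples[:5])
```

    One second of sound is just that list of numbers. Play them back through a speaker at the same rate and you get the tone again.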

    2) In this approach your computer or device has a bunch of sounds prebaked into it. It then has a small file functioning as either sheet music or a piano roll (if you know what one of those is anyway).
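    A toy version of the idea, under stated assumptions: the "prebaked sound" is a hypothetical square wave generator, the "sheet music" is a short list of (MIDI note number, seconds) pairs, and the sample rate is kept low to keep the example small. Real formats like MIDI or MOD carry far more detail than this.

```python
SAMPLE_RATE = 8_000  # low rate, chosen only to keep this toy small


def square_wave(freq_hz, n):
    """Hypothetical prebaked instrument: value of a square wave at sample index n."""
    phase = (freq_hz * n / SAMPLE_RATE) % 1.0
    return 1.0 if phase < 0.5 else -1.0


def midi_note_to_hz(note):
    """Standard MIDI tuning: note 69 is the A at 440 Hz, 12 notes per octave."""
    return 440.0 * 2 ** ((note - 69) / 12)


# The "sheet music": (MIDI note number, duration in seconds). C, E, G.
SONG = [(60, 0.25), (64, 0.25), (67, 0.5)]


def render(song, instrument=square_wave):
    """Play every note of the song through the instrument, one after another."""
    out = []
    for note, seconds in song:
        freq = midi_note_to_hz(note)
        out.extend(instrument(freq, n) for n in range(int(seconds * SAMPLE_RATE)))
    return out


audio = render(SONG)
print(len(SONG), "notes in the file became", len(audio), "samples of audio")
```

    The file only needs to store the three notes; the device does the work of turning them into sound.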

    You can combine 1) and 2): build a library of instruments out of lots of recordings made with 1) and have 2) play them. The resulting library has many names between different formats and systems but "sound font" and "instrument bank" are two good starts.

    In some cases of electronics you also have 3) which is a noise generator but that is of limited use here.

    16 bits, about 44,100 times a second, multiplied by two channels (I assume you want independent left and right speakers) adds up fairly quickly; there is a reason a 650 to 700 megabyte audio CD only holds a little over an hour of music. However, as most audio is very similar from one millisecond to the next, you can start doing things to get this space down. You can go even further still and lose some clarity (hence the term lossy audio compression), but do it right and it is not like you will hear it.
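    One simple way to exploit "very similar from one moment to the next" is to store the difference from the previous sample instead of the sample itself. This is only a sketch of that one trick (delta encoding) on an assumed 440 Hz test tone; it is not how MP3 or FLAC actually work internally, and the function names are made up.

```python
import math

SAMPLE_RATE = 44_100

# A tenth of a second of an assumed 440 Hz test tone, as integer samples.
samples = [
    round(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE // 10)
]


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    previous = 0
    deltas = []
    for v in values:
        deltas.append(v - previous)
        previous = v
    return deltas


def delta_decode(deltas):
    """Undo delta_encode by adding the differences back up."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values


deltas = delta_encode(samples)
biggest_sample = max(abs(s) for s in samples)
biggest_delta = max(abs(d) for d in deltas)
print("largest sample:", biggest_sample, "largest difference:", biggest_delta)
```

    The differences are much smaller numbers than the samples, so they need fewer bits each, and decoding gets the exact original back.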
  • Taffy
    May 2, 2018

    MP3 files are waveforms essentially. I don't know the specifics on codecs and stuff, but I know the physics!

    A speaker is composed of a voice coil and a diaphragm. The voice coil is a coil of wire around a magnet. When you apply a current to the coil, it moves because magnetism.

    We can use this to make vibrations. By rapidly alternating the movement in various ways, we get sound.

    MP3s are files a computer understands so it can make the speakers move. There are many other formats, all of varying quality.

    that's what I know.
  • romanaOne
    May 3, 2018
    Maybe too much detail, but you might like to contrast uncompressed (wav), lossy (mp3,ogg,aac), lossless (flac, alac) compression in the "wave replication" section. Uncompressed, you're going to get something like 10MB per minute of audio:

    44000 samples per sec * 16 bits per sample * 60 sec per min * 2 channels / 8 bits per byte / 1024 bytes per KB / 1024 KB per MB

    = about 10 megabytes per minute.

    So uncompressed, a crap 3-5 min. iTunes single would be 30 to 50 MB.
    Lossy compression at tolerable quality will get you a factor of 10 smaller: 1 MB per minute and that's nominally what you see in online music stores: 5MB or less for 5 mins.
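    The arithmetic can be checked with a few lines. This sketch assumes the exact CD rate of 44,100 samples a second rather than the rounded 44,000, and assumes 128 kilobits per second as the lossy bitrate; `bytes_per_minute` is a helper name made up here.

```python
SAMPLE_RATE = 44_100     # samples per second (assumed CD rate)
BITS_PER_SAMPLE = 16
CHANNELS = 2
MIB = 1024 * 1024        # bytes per megabyte, using the 1024-based convention


def bytes_per_minute(bits_per_second):
    """Convert a bitrate into bytes of storage needed for one minute."""
    return bits_per_second * 60 // 8


uncompressed_bps = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS  # 1,411,200 bits per second
lossy_bps = 128_000                                          # assumed lossy bitrate

uncompressed_mib = bytes_per_minute(uncompressed_bps) / MIB
lossy_mib = bytes_per_minute(lossy_bps) / MIB
ratio = uncompressed_bps / lossy_bps

print(f"uncompressed: {uncompressed_mib:.2f} MB per minute")
print(f"lossy:        {lossy_mib:.2f} MB per minute")
print(f"ratio:        {ratio:.1f}x")
```

    That lands on roughly 10 MB per minute uncompressed and a bit under 1 MB per minute lossy, matching the factor of about 10.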

    Tolerable is subjective and depends on your listening preferences: I find this 10X lossy compression (128 kilobits per sec) good 'nuff for listening in the car but not acceptable for listening in a quiet room with good headphones. I gather the typical iTunes consumer probably never does this, judging by the shite quality of any commonly available headphones less than 50 USD.

    The best you can do with lossless is about 50 percent smaller. (Why?) So around 5MB per minute.

    Lossless compression is reversible: you can recover every single bit of the original file because nothing was thrown away. Lossy compression is obviously not reversible: duh!, you applied some complicated technique to THROW AWAY much of the original data so of course you can never get back the original.
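    A small sketch of the difference. It uses Python's general purpose `zlib` as a stand-in for a lossless codec and a crude "drop the low 8 bits" step as a stand-in for a lossy one; neither is what FLAC or MP3 really do, and the test tone and the name `crush` are assumptions for illustration.

```python
import math
import struct
import zlib

# A tenth of a second of an assumed 440 Hz test tone as 16-bit samples.
samples = [round(20_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(4_410)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: compress, decompress, and every byte comes back.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
print("lossless round trip identical:", restored == raw)


def crush(value, dropped_bits=8):
    """Toy lossy step: throw away the lowest bits of a sample for good."""
    return (value >> dropped_bits) << dropped_bits


# Lossy: the discarded bits are gone, so the original cannot be rebuilt.
crushed = [crush(s) for s in samples]
worst_error = max(abs(a - b) for a, b in zip(samples, crushed))
print("lossy copy identical:", crushed == samples, "worst error:", worst_error)
```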

    So not being an audiophile with bionic ears, why would you want to use a lossless codec?

    Long term storage...that good-sounding mp3 you got on Napster back in 1998, transcoded to ogg in a fit of FOSS zeal, then transcoded that to wma in a fit of cynicism, then converted to aac because you wanted to do everything with your shiny new iPhone is now getting pretty threadbare. Doh! It seems you lost the original mp3 sometime in the noughties, when your backup CDROMs (which the marketing blurbs said would last 100 years) got eaten by a fungus.

    Short answer: You would not want to save an audio file you are editing/transcoding with a lossy format because the quality would get worse with each edit and save.
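    A toy illustration of why the quality gets worse with each transcode. Each "codec" here is just rounding to a different step size (7, 11, then 13, all arbitrary numbers picked for the example), which is a big simplification of what real lossy codecs do. The point is only that rounding errors from different schemes stack up.

```python
def quantize(value, step):
    """Round a non-negative integer sample to the nearest multiple of `step`."""
    return (value + step // 2) // step * step


def transcode_chain(value, steps):
    """Push a sample through several toy codecs in a row."""
    for step in steps:
        value = quantize(value, step)
    return value


originals = range(1_000)

# Worst error when going straight from the original to the final format.
one_pass = max(abs(v - quantize(v, 13)) for v in originals)

# Worst error when passing through two other lossy formats first.
three_passes = max(abs(v - transcode_chain(v, (7, 11, 13))) for v in originals)

print("worst error, one conversion:   ", one_pass)
print("worst error, three conversions:", three_passes)
```

    Converting once from a lossless original always beats converting a conversion of a conversion, which is the case for keeping a lossless copy around.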