Multimedia Codecs, Containers, Filenames, and Metadata

This article is an introduction to digital multimedia files and their file extensions, and explains the relationship between multimedia file types (file extensions), containers, codecs, and metadata as they pertain to multimedia files.

Wikipedia defines multimedia as,

"...text, audio, images, animations, video and interactive content."

When viewing or listening to digital multimedia content, how does your device know how to play the content? How do you know the content you are viewing or listening to is a faithful reproduction of the artist's original work? The answer lies in standardization. A common set of principles has been applied from creation to reproduction, such that you are able to consume art as intended by its author. This article is part of a series explaining the core concepts around how digital multimedia is shared.

Keys to Compatibility: Encoding and Decoding

If you think about the birth of the film industry, in the beginning, no standards existed. The film format of early moving picture films was arbitrary. Over time, standards were developed as the concept proved popular. Digital media - including films and music - is no different.

Digital media - music, video, photographs, and images - must be stored in an organized fashion, according to a schema or design. This storage process is called encoding. Raw data representing the artwork is embedded inside a common framework, governed by an algorithm. This process allows for the storage and transmission of the art (a song, picture, video, etc.) in such a way that it may be faithfully reproduced later by another, independent process.

Decoding is the reverse of encoding. By adhering to the same common framework and algorithm as the encoding process, a decoding process reverses the storage process of the artwork and restores the art to its pure content form for presentation.

By applying a common framework to how digital media is recorded, it may be stored and transferred at will via digital files, with confidence others will be capable of reproducing the original art.
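
As a rough illustration of the encode/decode round trip described above, the sketch below uses Python's standard wave module to store a few 16-bit audio samples in WAV format and then read them back. The sample values, the 8000 Hz sample rate, and the helper names encode_wav and decode_wav are arbitrary choices made for this example, not part of any standard.

```python
import io
import struct
import wave

def encode_wav(samples, sample_rate=8000):
    """Encode 16-bit mono samples into WAV bytes (illustrative helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)            # mono
        writer.setsampwidth(2)            # 2 bytes = 16 bits per sample
        writer.setframerate(sample_rate)
        writer.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()

def decode_wav(data):
    """Decode WAV bytes back into a list of 16-bit samples."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        count = reader.getnframes()
        frames = reader.readframes(count)
    return list(struct.unpack(f"<{count}h", frames))

original = [0, 1000, -1000, 32767, -32768]
restored = decode_wav(encode_wav(original))
print(restored == original)
```

Because both helpers follow the same WAV framework, the decoder recovers exactly the samples the encoder stored, even though the two functions share no state.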

Digital Multimedia Files

How does this business of "encoding" and "decoding" files pertain to filenames? Strictly speaking, the name of a file does not determine how its content was encoded. In practice, however, there is an informal but strong correlation between file types, filename extensions, and file encoding.

File Extensions: An Overly Simplistic Benchmark

Contrary to popular belief, file extensions do not determine what a file's content is (though they should indicate it). Over time, multimedia filename standards have evolved due to the popularity of various encoding formats and a common desire to readily identify how files are encoded without the need to probe every possible decoder solution against every file at read or execution time.

Can you imagine the process of opening any file on your computer if you had no idea what kind of file it was? You would need to attempt opening it with every application on your device until you were successful. Even then, it might not be the correct app. Your test app might open it anyway, and the result could appear as gibberish to you. This problem was the catalyst for the adoption of common file extensions. Many file extensions are codified in Request for Comments (RFC) documents published by the Internet Engineering Task Force (IETF), an authoritative international community that establishes most standards applicable to the Internet. Other conventions have developed informally over time through widespread habit and custom, such as the ubiquity of particular codecs, containers, or combinations of them.

File extensions are shorthand representations of associated containers (and sometimes, codecs).

Regardless of how the current crop of common multimedia file extensions came to be, these common practices reduce the effort required by users to properly decode a given media file.
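
The idea of an extension as shorthand can be sketched as a simple lookup table. The table below is a small, hand-picked assumption made for illustration; the name EXTENSION_HINTS and the descriptions in it are invented for this example, and real systems rely on much larger registries.

```python
from pathlib import PurePath

# A hand-picked table for illustration only.
EXTENSION_HINTS = {
    ".mp3": "MP3 audio",
    ".m4a": "MPEG-4 container, audio only",
    ".mp4": "MPEG-4 container",
    ".mkv": "Matroska container",
}

def guess_from_extension(filename):
    """Return the conventional meaning of a file extension, or None if unknown."""
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_HINTS.get(suffix)

print(guess_from_extension("Concert.MP4"))
print(guess_from_extension("notes.xyz"))
```

Note that the lookup never opens the file. It is only a hint about what the content is expected to be.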

Deciphering Multimedia File Formats

There are four (4) key architectural components to multimedia files:

  1. Filename Extensions
  2. Containers
  3. Codecs
  4. Metadata

Media Filename Extensions

Digital media filename extensions should be indicative of a media file's container type.

Most media file formats are strictly containers, but a minority of codecs also act as containers. While some crossovers are expected (e.g. .mp3/MP3), there are times when someone creates an unconventional pairing of file extension, container, and codec. When such a mismatch occurs, it is nearly always due to an error on the part of the person who created the file, though in a few cases the file is valid and the container/codec combination is simply unusual.
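
One way to spot such a mismatch is to compare what the extension promises with what the first bytes of the file actually look like. The sketch below checks a deliberately short list of well-known signatures; the function names and the EXPECTED_CONTAINER table are assumptions made for this example, and a real tool would recognize many more formats.

```python
from pathlib import PurePath

def sniff_container(header):
    """Guess a format from the first bytes of a file (partial, illustrative list)."""
    if header[4:8] == b"ftyp":               # MPEG-4 family file type box
        return "mpeg-4"
    if header[:4] == b"\x1a\x45\xdf\xa3":    # EBML signature used by Matroska
        return "matroska"
    if header[:3] == b"ID3":                 # ID3v2 tag, usually placed before MP3 audio
        return "mp3"
    return None

EXPECTED_CONTAINER = {".mp4": "mpeg-4", ".m4a": "mpeg-4", ".mkv": "matroska", ".mp3": "mp3"}

def extension_matches_content(filename, header):
    """Return True or False when both guesses are known, None otherwise."""
    expected = EXPECTED_CONTAINER.get(PurePath(filename).suffix.lower())
    actual = sniff_container(header)
    if expected is None or actual is None:
        return None
    return expected == actual

mp4_header = b"\x00\x00\x00\x18ftypisom"
print(extension_matches_content("movie.mp4", mp4_header))
print(extension_matches_content("song.mp3", mp4_header))
```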

Unexpected Combinations

File extensions are arbitrary, in the sense that anyone can change them and/or use any file extension they like. Of course, there are common file extensions, which are split into two (2) broad categories: published file extension standards decreed by an accredited standards organization; or a de facto standard born of widespread, customary but unofficial use. AAC (Advanced Audio Coding) or .aac is an example of the former. MP3 or .mp3 is an example of the latter.

Standardized file type associations drive consistency among users and applications, and attempt to prevent confusing end users via unconventional combinations of codecs and containers.

MP3: When a Codec is a Container

Some codecs cross the line between codec and container, and may act as both. MP3 files are perhaps the best-known example of this phenomenon. Technically, MP3 is a codec; however, MP3 is also capable of acting as a container. This is because MP3 decoders can identify a file as an MP3 file, provided certain identifying characteristics exist at the start of the file. When an MP3 encoder stores data in a file with the .mp3 filename extension (the most common file extension for MP3 files), it writes this identifying preamble.
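
In practice, the identifying characteristics include the header that begins every MP3 audio frame (eleven set bits followed by version and layer fields) or an ID3v2 tag placed in front of the audio. The following is a minimal sketch of such a check, not a full validator: the function names are invented for this example, and it ignores details such as reserved version values and bitrate fields.

```python
def looks_like_mp3_frame(data):
    """Check the first two bytes for an MPEG audio Layer III frame header."""
    if len(data) < 2:
        return False
    # Frame sync: all 8 bits of byte 0 and the top 3 bits of byte 1 are set.
    has_sync = data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
    # Layer field: the bit pattern 01 denotes Layer III.
    layer_bits = (data[1] >> 1) & 0b11
    return has_sync and layer_bits == 0b01

def starts_like_mp3(data):
    """True if the data begins with an ID3v2 tag or an MP3 frame header."""
    return data[:3] == b"ID3" or looks_like_mp3_frame(data)

print(starts_like_mp3(bytes([0xFF, 0xFB, 0x90, 0x00])))
print(starts_like_mp3(b"RIFF"))
```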

Another factor is that MP3 files are capable of storing ID3 metadata embedded in the file along with the audio data. While most codecs contain some metadata, it typically relates only to fundamental technical information about the audio content. This minimal set of information is referred to as skeleton metadata. ID3, however, is descriptive metadata, meaning it describes the audio content contained in the file. Typically, this sort of metadata would be found at the container level (e.g. Matroska files), but since ID3 metadata can be included in raw MP3 files, MP3s do not need to be placed inside containers in order to carry contextual information about the content.
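
To make the idea of descriptive metadata concrete, the sketch below parses the older and simpler ID3v1 form of the tag, which occupies the last 128 bytes of a file: the marker "TAG", followed by fixed-width title, artist, album, year, and comment fields and a one-byte genre. The function name parse_id3v1 and the sample values are invented for this example, and the newer ID3v2 format, which is structured differently, is not handled.

```python
import struct

def parse_id3v1(tag):
    """Parse a 128-byte ID3v1 tag into a dict, or return None if it is absent."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None
    title, artist, album, year, _comment, _genre = struct.unpack(
        "<30s30s30s4s30sB", tag[3:]
    )

    def clean(raw):
        # Fields are padded with NUL bytes or spaces; text is commonly Latin-1.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": clean(title),
        "artist": clean(artist),
        "album": clean(album),
        "year": clean(year),
    }

# A hand-built tag with made-up values.
sample_tag = (
    b"TAG"
    + b"Example Song".ljust(30, b"\x00")
    + b"Example Artist".ljust(30, b"\x00")
    + b"Example Album".ljust(30, b"\x00")
    + b"2001"
    + b"".ljust(30, b"\x00")
    + bytes([255])
)
print(parse_id3v1(sample_tag))
```

When reading a real file, the candidate tag would be the final 128 bytes of the file's content.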

Why Multimedia Containers Are Important

Normalizing file extension associations tends to shift the responsibility for codec identification to the container level. Let's look at an MPEG-4 container as an example of when a container needs to clearly identify its content type in its preamble so that a decoding application will properly identify the content inside the container.

Often referred to as ".mp4" files (their most common file extension), MP4 (MPEG-4) files are multimedia containers that are expected to contain MPEG-4 video, and possibly MPEG-4 audio as well. The standard MPEG-4 audio type is the AAC codec. Likewise, .m4a denotes an audio-only MPEG-4 container, and therefore one would expect an M4A file to contain AAC-encoded audio.

Now, here is the catch. It is conceivable that an .m4a or .mp4 file contains audio encoded with the MP3 codec. While not common practice, it is technically possible. Although AAC was designed as the successor to MP3, the two are distinct codecs, and an AAC decoder cannot decode MP3 data. The decoder therefore needs to know what type of content it is being asked to decode.

Thankfully, this process is entirely transparent to an end user, but the point here is that a content provider and/or encoder must be cognizant of what type of content they are encoding and whether or not their intended container format fully supports their multimedia data. For this reason, most containers provide skeleton metadata that clearly indicates the audio and video codecs used to encode their underlying content.
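
As a partial illustration of a container identifying itself, MPEG-4 files normally begin with a "ftyp" box: a 4-byte big-endian size, the 4-byte type "ftyp", a 4-byte major brand, a 4-byte minor version, and then a list of 4-byte compatible brands. The sketch below reads only that box. The function name read_ftyp and the sample bytes are invented for this example. The brand hints at the flavor of container only; the codec of each track is recorded deeper in the file, in the sample description, which this sketch does not parse.

```python
import struct

def read_ftyp(data):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box, or None."""
    if len(data) < 16:
        return None
    size, box_type = struct.unpack(">I4s", data[:8])
    # Special size values (0 and 1) are not handled in this sketch.
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", errors="replace")
    # Bytes 12..16 hold the minor version; compatible brands follow, 4 bytes each.
    brands = [
        data[i:i + 4].decode("ascii", errors="replace") for i in range(16, size, 4)
    ]
    return major, brands

# A hand-built 24-byte box with made-up content, followed by unrelated bytes.
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"M4A ", 0) + b"M4A " + b"isom" + b"rest"
print(read_ftyp(sample))
```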

Containers

Breakdown of typical media file structure (high level)

Containers are simply vessels or holders of multimedia content (audio and/or video). Containers hold audio and/or video data produced according to a particular codec. Containers store basic configuration information that describes which codec they are holding, and the content itself. Some containers also store metadata.

Codecs

Codecs are algorithms that encode and decode multimedia data. Audio codecs are simply codecs designed to encode and decode audio data such as music, speech, or soundtracks that accompany video/movies. Most of the time, codecs are identified within a container. Thus, the container provides guidance to a reading application with regard to which codec should be used to decode the content inside the container. However, that is not always the case.
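
The way a container guides a reading application toward the right codec can be sketched as a lookup from a codec identifier to a decoding function. The codec identifiers below ("raw" and "deflate") are toy stand-ins invented for this example, with zlib compression playing the role of a real audio codec.

```python
import zlib

# Toy "codecs" standing in for real audio codecs; the identifiers are invented.
DECODERS = {
    "raw": lambda payload: payload,
    "deflate": zlib.decompress,
}

def decode_track(codec_id, payload):
    """Pick a decoder based on the codec identifier recorded by the container."""
    try:
        decoder = DECODERS[codec_id]
    except KeyError:
        raise ValueError(f"no decoder available for codec {codec_id!r}") from None
    return decoder(payload)

# A pretend container: it records which codec was used alongside the content.
container = {"codec": "deflate", "payload": zlib.compress(b"audio samples")}
print(decode_track(container["codec"], container["payload"]))
```

If the container named a codec the application does not support, the lookup fails immediately instead of the application guessing.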

Some codecs can be applied in the encoding phase in such a way that a file is created without being paired with a container. MP3 is an example of this. MP3 files do not require inclusion inside a container, but they may be packaged inside a container. MP3 files are unusual in another way as well: they allow the inclusion of metadata with the encoded file. In this sense, MP3 files may act as both a codec and a container simultaneously. A specific metadata schema called ID3 may be embedded within the MP3 file itself. Functionally, these traits cause the MP3 file to be treated as a container. In reality, MP3 files are a sort of codec/container hybrid. This often generates confusion when MP3 encoded content is bundled inside real containers (such as .mp4 files). This can result in nested metadata layers, which may confuse some media players.

Metadata

Sometimes described as "data about data," metadata is contextual information that describes the characteristics of the digital content it is associated with.

To learn more about multimedia metadata, see this article: How to Structure a Music Library

Metadata follows a defined, consistent format that remains constant between objects. For example, metadata could consist of a work's title, author, artist name, or a combination of them. One of the most important qualities of metadata is consistency. Many metadata formats are text based and resemble XML (Extensible Markup Language) encoded as UTF-8.

Metadata must follow a schema: a construction guide that describes how information will be represented, recorded, and retrieved. A schema allows one process to record metadata and another, independent process to retrieve the same data at a later time. Provided both recording and reading processes share the same schema, information may be transferred between them.
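
A shared schema can be sketched as a small table of field names and types that both the writer and the reader consult. The schema below, its field names, and the use of JSON as the storage format are all assumptions made for this example; real metadata formats define their own fields and encodings.

```python
import json

# A hypothetical schema: field name -> required Python type.
SCHEMA = {"title": str, "artist": str, "year": int}

def write_metadata(fields):
    """Validate fields against SCHEMA and serialize them as UTF-8 JSON."""
    for name, expected_type in SCHEMA.items():
        if not isinstance(fields.get(name), expected_type):
            raise ValueError(f"field {name!r} must be of type {expected_type.__name__}")
    return json.dumps({name: fields[name] for name in SCHEMA}).encode("utf-8")

def read_metadata(blob):
    """Independent reader: relies only on the shared SCHEMA."""
    fields = json.loads(blob.decode("utf-8"))
    return {name: fields[name] for name in SCHEMA}

record = {"title": "Example Song", "artist": "Example Artist", "year": 2001}
print(read_metadata(write_metadata(record)) == record)
```

The writer and reader never call each other. They agree only on the schema, which is what allows information to pass between independent processes.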

What is Skeleton Metadata?

Skeleton metadata is a minimal set of metadata that is just sufficient to allow decoding applications to properly identify basic characteristics of associated content, such as start and end byte positions in a file.
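
A minimal sketch of how such skeleton metadata might be used follows. The SkeletonMetadata class, its field names, and the sample bytes are hypothetical and exist only for this example; real formats record this information in their own binary layouts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkeletonMetadata:
    """Hypothetical minimal description of the encoded content in a file."""
    codec: str   # which decoder the content requires
    start: int   # byte position where the encoded content begins
    end: int     # byte position just past the end of the encoded content

def extract_payload(file_bytes, skeleton):
    """Slice the encoded content out of a file using the recorded byte positions."""
    if not 0 <= skeleton.start <= skeleton.end <= len(file_bytes):
        raise ValueError("byte positions fall outside the file")
    return file_bytes[skeleton.start:skeleton.end]

file_bytes = b"HEADER" + b"audio-data" + b"TRAILER"
skeleton = SkeletonMetadata(codec="mp3", start=6, end=16)
print(extract_payload(file_bytes, skeleton))
```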