Kodi Music Scanning and Scraping Process (v18)

Kodi is a popular open-source multimedia player that runs on a variety of operating systems including Windows, Mac (OSX), Android, iOS, Linux, and even Raspberry Pi.

This article series is a technical discussion.

This article begins a series examining how the Kodi multimedia player handles music files, specifically its process flow logic. There's plenty of information elsewhere, such as Kodi's Wiki, describing how to use Kodi to manage your music library and serve content. What is missing is a deep dive into the process flow of Kodi's music library management system. Do you need or want to know the order in which Kodi makes various decisions about music files? If so, you may find this series helpful.

This article series uses diagrams of the relevant parts of Kodi's software architecture to explain Kodi's music process flow logic. The articles pertain to the Linux version of Kodi exclusively. During the research phase of writing these articles, I reverse-engineered the corresponding Linux C++ modules (include files), tracing each additional include file as I moved along. At some points the code's behavior is explained in great detail, and at others the processes are described at a higher level. I have tried to keep the technical details light to focus on the process flow and order of operations. The focus is Kodi's behavior while scanning and scraping music files (tracks/songs).

This article series applies to Kodi v18 (Leia); however, most of the described functions apply to v17 (Krypton) as well.

Article Contents

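This article introduces the series. A high-level process overview diagram is presented at the bottom of this article. The article contains the following sections: Kodi's Music Management is Confusing; Scanning vs. Scraping; MusicBrainz; Love It or Leave It; Why?; The 10,000 Foot View; The 5,000 Foot View.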

Kodi's Music Management is Confusing

One of the most challenging learning curves with Kodi is understanding how it sorts and categorizes music files. From a user-experience viewpoint, what seems logical to one person may not be to another. This is underscored by the myriad of music genres, file types, and metadata fields. Any way you look at it, categorizing music is much more complicated than categorizing any other digital media type. Therefore, the complexity and steep learning curve are not a design issue with Kodi; rather, they are the nature of the beast, so to speak. The alternative would be a draconian implementation forcing music content into predetermined silos. If you want that, you're probably not interested in Kodi to begin with (or open source solutions for that matter).

Over time, the Kodi project has become more flexible in organizing the media it manages as it has struggled to meet the demands and needs of its users. Unfortunately, in spite of these efforts, its processing of music files, such as cataloging and sorting, remains an enigma to most users. Kodi's music organization algorithms are significantly more complex than its handling of videos, photos, and other media types.

Scanning vs. Scraping

Scanning music files (or any media files, for that matter) is a necessary part of Kodi's media management. The scanning phase is how Kodi learns which files are media files, where they are located, and what they appear to be (music, photo, movie, TV show, etc.). You must first tell Kodi where it should expect to find particular file types (such as music). The scraping process is functionally independent and runs only when requested (per user settings), yet its functions are mingled with the scanning process. Users (and even developers) are often confused about which process is in control at any given moment. The process diagrams in this article series seek to clarify this issue.

Kodi's music scanning and scraping processes are intertwined. When new files are detected, or when Kodi is manually told there are new music files to scan into its library, it begins with a file scanning process. During that phase, metadata embedded in the files (if any) is collected. The scraping process (if enabled) follows up and attempts to fill in any missing metadata. Kodi may also overwrite metadata gleaned from the file with metadata derived from online sources, such as MusicBrainz. Whether these actions happen depends on Kodi's music settings, as established by the end user.
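To make the order of operations above a little more concrete, here is a minimal Python sketch of the idea: scan first, then optionally scrape, with a user setting deciding whether online metadata may replace embedded metadata. It is an illustration only and not Kodi's code. Every name in it (Track, scan, scrape, process, scraping_enabled, prefer_online) is hypothetical and does not correspond to Kodi's actual classes or settings.

```python
from dataclasses import dataclass, field


# All names below are hypothetical illustrations, not Kodi's API or settings.
@dataclass
class Track:
    path: str
    tags: dict = field(default_factory=dict)  # metadata known for this track


def scan(path, embedded_tags):
    """Scanning phase: keep only the non-empty metadata embedded in the file."""
    return Track(path, {k: v for k, v in embedded_tags.items() if v})


def scrape(track, online_tags, prefer_online=False):
    """Scraping phase: fill in missing fields; overwrite existing ones only
    when the (assumed) prefer_online setting is switched on."""
    merged = dict(track.tags)
    for key, value in online_tags.items():
        if not value:
            continue  # nothing useful came back for this field
        if key not in merged or prefer_online:
            merged[key] = value
    return Track(track.path, merged)


def process(path, embedded_tags, online_tags, scraping_enabled, prefer_online=False):
    """Scan first, then scrape only if the user has enabled it."""
    track = scan(path, embedded_tags)
    if scraping_enabled:
        track = scrape(track, online_tags, prefer_online)
    return track
```

With scraping enabled and prefer_online left off, a file whose embedded tags have a title but no genre keeps its own title and gains the genre from the online source. With prefer_online on, the online title wins as well. With scraping disabled, only the embedded tags survive.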

MusicBrainz

Kodi's music management system relies heavily on tagging. Not just standard tags such as the metadata included in MP3 files for example, but specifically custom tags created by an open-source music tagging system called MusicBrainz Picard (or simply MusicBrainz). More than tagging software, MusicBrainz combines a database of known artist, album, and track (song) information along with text-based names and even digital fingerprinting of songs. It combines these features into a single engine that is supposed to make sorting your music collection easy.

MusicBrainz creates its own custom metadata tags. It embeds both these and the standard tags in your music files.

When applied, MusicBrainz data is incorporated during the online scraping phase. If your music library has been pre-screened and tagged by MusicBrainz, you will find Kodi's integration to be almost effortless. In my case, virtually all of my music files have been carefully tagged by hand. This creates the opportunity for a disconnect between how I think about music tagging and how Kodi thinks about music tagging.
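As a rough illustration of what "tagged by MusicBrainz" means in practice, the Python sketch below separates MusicBrainz identifier tags from ordinary descriptive tags in a plain dictionary. The tag names are the ones Picard commonly writes to Vorbis-comment files (FLAC/Ogg); other formats store the same identifiers under different names, so treat the list as an assumption for this example. The helper functions are hypothetical and are not part of Kodi or Picard.

```python
# Identifier tags as Picard commonly writes them to Vorbis comments (FLAC/Ogg).
# Assumption: other container formats (e.g. ID3) use different field names.
MUSICBRAINZ_ID_TAGS = (
    "MUSICBRAINZ_TRACKID",
    "MUSICBRAINZ_ALBUMID",
    "MUSICBRAINZ_ARTISTID",
    "MUSICBRAINZ_ALBUMARTISTID",
)


def split_tags(tags):
    """Separate MusicBrainz identifier tags from ordinary descriptive tags.

    Tag names are compared case-insensitively and returned upper-cased.
    """
    mbids, standard = {}, {}
    for name, value in tags.items():
        key = name.upper()
        target = mbids if key in MUSICBRAINZ_ID_TAGS else standard
        target[key] = value
    return mbids, standard


def is_musicbrainz_tagged(tags):
    """True when at least one MusicBrainz identifier tag is present."""
    mbids, _ = split_tags(tags)
    return bool(mbids)


# Usage with made-up tag data; the all-zero UUID is a placeholder, not a real ID.
hand_tagged = {"title": "Song", "artist": "Someone"}
picard_tagged = dict(hand_tagged,
                     musicbrainz_albumid="00000000-0000-0000-0000-000000000000")
```

A hand-tagged file like the first dictionary carries no identifiers at all, which is the kind of library where a tool that expects MusicBrainz IDs and a person who tags manually can end up thinking about the same file differently.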

Love It or Leave It

MusicBrainz is optional. I seem to be the odd man out. I don't dislike MusicBrainz, but I'm less than enamored with it. I applaud the concept, but the reality is that there is no foolproof solution for automatic music tagging. While it is a noble effort, it produced too many false positives for my taste (based on my experimentation with it). I am certain this is due to my tastes in music. MusicBrainz has several methods of identifying a particular track, but ultimately it relies upon a combination of crowdsourcing, audio fingerprinting, and a database of known, crowd-approved metadata, metrics, and audio fingerprints. I've written about the weaknesses of artificial intelligence systems such as MusicBrainz's audio fingerprinting in my blog, so I shall skip that detour here. The bottom line is that I know there are many MusicBrainz fans out there, but I'm not one of them. I do appreciate the effort, but my particular music collection proved frustrating when I applied MusicBrainz to it. The process yielded hundreds of misidentified tracks that I had to fix manually.

MusicBrainz has a learning curve too, and no doubt I made mistakes in my efforts. No matter. It underscores the impetus behind the research that led to this series of articles: I struggled so much to automate what a human can do rather easily. That struggle indicates that before using tools such as Kodi's music file cataloging system or MusicBrainz, one should be certain they are the right tools for the job.

Why?

Everyone asks me this question. I began this project while attempting to understand how Kodi v17 (Krypton) cataloged music files. Its management and cataloging of music files didn't seem to work as well as it should have. Some files were duplicated; others were never read or added to the library. I couldn't figure out why it behaved the way it did under certain circumstances. It didn't handle some files as I expected. I also found the distribution of Kodi's option switches confusing (where the various switches were found and what they did).

I had finally molded Krypton into submission when, in the midst of my fiddling with Kodi, Leia (v18) was released. I decided to archive my work and start anew with v18 to determine whether it was easier to work with. When I experienced similar challenges, I chose to dive in and figure out what was causing the disconnect between my mind's vision of how it should work and how Kodi actually worked.

The 10,000 Foot View

At a very high level, this is how the process looks when Kodi scans for new music files. The process may be kicked off manually, or depending on your settings it may occur at other times automatically, such as when the Kodi process starts up.

10,000 foot view of Kodi v18 Music Scanning and Scraping Process

You can see above there are circles with numbers spread about the diagram. These more-or-less represent the order of execution. Many steps are not sequential, as you can see from following some of the connecting lines, but at a high level the numbers provide a relative order. The subsequent articles in this series are arranged via this numeric order.

Kodi's music scanning module, MusicInfoScanner.cpp, is found at https://github.com/xbmc/xbmc/blob/Leia/xbmc/music/infoscanner/MusicInfoScanner.cpp

The 5,000 Foot View

This diagram is still very high level, but it shows where and when certain activities that you may be familiar with occur. For example, when are .NFO files processed relative to checking for MusicBrainz tags? The picture below provides a high-level perspective of when those steps occur relative to one another.

5,000 foot view of Kodi v18 Music Scanning and Scraping Process

The next article is Phase 1 (Scanning & Sorting by Album)