The MUSDB18 dataset¶
Overview¶
The information on this page is based on the MUSB18 dataset page. Here we have edited down the content to focus on the details relevant to this tutorial while keeping it concise. For more details about the datataset pleasecan consult the dataset page.
MUSDB18 is a dataset of 150 full length music tracks (~10h total duration) of varying genres. For each track it provides a mixture along with the isolated stems for the drums, bass, vocals, and others. As its name suggests, the “others” stem contains all other sources in the mix that are not the drums, bass or vocals (labeled as “accompaniment” in the diagram below):
Image source: https://sigsep.github.io/
All audio signals in the dataset are stereo and encoded at a sampling rate of 44.1 kHz. The mixture signal is identical to the sum of the stems.
The data in MUSDB18 is compiled from multiple sources: the DSD100 dataset, the MedleyDB dataset, the Native Instruments stems pack, and the The Easton Ellises - heise stems remix competition.
Note
MUSDB18 can be used academic purposes only, with multiple of its tracks licensed under a Creative Commons Non-Commercial Share Alike license (BY-NC-SA).
The full dataset is divided into train and test folders with 100 and 50 songs respectively. As their names suggest, the former should be used for model training and the latter for model evaluation.
Note
You do not need to download the full MUSDB18 dataset to complete this tutorial. For simplicity, we’ll be using short excerpts (clips) from this dataset which
we will download via the nussl
python library in the next step.
Downloading and inspecting MUSDB18 clips¶
We’ll use nussl
, the source separation library used in this tutorial, to download MUSDB18 clips. Recall nussl
was briefly introduced HERE, and we’ll dive into it
in greater detail HERE. For now, we’ll just use it to download the data:
Acknowledgement¶
MUSDB18 was created by: Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. When using the dataset in your work, please be sure to cite it as:
@misc{musdb18,
author = {Rafii, Zafar and
Liutkus, Antoine and
Fabian-Robert St{\"o}ter and
Mimilakis, Stylianos Ioannis and
Bittner, Rachel},
title = {The {MUSDB18} corpus for music separation},
month = dec,
year = 2017,
doi = {10.5281/zenodo.1117372},
url = {https://doi.org/10.5281/zenodo.1117372}
}