Putting it all together¶
The goal of this chapter is to provide a fully annotated and functional
script for training a vocals separation model,
putting together everything that we’ve seen in this tutorial thus far. So that
this part runs in reasonable time, we’ll set up our model training code so that
it overfits to a small amount of data, and then show the output of the model
on that data. We’ll also give instructions on how to scale your experiment code up
to a full MUSDB separation experiment.
We’ll have to introduce a few concepts in
nussl that haven’t been covered yet but that
will make our lives easier. Alright, let’s get started!
```
%%capture
!pip install scaper
!pip install nussl
!pip install git+https://github.com/source-separation/tutorial
```
Getting the data¶
The first concept we’ll want to be familiar with is that of data transforms.
nussl provides a transforms API for audio, much like the one
torchvision provides for image data.
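To make the idea concrete, here is a minimal sketch of a transform pipeline in plain Python. The `Compose` class and the two transforms below are hypothetical stand-ins, not nussl’s actual API: each transform is a callable that takes an item dictionary and returns a modified one, and `Compose` chains them in order.

```python
class Compose:
    """Chain a list of transforms, applying them in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, item):
        for t in self.transforms:
            item = t(item)
        return item

def add_length(item):
    # Example transform: record the mixture length in the item dict.
    item['num_samples'] = len(item['mix'])
    return item

def normalize(item):
    # Example transform: peak-normalize the mixture.
    peak = max(abs(x) for x in item['mix']) or 1.0
    item['mix'] = [x / peak for x in item['mix']]
    return item

pipeline = Compose([add_length, normalize])
item = pipeline({'mix': [0.5, -2.0, 1.0]})
# item['mix'] is now [0.25, -1.0, 0.5] and item['num_samples'] is 3
```

nussl’s real transforms follow this same pattern: they operate on dataset items and can be composed, so a dataset can hand each item through a fixed preprocessing chain before it reaches the model.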
Remember all that data code we built up in the previous section? Let’s get it back, this
time by just importing it from the
common module that comes with this tutorial:
```
%%capture
from common import data, viz
import nussl

# Prepare MUSDB
data.prepare_musdb('~/.nussl/tutorial/')
```
The next bit of code initializes a Scaper object with all the bells and whistles that were introduced in the last section, then wraps it in a nussl OnTheFly dataset. First, we should set our STFTParams to what we’ll be using throughout this notebook:
```
stft_params = nussl.STFTParams(
    window_length=512, hop_length=128, window_type='sqrt_hann')
fg_path = "~/.nussl/tutorial/train"
train_data = data.on_the_fly(
    stft_params, transform=None, fg_path=fg_path,
    num_mixtures=1000, coherent_prob=1.0)
```
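These STFT parameters determine the shape of the spectrograms our model will see. As a quick sanity check, here is a back-of-the-envelope sketch of that shape; the exact frame count depends on the STFT implementation’s padding conventions, so treat this as an approximation rather than nussl’s precise behavior.

```python
def num_stft_frames(num_samples, hop_length=128):
    # Rough frame count with centered padding: roughly one frame per
    # hop across the signal, plus one for the final partial window.
    return num_samples // hop_length + 1

# With window_length=512, a real-valued STFT keeps only the
# non-negative frequencies: window_length // 2 + 1 bins.
freq_bins = 512 // 2 + 1   # 257 frequency bins

# One second of audio at 44.1 kHz:
frames = num_stft_frames(44100)   # about 345 frames
```

So each second of audio becomes a roughly 257 x 345 time-frequency matrix, which is the input our separation model will operate on.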
Let’s take a look at a single item from the dataset:
```
item = train_data[0]
viz.show_sources(item['sources'])
```
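Each item that comes out of the dataset is a dictionary holding the mixture alongside a sub-dictionary of the isolated sources. The toy stand-in below sketches that shape with plain lists; in the real item the values are nussl AudioSignal objects, and the exact keys are an assumption here, not a guarantee of nussl’s item schema.

```python
# Toy stand-in for a dataset item: plain lists instead of AudioSignal
# objects, just to show the nesting of the dictionary.
item = {
    'mix': [0.0, 0.1, -0.1],              # the mixture "signal"
    'sources': {
        'vocals':     [0.0, 0.2, -0.2],   # target source
        'background': [0.0, -0.1, 0.1],   # everything else
    },
}

# Iterating over item['sources'] gives one entry per isolated source,
# which is what a visualization helper would loop over.
source_names = sorted(item['sources'])
```

Because every item carries both the mixture and its ground-truth sources, the same dictionary can feed both training (mixture in, sources as targets) and visualization.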