Putting it all together¶
The goal of this chapter is to provide a fully annotated and functional
script for training a vocals separation model,
putting together everything that we’ve seen in this tutorial thus far. So that
this part runs in reasonable time, we’ll set up our model training code so that
it overfits to a small amount of data, and then show the output of the model
on that data. We’ll also give instructions on how to scale your experiment code up
to a full MUSDB separation experiment.
We’ll have to introduce a few concepts in
nussl that haven’t been covered yet but that
will make our lives easier. Alright, let’s get started!
```
%%capture
!pip install scaper
!pip install nussl
!pip install git+https://github.com/source-separation/tutorial
```
Getting the data¶
The first concept we’ll want to be familiar with is that of data transforms.
nussl provides a transforms API for audio, much like the one
torchvision provides for image data.
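To make the idea concrete, here is a minimal sketch of a transform pipeline in plain Python. The `Compose` class and the two transforms below are hypothetical stand-ins, not nussl’s actual API: each transform is a callable that takes an item dictionary and returns a modified one, and `Compose` chains them in order.

```python
class Compose:
    """Chain a list of transforms, applying them in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, item):
        for t in self.transforms:
            item = t(item)
        return item

def add_length(item):
    # Example transform: record the mixture length in the item dict.
    item['num_samples'] = len(item['mix'])
    return item

def normalize(item):
    # Example transform: peak-normalize the mixture.
    peak = max(abs(x) for x in item['mix']) or 1.0
    item['mix'] = [x / peak for x in item['mix']]
    return item

pipeline = Compose([add_length, normalize])
item = pipeline({'mix': [0.5, -2.0, 1.0]})
# item['mix'] is now [0.25, -1.0, 0.5] and item['num_samples'] is 3
```

nussl’s real transforms follow this same pattern: they operate on dataset items and can be composed, so a dataset can hand each item through a fixed preprocessing chain before it reaches the model.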
Remember all that data code we built up in the previous section? Let’s get it back, this
time by just importing it from the
common module that comes with this tutorial:
```
%%capture
from common import data, viz
import nussl

# Prepare MUSDB
data.prepare_musdb('~/.nussl/tutorial/')
```
The next bit of code initializes a Scaper object with all the bells and whistles that were introduced in the last section, then wraps it in a nussl OnTheFly dataset. First, we should set our STFTParams to what we’ll be using throughout this notebook:
```
stft_params = nussl.STFTParams(
    window_length=512, hop_length=128, window_type='sqrt_hann')
fg_path = "~/.nussl/tutorial/train"
train_data = data.on_the_fly(
    stft_params, transform=None, fg_path=fg_path,
    num_mixtures=1000, coherent_prob=1.0)
```
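These STFT parameters determine the shape of the spectrograms our model will see. As a quick sanity check, here is a back-of-the-envelope sketch of that shape; the exact frame count depends on the STFT implementation’s padding conventions, so treat this as an approximation rather than nussl’s precise behavior.

```python
def num_stft_frames(num_samples, hop_length=128):
    # Rough frame count with centered padding: roughly one frame per
    # hop across the signal, plus one for the final partial window.
    return num_samples // hop_length + 1

# With window_length=512, a real-valued STFT keeps only the
# non-negative frequencies: window_length // 2 + 1 bins.
freq_bins = 512 // 2 + 1   # 257 frequency bins

# One second of audio at 44.1 kHz:
frames = num_stft_frames(44100)   # about 345 frames
```

So each second of audio becomes a roughly 257 x 345 time-frequency matrix, which is the input our separation model will operate on.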
Let’s take a look at a single item from the dataset:
```
item = train_data[0]
viz.show_sources(item['sources'])
```
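Each item that comes out of the dataset is a dictionary holding the mixture alongside a sub-dictionary of the isolated sources. The toy stand-in below sketches that shape with plain lists; in the real item the values are nussl AudioSignal objects, and the exact keys are an assumption here, not a guarantee of nussl’s item schema.

```python
# Toy stand-in for a dataset item: plain lists instead of AudioSignal
# objects, just to show the nesting of the dictionary.
item = {
    'mix': [0.0, 0.1, -0.1],              # the mixture "signal"
    'sources': {
        'vocals':     [0.0, 0.2, -0.2],   # target source
        'background': [0.0, -0.1, 0.1],   # everything else
    },
}

# Iterating over item['sources'] gives one entry per isolated source,
# which is what a visualization helper would loop over.
source_names = sorted(item['sources'])
```

Because every item carries both the mixture and its ground-truth sources, the same dictionary can feed both training (mixture in, sources as targets) and visualization.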