Putting it all together

The goal of this chapter is to provide a fully annotated and functional script for training a vocals separation model using nussl and Scaper, putting together everything that we’ve seen in this tutorial thus far. So that this part runs in reasonable time, we’ll set up our model training code so that it overfits to a small amount of data, and then show the output of the model on that data. We’ll also give instructions on how to scale your experiment code up so that it’s a full MUSDB separation experiment.

We’ll have to introduce a few nussl concepts that haven’t been covered yet but will make our lives easier. Alright, let’s get started!

!pip install scaper
!pip install nussl
!pip install git+https://github.com/source-separation/tutorial

Getting the data

The first concept we’ll want to be familiar with is that of data transforms. nussl provides a transforms API for audio, much like the one torchvision provides for image data. Remember all that data code we built up in the previous section? Let’s get it back, this time by just importing it from the common module that comes with this tutorial:
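To make the transforms idea concrete, here is a minimal pure-Python sketch in the spirit of torchvision's `Compose` pattern. The class names and the `SumSources` behavior below are illustrative stand-ins, not nussl's exact API; nussl's real transforms operate on dataset items in a similar composable fashion.

```python
class Compose:
    """Chain transforms together, applying each one to the item in turn."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, item):
        for t in self.transforms:
            item = t(item)
        return item

class SumSources:
    """Sum groups of sources into one (e.g. bass + drums + other -> accompaniment)."""
    def __init__(self, groups):
        self.groups = groups

    def __call__(self, item):
        for group in self.groups:
            summed = sum(item['sources'].pop(name) for name in group)
            item['sources']['+'.join(group)] = summed
        return item

# Toy item: scalar "signals" stand in for audio arrays.
pipeline = Compose([SumSources([['bass', 'drums', 'other']])])
item = {'sources': {'vocals': 1.0, 'bass': 0.2, 'drums': 0.3, 'other': 0.5}}
out = pipeline(item)
```

After the pipeline runs, `out['sources']` contains only `'vocals'` and the summed `'bass+drums+other'` group, which is exactly the shape of data a vocals/accompaniment separation model wants to train on.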

from common import data, viz
import nussl
# Prepare MUSDB
data.prepare_musdb('~/.nussl/tutorial/')

The next bit of code initializes a Scaper object with all the bells and whistles that were introduced in the last section, then wraps it in a nussl OnTheFly dataset. First, we should set our STFTParams to what we’ll be using throughout this notebook:

stft_params = nussl.STFTParams(window_length=512, hop_length=128, window_type='sqrt_hann')
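It's worth pausing on what these numbers imply. A quick back-of-the-envelope check (the frame count below assumes a center-padded STFT, so treat it as approximate; exact counts depend on the padding convention):

```python
window_length = 512
hop_length = 128

# Consecutive windows overlap by 75%.
overlap = 1 - hop_length / window_length          # 0.75

# A real-valued signal yields window_length // 2 + 1 frequency bins.
num_freq_bins = window_length // 2 + 1            # 257

# Approximate frame count for 1 second of audio at 44.1 kHz.
num_frames = 44100 // hop_length + 1              # 345
```

So each training example will be a spectrogram with 257 frequency bins and a few hundred frames per second of audio, which is a comfortable size for overfitting experiments on a laptop.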
fg_path = "~/.nussl/tutorial/train"
train_data = data.on_the_fly(stft_params, transform=None, fg_path=fg_path, num_mixtures=1000, coherent_prob=1.0)
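The key property of an OnTheFly dataset is that each item is synthesized fresh when it is requested, rather than read from disk. A minimal sketch of that idea in pure Python (the class and the `make_mixture` function are illustrative, not nussl's implementation):

```python
import random

class OnTheFlySketch:
    """Generate each item fresh from a user-supplied function: the idea
    behind an on-the-fly dataset. Illustrative only."""
    def __init__(self, make_item, num_items, seed=0):
        self.make_item = make_item
        self.num_items = num_items
        self.seed = seed

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        if not 0 <= idx < self.num_items:
            raise IndexError(idx)
        # Re-seed per index so the "random" mixture for index i is reproducible.
        rng = random.Random(self.seed + idx)
        return self.make_item(rng, idx)

def make_mixture(rng, idx):
    # Stand-in for Scaper synthesizing a mixture with randomized
    # event parameters (gains, pitch shifts, time stretches, ...).
    return {'index': idx, 'vocals_gain_db': rng.uniform(-5, 5)}

dataset = OnTheFlySketch(make_mixture, num_items=1000)
```

Because the random state is derived from the index, `dataset[3]` returns the same mixture every time it is asked for, while different indices give different mixtures; this is what makes `num_mixtures=1000` meaningful even though nothing is precomputed.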

Let’s take a look at a single item from the dataset:

item = train_data[0]
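For orientation, nussl dataset items are plain dictionaries. The mock below sketches the structure you can expect; the key names (`'mix'`, `'sources'`, `'metadata'`) follow nussl's dataset convention, but the values here are placeholder strings rather than real AudioSignal objects:

```python
item_sketch = {
    'mix': 'AudioSignal of the full mixture',
    'sources': {
        'vocals': 'AudioSignal',
        'bass+drums+other': 'AudioSignal',
    },
    'metadata': {'jam': 'Scaper annotation describing how the mix was built'},
}

def summarize(item):
    # List the top-level keys and the names of the isolated sources.
    return sorted(item), sorted(item['sources'])

top_keys, source_names = summarize(item_sketch)
```

Printing `item.keys()` and `item['sources'].keys()` on the real `item` is a good sanity check before wiring the dataset into a training loop.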