Build Your Own HPSS¶
In this section, we will demonstrate the basics of extending nussl
.
Later in the tutorial, we will see how the nussl
API can help us
build advanced neural network architectures. For now, we’ll learn
about the API and a new source separation algorithm:
%%capture
!pip install git+https://github.com/source-separation/tutorial
# Do our imports
import warnings
warnings.simplefilter('ignore')
import nussl
import matplotlib.pyplot as plt
import numpy as np
import librosa
from common import viz
Harmonic-Percussive Source Separation (HPSS)¶
If you spend enough time visualizing musical signals on a spectrogram, you start to notice that harmonic sounds look similar horizontal stripes on a spectrogram and percussive sounds look similar to vertical stripes. Let’s have a look at the audio example from the previous notebook again:
audio_path = nussl.efz_utils.download_audio_file('historyrepeating_7olLrex.wav', verbose=False)
history = nussl.AudioSignal(audio_path)
history.embed_audio()
plt.figure(figsize=(10, 3))
nussl.utils.visualize_spectrogram(history)
plt.title(str(history))
plt.tight_layout()
plt.show()
The insight of Harmonic-Percussive Source Separation (HPSS) is that we can apply median filter accross frequency bins (horizontal, or harmonic) and across time bins (vertical, or percussive) to separate sources.
Let’s build our own HPSS algorithm using nussl
.
nussl
’s Separation API¶
nussl
has a class-based API for adding new algorithms. All separation algorithms
in nussl
inherit from SeparationBase
. SeparationBase
has a child class called
MaskSeparationBase
, which takes care of applying a mask if we create one. Let’s
make our HPSS class inherit from MaskSeparationBase
.
Then there are only two methods we need to overwrite:
__init__(self, audio_signal)
which accepts anAudioSignal
and does setup.run(self)
which runs our HPSS separation onaudio_signal
that we got in the init. It returns a list of masks, each one as anussl.MaskBase()
object. We can turn anumpy
array intoMaskBase()
by callingmask = self.mask_type(mask)
andnussl
will automatically set the mask type correctly.
The only other thing we’ll need is the size of the median filter’s kernel. We can set that as a parameter that the user can set in the init.
class MyHPSS(nussl.separation.base.MaskSeparationBase):
def __init__(self, audio_signal, kernel_size=31, mask_type='soft',
mask_threshold=0.5):
"""Setup code goes here."""
# The super class will save all of these attributes for us.
super().__init__(
input_audio_signal=audio_signal,
mask_type=mask_type,
mask_threshold=mask_threshold
)
# Save the kernel size.
self.kernel_size = kernel_size
def run(self):
"""Code for running HPSS. Returns masks."""
# Keep a list of each mask type.
harmonic_masks = []
percussive_masks = []
# Our signal might have more than one channel:
# Apply HPSS to each channel individually.
for ch in range(self.audio_signal.num_channels):
# apply mask
harmonic_mask, percussive_mask = librosa.decompose.hpss(
self.stft[:, :, ch], kernel_size=self.kernel_size, mask=True)
harmonic_masks.append(harmonic_mask)
percussive_masks.append(percussive_mask)
# Order the masks correctly.
harmonic_masks = np.stack(harmonic_masks, axis=-1)
percussive_masks = np.stack(percussive_masks, axis=-1)
_masks = np.stack([harmonic_masks, percussive_masks], axis=-1)
# Convert the masks to `nussl.MaskBase` types.
self.result_masks = []
for i in range(_masks.shape[-1]):
mask_data = _masks[..., i]
if self.mask_type == self.MASKS['binary']:
mask_data = _masks[..., i] == np.max(_masks, axis=-1)
mask = self.mask_type(mask_data)
self.result_masks.append(mask)
# Return the masks>
return self.result_masks
Using Our HPSS Algorithm¶
Awesome! Now let’s run our HPSS on the audio example above:
my_hpss = MyHPSS(history)
hpss_estimates = my_hpss()
viz.show_sources(hpss_estimates)
And, as always, we can make an interactive version of this. Try recording yourself whistling and clapping at the same time and see how the results sound!
%%capture
# Comment out the line above to run this cell
# interactively in Colab or Jupyter Notebook
my_hpss.interact(share=True, source='microphone')
If you want to upload a song, you can also remove source='microphone'
in the interact()
call:
%%capture
# Comment out the line above to run this cell
# interactively in Colab or Jupyter Notebook
my_hpss.interact(share=True)
That’s it! The nussl
API is quite simple; what goes on inside of it is
the hard part!
Towards the end of this tutorial you will learn how to use the nussl
API
to train and run advanced neural networks for source separation.