Inference#

In this notebook we demonstrate how to use two pretrained models to generate mixes of drum multitracks. We provide models trained on the ENST-drums dataset, which features a few hundred drum multitracks along with mixes of these multitracks made by professional audio engineers. The models cover two multitrack mixing architectures: the Differentiable Mixing Console (DMC) and the MixWaveUNet. First we will download the model checkpoints and some test audio, then load the models and the audio tracks, and finally generate a mix that we can listen to.

Note: This notebook assumes that you have already installed the automix package. If you have not done so, you can run the following:

!pip install git+https://github.com/csteinmetz1/automix-toolkit
Successfully installed auraloss-0.2.2 automix-toolkit-0.0.1 fire-0.4.0 lightning-utilities-0.3.0 pedalboard-0.6.6 pyloudnorm-0.1.0 pytorch-lightning-1.8.3.post1 sklearn-0.0.post1 tensorboardX-2.5.1 torchmetrics-0.11.0 wget-3.2
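Before reinstalling, it can be handy to check whether the package is already importable. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module name `automix` is taken from the import used later in this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None


# `automix` is the module imported below via `from automix.system import System`
if not is_installed("automix"):
    print("automix not found; run the pip install command above")
```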
import os
import glob
import torch
import torchaudio
import numpy as np

import IPython
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa.display

%matplotlib inline
%load_ext autoreload
%autoreload 2

from automix.system import System

Download the pretrained models and multitracks#

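First we will download the pretrained models: the DMC and MixWaveUNet checkpoints trained on ENST-drums (the cell below also fetches a third checkpoint, medleydb-16-dmc.ckpt). Then we will download two .zip files containing a drum multitrack and a demo multitrack, both of which were unseen during training.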

# download the pretrained models for DMC and MixWaveUNet trained on ENST-drums dataset
os.makedirs("checkpoints", exist_ok=True)
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-dmc.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-mixwaveunet.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/medleydb-16-dmc.ckpt
!mv enst-drums-dmc.ckpt checkpoints/enst-drums-dmc.ckpt
!mv enst-drums-mixwaveunet.ckpt checkpoints/enst-drums-mixwaveunet.ckpt
!mv medleydb-16-dmc.ckpt checkpoints/medleydb-16-dmc.ckpt

# then download and extract a drum multitrack from the test set
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/drums-test-rock.zip
!unzip -o drums-test-rock.zip

!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/flare-dry-stems.zip
!unzip -o flare-dry-stems.zip -d flare-dry-stems
Archive:  drums-test-rock.zip
   creating: drums-test-rock/
  inflating: __MACOSX/._drums-test-rock  
  inflating: drums-test-rock/.DS_Store  
  inflating: __MACOSX/drums-test-rock/._.DS_Store  
   creating: drums-test-rock/tracks/
   creating: drums-test-rock/mix/
  inflating: drums-test-rock/tracks/04_overhead_L_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/01_kick_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/03_hi-hat_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/02_snare_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/07_tom_2_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/06_tom_1_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/05_overhead_R_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/08_tom_3_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_DMC.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_MixWaveUNet.wav  
Archive:  flare-dry-stems.zip
Written using ZipTricks 5.6.0
 extracting: flare-dry-stems/Flare Bass Stem Dry.wav  
 extracting: flare-dry-stems/Flare Drum Stem Dry.wav  
 extracting: flare-dry-stems/Flare Instrument Stem Dry.wav  
 extracting: flare-dry-stems/Flare Vocal Stem Dry.wav  
!ls
checkpoints	 drums-test-rock.zip  flare-dry-stems.zip  sample_data
drums-test-rock  flare-dry-stems      __MACOSX
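As a quick sanity check after the downloads, one could confirm that the expected files and folders exist before moving on. This is only a sketch: the helper `missing_paths` is hypothetical, and the list of expected paths is assembled from the commands and directory listing above.

```python
import os


def missing_paths(expected, root="."):
    """Return the entries of `expected` that do not exist under `root`."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


# paths taken from the download and unzip commands above
expected = [
    "checkpoints/enst-drums-dmc.ckpt",
    "checkpoints/enst-drums-mixwaveunet.ckpt",
    "checkpoints/medleydb-16-dmc.ckpt",
    "drums-test-rock/tracks",
    "flare-dry-stems",
]
missing = missing_paths(expected)
print("missing:", missing if missing else "nothing")
```

If anything is reported as missing, re-running the corresponding download command is usually the simplest fix.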

Set configuration#

We have the option to select one of two different checkpoints.

If we select enst-drums-dmc.ckpt we can use the pretrained Differentiable Mixing Console model, which directly predicts gain and panning parameters for each track. Alternatively, we can select enst-drums-mixwaveunet.ckpt, which uses a multi-input WaveUNet to create a mix of the tracks. To make computation faster, we can restrict the maximum number of samples to process with max_samples. Using the default max_samples = 262144 will mix about the first 6 seconds of the track (assuming a 44.1 kHz sample rate). You can try increasing this value to see how the results change.
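To make the idea of mixing with per-track gain and panning parameters concrete, here is a plain-Python sketch of how such parameters could turn mono tracks into a stereo mix. It is an illustration only, not the toolkit's implementation: the function names are hypothetical, and the constant-power pan law used here is an assumption that may differ from what the DMC model actually applies.

```python
import math


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_gain_pan(tracks, gains_db, pans):
    """Mix mono tracks into a stereo pair.

    tracks:   list of equal-length lists of samples
    gains_db: per-track gain in dB
    pans:     per-track pan in [0, 1], 0 = hard left, 1 = hard right
    """
    num_samples = len(tracks[0])
    left = [0.0] * num_samples
    right = [0.0] * num_samples
    for x, g_db, pan in zip(tracks, gains_db, pans):
        g = db_to_linear(g_db)
        theta = pan * math.pi / 2.0  # constant-power pan law (assumed)
        g_left, g_right = g * math.cos(theta), g * math.sin(theta)
        for n, sample in enumerate(x):
            left[n] += g_left * sample
            right[n] += g_right * sample
    return left, right


# two toy tracks: one panned to the center, one hard left at -6 dB
kick = [1.0, 0.0, -1.0, 0.0]
hat = [0.5, 0.5, 0.5, 0.5]
left, right = mix_gain_pan([kick, hat], gains_db=[0.0, -6.0], pans=[0.5, 0.0])
```

Because only a handful of numbers per track are involved, a mix produced this way stays easy to inspect and adjust by hand.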

Note: In the case of MixWaveUNet, a power of 2 value for max_samples is required.

track_dir = "./drums-test-rock/tracks"
track_ext = "wav"

dmc_ckpt_path = "checkpoints/enst-drums-dmc.ckpt"
mwun_ckpt_path = "checkpoints/enst-drums-mixwaveunet.ckpt"

max_samples = 262144
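The power-of-two requirement and the "about 6 seconds" figure mentioned above can be checked with a few lines of arithmetic. This is a small sketch: the helper names are hypothetical, and the 44.1 kHz sample rate is an assumption (the actual rate is read from the audio files later).

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive integer power of two (bit trick)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def duration_seconds(num_samples: int, sample_rate: int = 44100) -> float:
    """Length of num_samples of audio in seconds (sample rate assumed)."""
    return num_samples / sample_rate


max_samples = 262144  # 2 ** 18
assert is_power_of_two(max_samples)
print(f"{duration_seconds(max_samples):.2f} s")  # prints "5.94 s"

# rounding an arbitrary length up to a valid value
print(next_power_of_two(300000))  # prints 524288
```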

Load pretrained model#

# load pretrained model
dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()
mwun_system = System.load_from_checkpoint(mwun_ckpt_path, map_location="cpu").eval()
/usr/local/lib/python3.8/dist-packages/torchaudio/functional/functional.py:539: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (257) may be set too low.
  warnings.warn(

Load multitrack#

Now we will read the tracks from disk and create a tensor with all the tracks. In this case, we first peak normalize each track to -12 dB, which is what the models expect. In the case of MixWaveUNet, we will add an extra track of silence if fewer than 8 are provided. However, the DMC model can accept any number of tracks, whether more or fewer than it was trained with.

We can also create a simple mono mixture of these tracks to hear what the multitrack sounds like before we do any mixing.

# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)
    x = x[:, : max_samples]
    x /= x.abs().max().clamp(1e-8) # peak normalize
    x *= 10 ** (-12/20.0) # set peak to -12 dB
    tracks.append(x)

    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x.view(-1).numpy(), rate=sr, normalize=True))    

# add dummy tracks of silence if needed
if len(tracks) < 8:
    tracks.append(torch.zeros(x.shape))

# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)

# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True)
print(input_mix.shape)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
[Figures: waveform plot for each input track]
torch.Size([1, 1, 262144])
[Figure: waveform of the mono mix]
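
The two preprocessing steps in the loading cell (peak normalization to -12 dB and padding with silence) can be illustrated on plain Python lists. This is a minimal sketch, not the toolkit's code: the function names are hypothetical, and the padding helper loops until the requested track count is reached, which is an assumption about how one might handle inputs with fewer than 7 tracks.

```python
import math


def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20.0)


def linear_to_db(amplitude: float) -> float:
    """Convert a positive linear amplitude to a level in dB."""
    return 20.0 * math.log10(amplitude)


def peak_normalize(samples, target_db=-12.0, eps=1e-8):
    """Scale samples so that the largest absolute value sits at target_db."""
    peak = max(max(abs(s) for s in samples), eps)  # eps avoids division by zero
    scale = db_to_linear(target_db) / peak
    return [s * scale for s in samples]


def pad_with_silence(tracks, num_tracks=8):
    """Append all-zero tracks until there are num_tracks tracks (hypothetical helper)."""
    length = len(tracks[0])
    padded = list(tracks)
    while len(padded) < num_tracks:
        padded.append([0.0] * length)
    return padded


normalized = peak_normalize([0.5, -1.0, 0.25])
peak = max(abs(s) for s in normalized)
print(round(linear_to_db(peak), 2))  # -12.0

padded = pad_with_silence([normalized] * 7)
print(len(padded))  # 8
```

A peak of -12 dB corresponds to a linear amplitude of roughly 0.25, which is why the normalized waveforms sit well inside the plotted range of -1 to 1.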

Generate the DMC mix#

Now we can listen to the predicted mix. Because the Differentiable Mixing Console predicts explicit mixing parameters, we can also print the gain (in dB) and pan value for each track.

# pass tracks to the model and create a mix
with torch.no_grad(): # no need to compute gradients
    mix, params = dmc_system(tracks[:,:-1,:])
print(mix.shape, params.shape)

# view the mix
mix /= mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Differentiable Mixing Console")
librosa.display.waveshow(mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mix.view(2,-1).numpy(), rate=sr, normalize=True))

for track_fp, param in zip(track_filepaths, params.squeeze()):
    print(os.path.basename(track_fp), param)
torch.Size([1, 2, 262144]) torch.Size([1, 7, 2])
/usr/local/lib/python3.8/dist-packages/librosa/util/utils.py:198: UserWarning: librosa.util.frame called with axis=-1 on a non-contiguous input. This will result in a copy.
  warnings.warn(
[Figure: waveform of the Differentiable Mixing Console mix]
01_kick_066_phrase_rock_complex_fast_sticks.wav tensor([12.3843,  0.5003])
02_snare_066_phrase_rock_complex_fast_sticks.wav tensor([13.0229,  0.5067])
03_hi-hat_066_phrase_rock_complex_fast_sticks.wav tensor([5.0208, 0.5011])
04_overhead_L_066_phrase_rock_complex_fast_sticks.wav tensor([6.4820e+00, 1.4221e-03])
05_overhead_R_066_phrase_rock_complex_fast_sticks.wav tensor([7.4902, 0.9986])
06_tom_1_066_phrase_rock_complex_fast_sticks.wav tensor([-4.6055,  0.7456])
07_tom_2_066_phrase_rock_complex_fast_sticks.wav tensor([1.5387, 0.3615])
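
To get a feel for what these numbers mean, note that the left overhead has a pan value near 0 and the right overhead a value near 1, which suggests that 0 is hard left, 0.5 is center, and 1 is hard right. The sketch below converts a (gain, pan) pair into left and right channel gains. It assumes a constant-power sine/cosine pan law, which may not match the toolkit's mixer exactly; the function name is hypothetical and the parameter values are copied from the printed output above.

```python
import math


def stereo_gains(gain_db: float, pan: float):
    """Return (left, right) linear gains for a gain in dB and a pan in [0, 1].

    Assumes a constant-power pan law: pan=0 is hard left, pan=1 is hard right.
    """
    gain_lin = 10 ** (gain_db / 20.0)
    theta = pan * math.pi / 2
    return gain_lin * math.cos(theta), gain_lin * math.sin(theta)


# (gain in dB, pan) pairs copied from the DMC output above
predicted = {
    "kick": (12.3843, 0.5003),
    "overhead_L": (6.4820, 0.0014221),
    "overhead_R": (7.4902, 0.9986),
}

for name, (gain_db, pan) in predicted.items():
    left, right = stereo_gains(gain_db, pan)
    print(f"{name}: left={left:.3f} right={right:.3f}")
```

Under this pan law the kick lands almost equally in both channels, while each overhead is sent almost entirely to its own side.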

Generate the Mix-Wave-U-Net Mix#

If we use the MixWaveUNet, there are no parameters to show, since this model uses a direct transformation method that does not produce intermediate mixing parameters.

with torch.no_grad(): # no need to compute gradients
    mwun_mix, params = mwun_system(tracks)
print(mwun_mix.shape, params.shape)

# view the mix
mwun_mix /= mwun_mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Mix-Wave-U-Net")
librosa.display.waveshow(mwun_mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mwun_mix.view(2,-1).numpy(), rate=sr, normalize=True))
torch.Size([1, 2, 262144]) torch.Size([1])
/usr/local/lib/python3.8/dist-packages/librosa/util/utils.py:198: UserWarning: librosa.util.frame called with axis=-1 on a non-contiguous input. This will result in a copy.
  warnings.warn(
[Figure: waveform of the Mix-Wave-U-Net mix]

MedleyDB#

Now we will run a DMC model that was trained on MedleyDB, which includes many types of instruments. This model was trained on all songs that had 16 or fewer tracks.

dmc_ckpt_path = "checkpoints/medleydb-16-dmc.ckpt"

# load pretrained model
medley_dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()
/usr/local/lib/python3.8/dist-packages/torchaudio/functional/functional.py:539: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (257) may be set too low.
  warnings.warn(

Load tracks#

We will use the stems from the song that Gary mixed in the first part of the tutorial.

track_dir = "./flare-dry-stems"
track_ext = "wav"

# select a 40 second excerpt starting 32 seconds into the song (assumes 44.1 kHz audio)
start_sample = int(32 * 44100)
end_sample = start_sample + int(40 * 44100)

# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
track_names = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)

    if "Vocal" in track_filepath or "Bass" in track_filepath:
        # use only the first channel of the vocal and bass stems
        x_L = x[0:1, start_sample:end_sample]
        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB
        tracks.append(x_L)
        track_names.append(os.path.basename(track_filepath))

    else:
        # split the other stems into separate left and right tracks
        x_L = x[0:1, start_sample:end_sample]
        x_R = x[1:2, start_sample:end_sample]

        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB

        #x_R /= x_R.abs().max().clamp(1e-8) # peak normalize
        #x_R *= 10 ** (-12/20.0) # set peak to -12 dB

        tracks.append(x_L)
        tracks.append(x_R)
        track_names.append(os.path.basename(track_filepath) + "-L")
        track_names.append(os.path.basename(track_filepath) + "-R")

    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x_L.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x_L.view(-1).numpy(), rate=sr, normalize=True))    

# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)

# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True).clamp(-1, 1)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
[Figures: waveform plot for each stem, followed by the mono mix]

Now we can create a gain and panning mix of these stems.

# pass tracks to the model and create a mix
with torch.no_grad(): # no need to compute gradients
    mix = medley_dmc_system.model.block_based_forward(tracks, 262144, 262144//2)
#print(mix.shape, params.shape)

# view the mix
mix /= mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Differentiable Mixing Console")
librosa.display.waveshow(mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mix.view(2,-1).numpy(), rate=sr, normalize=True))

#for track_fp, param in zip(track_names, params.squeeze()):
#    print(os.path.basename(track_fp), param)
/usr/local/lib/python3.8/dist-packages/librosa/util/utils.py:198: UserWarning: librosa.util.frame called with axis=-1 on a non-contiguous input. This will result in a copy.
  warnings.warn(
[Figure: waveform of the Differentiable Mixing Console mix of the stems]

Certainly not a perfect mix, but notice that the model has learned to raise the level of the vocal, pan it to the center, and try to pan the other elements to the sides.
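
The MedleyDB mix above was produced with block_based_forward(tracks, 262144, 262144//2) rather than a single forward pass, since the 40 second excerpt is much longer than 262144 samples. We have not reproduced the toolkit's implementation here. The sketch below shows one generic way to process a long signal in overlapping blocks (windowed overlap-add), assuming the two arguments are a block size and a hop size that give 50% overlap. The function names and the `process` callback are hypothetical.

```python
import math


def hann(n: int):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(signal, block_size, hop, process):
    """Apply `process` to overlapping blocks and cross-fade the results.

    Samples not covered by any full block (or where the window is zero,
    such as the very first sample) are left at 0.0; a fuller version
    would pad the signal first.
    """
    window = hann(block_size)
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)
    for start in range(0, len(signal) - block_size + 1, hop):
        block = process(signal[start:start + block_size])
        for i, (b, w) in enumerate(zip(block, window)):
            out[start + i] += b * w   # weighted sum of overlapping blocks
            norm[start + i] += w      # accumulated window weight
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


signal = [float(i) for i in range(16)]

# with an identity `process`, the input is recovered wherever the window is nonzero
restored = overlap_add(signal, block_size=8, hop=4, process=lambda block: block)
print(all(abs(a - b) < 1e-9 for a, b in zip(restored[1:], signal[1:])))  # True
```

In a mixing setting, `process` would stand in for a model call on one block, so each block could receive different gain and pan values, and the window smooths the transition between neighbouring blocks.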