Inference#

In this notebook we demonstrate how to use two pretrained models to generate mixes of multitrack drum recordings. We provide models trained on the ENST-drums dataset, which features a few hundred drum multitracks along with mixes of these multitracks made by professional audio engineers. We train two different multitrack mixing architectures: the Differentiable Mixing Console (DMC) and the MixWaveUNet. First we download the model checkpoints and some test audio, then we load the models and the audio tracks and generate a mix that we can listen to.

Note: This notebook assumes that you have already installed the automix package. If you have not done so, you can run the following:

!pip install git+https://github.com/csteinmetz1/automix-toolkit

import os
import glob
import torch
import torchaudio
import numpy as np

import IPython
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa.display

%matplotlib inline
%load_ext autoreload
%autoreload 2

from automix.system import System

Download the pretrained models and multitracks#

First we will download the pretrained models: DMC and MixWaveUNet checkpoints trained on ENST-drums, plus a DMC checkpoint trained on MedleyDB. Then we will download two .zip files containing a drum multitrack and the demo multitrack, both of which were unseen during training.

# download the pretrained models: DMC and MixWaveUNet trained on ENST-drums, and DMC trained on MedleyDB
os.makedirs("checkpoints", exist_ok=True)
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-dmc.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-mixwaveunet.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/medleydb-16-dmc.ckpt
!mv enst-drums-dmc.ckpt checkpoints/enst-drums-dmc.ckpt
!mv enst-drums-mixwaveunet.ckpt checkpoints/enst-drums-mixwaveunet.ckpt
!mv medleydb-16-dmc.ckpt checkpoints/medleydb-16-dmc.ckpt

# then download and extract a drum multitrack from the test set
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/drums-test-rock.zip
!unzip -o drums-test-rock.zip

# finally download and extract the dry stems of the demo multitrack
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/flare-dry-stems.zip
!unzip -o flare-dry-stems.zip -d flare-dry-stems
Archive:  drums-test-rock.zip
  inflating: __MACOSX/._drums-test-rock  
  inflating: drums-test-rock/.DS_Store  
  inflating: __MACOSX/drums-test-rock/._.DS_Store  
  inflating: drums-test-rock/tracks/04_overhead_L_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/01_kick_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/03_hi-hat_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/02_snare_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/07_tom_2_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/06_tom_1_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/05_overhead_R_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/08_tom_3_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_DMC.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_MixWaveUNet.wav  
Archive:  flare-dry-stems.zip
Written using ZipTricks 5.6.0
 extracting: flare-dry-stems/Flare Bass Stem Dry.wav  
 extracting: flare-dry-stems/Flare Drum Stem Dry.wav  
 extracting: flare-dry-stems/Flare Instrument Stem Dry.wav  
 extracting: flare-dry-stems/Flare Vocal Stem Dry.wav  
!ls
01_inference.ipynb     drums-test-rock.zip.3  DSD100subset.zip.4
02_datasets.ipynb      drums-test-rock.zip.4  DSD100subset.zip.5
03_models.ipynb        drums-test-rock.zip.5  DSD100subset.zip.6
04_training.ipynb      drums-test-rock.zip.6  flare-dry-stems
05_evaluate.ipynb      drums-test-rock.zip.7  flare-dry-stems.zip
checkpoints	       DSD100subset	      flare-dry-stems.zip.1
drums-test-rock        DSD100subset.zip       flare-dry-stems.zip.2
drums-test-rock.zip    DSD100subset.zip.1     flare-dry-stems.zip.3
drums-test-rock.zip.1  DSD100subset.zip.2     lightning_logs
drums-test-rock.zip.2  DSD100subset.zip.3     __MACOSX
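
The directory listing above contains numbered copies such as drums-test-rock.zip.1, which is what wget produces when the download cell is run again and the file already exists. As a minimal sketch, assuming you would rather skip files that are already on disk, the download could be wrapped in a small Python helper. The name `fetch` is hypothetical and is not part of automix-toolkit; it only uses `urllib.request.urlretrieve` from the standard library.

```python
import urllib.request
from pathlib import Path


def fetch(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Hypothetical helper (not part of automix-toolkit).
    Returns True if a download happened, False if the file was already there.
    """
    dest_path = Path(dest)
    if dest_path.exists():
        return False  # nothing to do, avoids creating numbered duplicates

    # create the target directory (e.g. "checkpoints") if needed
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    # download to a temporary name first so an interrupted download
    # is not mistaken for a complete file on the next run
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    urllib.request.urlretrieve(url, str(tmp_path))
    tmp_path.replace(dest_path)
    return True
```

For example, `fetch("https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-dmc.ckpt", "checkpoints/enst-drums-dmc.ckpt")` would place the checkpoint directly in the checkpoints folder and do nothing on a second run.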

Set configuration#

We have the option to select one of two different checkpoints.

If we select enst-drums-dmc.ckpt we use the pretrained Differentiable Mixing Console model, which directly predicts gain and panning parameters for each track. If we instead select enst-drums-mixwaveunet.ckpt we use a multi-input WaveUNet to create a mix of the tracks. To make computation faster we can restrict the maximum number of samples to process with max_samples. The default max_samples = 262144 will mix about the first 6 seconds of the track. You can try increasing this value to see how the results change.

Note: In the case of MixWaveUNet, a power of 2 value for max_samples is required.

track_dir = "./drums-test-rock/tracks"
track_ext = "wav"

dmc_ckpt_path = "checkpoints/enst-drums-dmc.ckpt"
mwun_ckpt_path = "checkpoints/enst-drums-mixwaveunet.ckpt"

max_samples = 262144
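
If you want to try other values of max_samples, the following sketch may help. It checks the power-of-2 requirement, finds the largest power of 2 that fits in a given number of samples, and converts a sample count to seconds. The helper names are hypothetical, and the 44.1 kHz default is an assumption; use the sample rate returned when you load your own audio.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of 2 (uses the n & (n - 1) bit trick)."""
    return n > 0 and (n & (n - 1)) == 0


def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of 2 that is less than or equal to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n.bit_length() - 1)


def duration_seconds(num_samples: int, sample_rate: int = 44100) -> float:
    """Duration of num_samples at sample_rate (44.1 kHz is an assumed default)."""
    return num_samples / sample_rate


max_samples = 262144
print(is_power_of_two(max_samples))              # True, since 262144 == 2 ** 18
print(round(duration_seconds(max_samples), 2))   # 5.94

# largest valid max_samples for a 10 second excerpt at the assumed rate
print(largest_power_of_two_at_most(10 * 44100))  # 262144
```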

Load pretrained model#

# load pretrained model
dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()
mwun_system = System.load_from_checkpoint(mwun_ckpt_path, map_location="cpu").eval()

Load multitrack#

Now we will read the tracks from disk and create a tensor with all the tracks. We first peak normalize each track to -12 dB, which is what the models expect. In the case of MixWaveUNet, we pad the input with silence if fewer than 8 tracks are provided. The DMC model, however, can accept any number of tracks, whether more or fewer than it was trained with.

We can also create a simple mono mixture of these tracks to hear what the multitrack sounds like before we do any mixing.
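
The peak normalization described above comes down to two steps: divide by the current peak, then multiply by the linear gain that corresponds to -12 dB. Here is a minimal sketch on a plain Python list, with hypothetical helper names, meant only to show the arithmetic; the notebook code applies the same idea to torch tensors.

```python
def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20.0)


def peak_normalize(samples, target_db=-12.0, eps=1e-8):
    """Scale samples so that the largest absolute value sits at target_db.

    eps guards against division by zero for silent input.
    """
    if not samples:
        return []
    peak = max(max(abs(s) for s in samples), eps)
    scale = db_to_linear(target_db) / peak
    return [s * scale for s in samples]


track = [0.0, 0.5, -0.8, 0.2]
normalized = peak_normalize(track)
print(round(max(abs(s) for s in normalized), 4))  # 0.2512, i.e. 10 ** (-12 / 20)
```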

# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)
    x = x[:, : max_samples]
    x /= x.abs().max().clamp(1e-8) # peak normalize
    x *= 10 ** (-12/20.0) # set peak to -12 dB
    tracks.append(x)

    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x.view(-1).numpy(), rate=sr, normalize=True))    

# add dummy tracks of silence until there are 8 tracks
while len(tracks) < 8:
    tracks.append(torch.zeros_like(x))

# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)

# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True)
print(input_mix.shape)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
[Figures: waveform plots of the 8 input tracks]
torch.Size([1, 1, 262144])
[Figure: waveform of the mono mix]

Generate the DMC mix#

Now we can listen to the predicted mix. If we create a mix with the differentiable mixing console we can also print out the gain (in dB) and pan parameter for each track.

# pass tracks to the model and create a mix
# note: tracks[:,:-1,:] drops the last track (here 08_tom_3), so the DMC mixes the first 7 tracks
with torch.no_grad(): # no need to compute gradients
    mix, params = dmc_system(tracks[:,:-1,:])
print(mix.shape, params.shape)

# view the mix
mix /= mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Differentiable Mixing Console")
librosa.display.waveshow(mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mix.view(2,-1).numpy(), rate=sr, normalize=True))

for track_fp, param in zip(track_filepaths, params.squeeze()):
    print(os.path.basename(track_fp), param)
torch.Size([1, 2, 262144]) torch.Size([1, 7, 2])
[Figure: waveform of the Differentiable Mixing Console mix]
01_kick_066_phrase_rock_complex_fast_sticks.wav tensor([12.3844,  0.5003])
02_snare_066_phrase_rock_complex_fast_sticks.wav tensor([13.0229,  0.5067])
03_hi-hat_066_phrase_rock_complex_fast_sticks.wav tensor([5.0208, 0.5011])
04_overhead_L_066_phrase_rock_complex_fast_sticks.wav tensor([6.4820e+00, 1.4221e-03])
05_overhead_R_066_phrase_rock_complex_fast_sticks.wav tensor([7.4902, 0.9986])
06_tom_1_066_phrase_rock_complex_fast_sticks.wav tensor([-4.6055,  0.7456])
07_tom_2_066_phrase_rock_complex_fast_sticks.wav tensor([1.5387, 0.3615])
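
To get a feel for what these numbers mean, the sketch below turns a (gain in dB, pan) pair into left and right channel gains. It assumes a constant-power sine/cosine pan law with pan = 0 as hard left and pan = 1 as hard right. That orientation matches the overhead_L and overhead_R values printed above, but the exact pan law is an assumption and may differ from what the toolkit implements. `stereo_gains` is a hypothetical helper.

```python
import math


def stereo_gains(gain_db: float, pan: float):
    """Return (left, right) linear gains for a gain in dB and a pan in [0, 1].

    Assumes a constant-power pan law: pan = 0 is hard left,
    pan = 0.5 is center, pan = 1 is hard right.
    """
    gain_lin = 10 ** (gain_db / 20.0)
    theta = pan * math.pi / 2
    return gain_lin * math.cos(theta), gain_lin * math.sin(theta)


# parameter values copied from the printed output above
examples = [
    ("kick", 12.3844, 0.5003),
    ("overhead_L", 6.4820, 0.0014221),
    ("overhead_R", 7.4902, 0.9986),
]
for name, gain_db, pan in examples:
    left, right = stereo_gains(gain_db, pan)
    print(f"{name}: L={left:.3f} R={right:.3f}")
```

Under this assumption the kick ends up almost equally loud in both channels, while each overhead goes almost entirely to its own side.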

Generate the Mix-Wave-U-Net Mix#

If we use the MixWaveUNet there are no parameters to show since this model uses a direct transformation method which does not use intermediate mixing parameters.

with torch.no_grad(): # no need to compute gradients
    mwun_mix, params = mwun_system(tracks)
print(mwun_mix.shape, params.shape)

# view the mix
mwun_mix /= mwun_mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Mix-Wave-U-Net")
librosa.display.waveshow(mwun_mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mwun_mix.view(2,-1).numpy(), rate=sr, normalize=True))
torch.Size([1, 2, 262144]) torch.Size([1])
[Figure: waveform of the Mix-Wave-U-Net mix]

MedleyDB#

Now we will run a DMC model that was trained on MedleyDB, which includes many types of instruments. This model was trained on all songs that had 16 or fewer tracks.

dmc_ckpt_path = "checkpoints/medleydb-16-dmc.ckpt"

# load pretrained model
medley_dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()

Load tracks#

We will use the stems from the song that Gary mixed in the first part of the tutorial.

track_dir = "./flare-dry-stems"
track_ext = "wav"

# select a 40-second excerpt starting 32 seconds in (assumes 44.1 kHz audio)
start_sample = int(32 * 44100)
end_sample = start_sample + int(40 * 44100)

# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
track_names = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)

    if "Vocal" in track_filepath or "Bass" in track_filepath:
        # treat the vocal and bass stems as mono: keep only the left channel
        x_L = x[0:1, start_sample:end_sample]
        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB
        tracks.append(x_L)
        track_names.append(os.path.basename(track_filepath))

    else:
        # split the remaining stereo stems into separate left and right tracks
        x_L = x[0:1, start_sample:end_sample]
        x_R = x[1:2, start_sample:end_sample]

        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB

        #x_R /= x_R.abs().max().clamp(1e-8) # peak normalize
        #x_R *= 10 ** (-12/20.0) # set peak to -12 dB

        tracks.append(x_L)
        tracks.append(x_R)
        track_names.append(os.path.basename(track_filepath) + "-L")
        track_names.append(os.path.basename(track_filepath) + "-R")

    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x_L.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x_L.view(-1).numpy(), rate=sr, normalize=True))    

# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)

# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True).clamp(-1, 1)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
[Figures: waveform plots of the left channel of each of the 4 stems, followed by the mono mix]

Now we can create a gain and panning mix of these stems.

# pass tracks to the model and create a mix
with torch.no_grad(): # no need to compute gradients
    mix = medley_dmc_system.model.block_based_forward(tracks, 262144, 262144//2)
#print(mix.shape, params.shape)

# view the mix
mix /= mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Differentiable Mixing Console")
librosa.display.waveshow(mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mix.view(2,-1).numpy(), rate=sr, normalize=True))

#for track_fp, param in zip(track_names, params.squeeze()):
#    print(os.path.basename(track_fp), param)
[Figure: waveform of the Differentiable Mixing Console mix of the stems]
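
The call to block_based_forward above passes 262144 and 262144//2 after the tracks. A plausible reading, and it is only an assumption here, is that these are a block size and a hop length in samples, so that the 40-second excerpt is processed in overlapping blocks rather than all at once. The sketch below shows how such block boundaries could be laid out; `block_bounds` is a hypothetical helper and does not reproduce how the toolkit combines results across blocks.

```python
def block_bounds(seq_len: int, block_size: int, hop_length: int):
    """Return (start, end) sample index pairs of overlapping blocks covering seq_len.

    The final block is shortened so that it ends exactly at seq_len.
    """
    if block_size <= 0 or hop_length <= 0:
        raise ValueError("block_size and hop_length must be positive")
    bounds = []
    start = 0
    while start < seq_len:
        bounds.append((start, min(start + block_size, seq_len)))
        if start + block_size >= seq_len:
            break  # this block already reaches the end of the signal
        start += hop_length
    return bounds


# 40 seconds at 44.1 kHz, with a hop of half a block (50% overlap)
seq_len = 40 * 44100
blocks = block_bounds(seq_len, 262144, 262144 // 2)
print(len(blocks), blocks[0], blocks[-1])
```

With a hop of half the block size, every sample except those at the very start and end is covered by two blocks.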

Certainly not a perfect mix, but notice that the model has learned to raise the level of the vocal, pan it to the center, and try to pan the other elements to the sides.