Inference
In this notebook we demonstrate how to use two pretrained models to generate multitrack mixes of drum recordings. We provide models trained on the ENST-drums dataset, which features a few hundred drum multitracks and mixes of these multitracks made by professional audio engineers. We train two different multitrack mixing architectures: the Differentiable Mixing Console (DMC) and the MixWaveUNet. First we download the model checkpoints and some test audio, then we load the models and the audio tracks and generate a mix that we can listen to.
Note: This notebook assumes that you have already installed the automix package. If you have not done so, you can run the following:
!pip install git+https://github.com/csteinmetz1/automix-toolkit
import os
import glob
import torch
import torchaudio
import numpy as np
import IPython
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa.display
%matplotlib inline
%load_ext autoreload
%autoreload 2
from automix.system import System
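Download the pretrained models and multitracks
First we will download the pretrained models: the DMC and MixWaveUNet checkpoints trained on ENST-drums, plus a DMC checkpoint trained on MedleyDB that is used later in this notebook. Then we will download two .zip files containing a drum multitrack and the demo multitrack, both of which were unseen during training.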
# download the pretrained models for DMC and MixWaveUNet trained on ENST-drums dataset
os.makedirs("checkpoints", exist_ok=True)
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-dmc.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/enst-drums-mixwaveunet.ckpt
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/medleydb-16-dmc.ckpt
!mv enst-drums-dmc.ckpt checkpoints/enst-drums-dmc.ckpt
!mv enst-drums-mixwaveunet.ckpt checkpoints/enst-drums-mixwaveunet.ckpt
!mv medleydb-16-dmc.ckpt checkpoints/medleydb-16-dmc.ckpt
# then download and extract a drum multitrack from the test set
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/drums-test-rock.zip
!unzip -o drums-test-rock.zip
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/flare-dry-stems.zip
!unzip -o flare-dry-stems.zip -d flare-dry-stems
Archive: drums-test-rock.zip
inflating: __MACOSX/._drums-test-rock
inflating: drums-test-rock/.DS_Store
inflating: __MACOSX/drums-test-rock/._.DS_Store
inflating: drums-test-rock/tracks/04_overhead_L_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/01_kick_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/03_hi-hat_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/02_snare_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/07_tom_2_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/06_tom_1_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/05_overhead_R_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/tracks/08_tom_3_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks.wav
inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_DMC.wav
inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_MixWaveUNet.wav
Archive: flare-dry-stems.zip
Written using ZipTricks 5.6.0
extracting: flare-dry-stems/Flare Bass Stem Dry.wav
extracting: flare-dry-stems/Flare Drum Stem Dry.wav
extracting: flare-dry-stems/Flare Instrument Stem Dry.wav
extracting: flare-dry-stems/Flare Vocal Stem Dry.wav
!ls
01_inference.ipynb drums-test-rock.zip.3 DSD100subset.zip.4
02_datasets.ipynb drums-test-rock.zip.4 DSD100subset.zip.5
03_models.ipynb drums-test-rock.zip.5 DSD100subset.zip.6
04_training.ipynb drums-test-rock.zip.6 flare-dry-stems
05_evaluate.ipynb drums-test-rock.zip.7 flare-dry-stems.zip
checkpoints DSD100subset flare-dry-stems.zip.1
drums-test-rock DSD100subset.zip flare-dry-stems.zip.2
drums-test-rock.zip DSD100subset.zip.1 flare-dry-stems.zip.3
drums-test-rock.zip.1 DSD100subset.zip.2 lightning_logs
drums-test-rock.zip.2 DSD100subset.zip.3 __MACOSX
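The listing above contains copies such as drums-test-rock.zip.1 and flare-dry-stems.zip.3, which accumulate when the download cell is run more than once. One way to avoid this is to skip any file that is already on disk. The sketch below is only an illustration and is not part of the automix toolkit: fetch_once and its downloader argument are hypothetical names, and urllib from the standard library is swapped in for wget.

```python
import os
import urllib.request


def fetch_once(url, dest, downloader=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    Hypothetical helper (not part of automix). Returns True when a
    download happened and False when the existing file was reused.
    """
    if os.path.exists(dest):
        return False  # reuse the file from an earlier run
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    tmp = dest + ".part"
    downloader(url, tmp)   # write to a temporary name first
    os.replace(tmp, dest)  # so a partial download never looks complete
    return True


# Example call (commented out so the sketch runs without network access):
# base = "https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/"
# fetch_once(base + "enst-drums-dmc.ckpt", "checkpoints/enst-drums-dmc.ckpt")
```

Because the file is written directly to its final location, the separate mv step would not be needed with this approach.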
Set configuration
We can choose between two checkpoints. If we select enst-drums-dmc.ckpt, we use the pretrained Differentiable Mixing Console model, which directly predicts gain and panning parameters for each track. If we instead select enst-drums-mixwaveunet.ckpt, a multi-input WaveUNet creates the mix of the tracks. To make computation faster, we can restrict the maximum number of samples to process with max_samples. The default max_samples = 262144 mixes about the first 6 seconds of the track. You can try increasing this value to see how the results change.
Note: In the case of MixWaveUNet, max_samples must be a power of 2.
track_dir = "./drums-test-rock/tracks"
track_ext = "wav"
dmc_ckpt_path = "checkpoints/enst-drums-dmc.ckpt"
mwun_ckpt_path = "checkpoints/enst-drums-mixwaveunet.ckpt"
max_samples = 262144
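To pick a different max_samples, it can help to check the power-of-2 requirement and the resulting duration before running the models. The helpers below are a small sketch with hypothetical names; the 44.1 kHz sample rate is an assumption, so pass the rate of your own files if it differs.

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0


def largest_power_of_two_at_most(n):
    """Largest power of two that does not exceed n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def duration_seconds(num_samples, sample_rate=44100):
    """Length of num_samples in seconds (44.1 kHz is an assumption)."""
    return num_samples / sample_rate


max_samples = 262144  # 2 ** 18
print(is_power_of_two(max_samples))             # True
print(round(duration_seconds(max_samples), 2))  # 5.94
print(largest_power_of_two_at_most(300000))     # 262144
```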
Load pretrained model
# load pretrained model
dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()
mwun_system = System.load_from_checkpoint(mwun_ckpt_path, map_location="cpu").eval()
Lightning automatically upgraded your loaded checkpoint from v1.7.2 to v2.3.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint checkpoints/enst-drums-dmc.ckpt`
/home/martinez/Documents/anaconda3/envs/dafx24/lib/python3.9/site-packages/torchaudio/functional/functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (257) may be set too low.
warnings.warn(
Lightning automatically upgraded your loaded checkpoint from v1.7.2 to v2.3.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint checkpoints/enst-drums-mixwaveunet.ckpt`
Load multitrack
Now we will read the tracks from disk and create a tensor with all the tracks. We first peak normalize each track to -12 dB, which is what the models expect. For MixWaveUNet, we add an extra track of silence if fewer than 8 are provided. The DMC model, however, can accept any number of tracks, whether more or fewer than it was trained with.
We can also create a simple mono mixture of these tracks to hear what the multitrack sounds like before we do any mixing.
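As a minimal sketch of the peak normalization step, the function below scales a signal so that its largest absolute sample sits at -12 dB relative to full scale, using the factor 10 ** (-12 / 20). It uses plain Python lists so it runs without torch; peak_normalize and its arguments are hypothetical names, not part of automix.

```python
def peak_normalize(samples, target_db=-12.0, eps=1e-8):
    """Scale samples so the largest absolute value sits at target_db."""
    # guard against dividing by zero when the track is silent
    peak = max(max(abs(s) for s in samples), eps)
    gain = 10 ** (target_db / 20.0) / peak
    return [s * gain for s in samples]


track = [0.0, 0.5, -0.8, 0.2]
normalized = peak_normalize(track)
print(max(abs(s) for s in normalized))  # about 0.2512, i.e. 10 ** (-12 / 20)
```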
# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)
    x = x[:, : max_samples]
    x /= x.abs().max().clamp(1e-8) # peak normalize
    x *= 10 ** (-12/20.0) # set peak to -12 dB
    tracks.append(x)
    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x.view(-1).numpy(), rate=sr, normalize=True))
# add dummy tracks of silence if needed
if len(tracks) < 8:
    tracks.append(torch.zeros(x.shape))
# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)
# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True)
print(input_mix.shape)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
torch.Size([1, 1, 262144])
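The cell above appends a single silent track when fewer than 8 tracks are found. If a multitrack had fewer than 7 tracks, one pad would not be enough to reach 8. The sketch below pads until the target count is reached. It is an illustration with plain Python lists and a hypothetical pad_tracks helper; with tensors the same loop could append torch.zeros_like(x) instead.

```python
def pad_tracks(tracks, num_tracks=8):
    """Return a copy of tracks padded with silent tracks up to num_tracks.

    Hypothetical helper: each track is a list of samples of equal length.
    Inputs that already have num_tracks or more are returned unchanged.
    """
    if not tracks:
        raise ValueError("need at least one track to know the length")
    silence = [0.0] * len(tracks[0])
    padded = [list(t) for t in tracks]
    while len(padded) < num_tracks:
        padded.append(list(silence))
    return padded


five = [[0.1, -0.1, 0.0] for _ in range(5)]
print(len(pad_tracks(five)))  # 8
```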
Generate the DMC mix
Now we can listen to the predicted mix. If we create a mix with the differentiable mixing console we can also print out the gain (in dB) and pan parameter for each track.
# pass tracks to the model and create a mix
with torch.no_grad(): # no need to compute gradients
    # tracks[:,:-1,:] drops the last track; in this run that is 08_tom_3, so 7 tracks reach the DMC
    mix, params = dmc_system(tracks[:,:-1,:])
print(mix.shape, params.shape)
# view the mix
mix /= mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Differentiable Mixing Console")
librosa.display.waveshow(mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mix.view(2,-1).numpy(), rate=sr, normalize=True))
for track_fp, param in zip(track_filepaths, params.squeeze()):
    print(os.path.basename(track_fp), param)
torch.Size([1, 2, 262144]) torch.Size([1, 7, 2])
01_kick_066_phrase_rock_complex_fast_sticks.wav tensor([12.3844, 0.5003])
02_snare_066_phrase_rock_complex_fast_sticks.wav tensor([13.0229, 0.5067])
03_hi-hat_066_phrase_rock_complex_fast_sticks.wav tensor([5.0208, 0.5011])
04_overhead_L_066_phrase_rock_complex_fast_sticks.wav tensor([6.4820e+00, 1.4221e-03])
05_overhead_R_066_phrase_rock_complex_fast_sticks.wav tensor([7.4902, 0.9986])
06_tom_1_066_phrase_rock_complex_fast_sticks.wav tensor([-4.6055, 0.7456])
07_tom_2_066_phrase_rock_complex_fast_sticks.wav tensor([1.5387, 0.3615])
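To get a feel for the printed values, a gain in dB can be converted to a linear factor with 10 ** (dB / 20). The output above also suggests that a pan value near 0 places a track on the left (overhead_L) and a value near 1 places it on the right (overhead_R). The sketch below turns one (gain, pan) pair into left and right gains under a constant-power pan law. That pan law is an assumption made for illustration and may differ from what the DMC applies internally; stereo_gains is a hypothetical name.

```python
import math


def stereo_gains(gain_db, pan):
    """Left/right linear gains for a gain in dB and a pan in [0, 1].

    Assumes a constant-power (sin/cos) pan law with pan=0 hard left
    and pan=1 hard right. This is an illustrative assumption.
    """
    gain_lin = 10 ** (gain_db / 20.0)
    theta = pan * math.pi / 2
    return gain_lin * math.cos(theta), gain_lin * math.sin(theta)


# values copied from the printed parameters for the kick and the left overhead
examples = {"kick": (12.3844, 0.5003), "overhead_L": (6.4820, 0.0014221)}
for name, (gain_db, pan) in examples.items():
    left, right = stereo_gains(gain_db, pan)
    print(f"{name}: L={left:.3f} R={right:.3f}")
```

Under this assumption a centered track gets nearly equal left and right gains, while the left overhead ends up almost entirely in the left channel.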
Generate the Mix-Wave-U-Net Mix
If we use the MixWaveUNet, there are no parameters to show, since this model transforms the input tracks directly into a mix without intermediate mixing parameters.
with torch.no_grad(): # no need to compute gradients
    mwun_mix, params = mwun_system(tracks)
print(mwun_mix.shape, params.shape)
# view the mix
mwun_mix /= mwun_mix.abs().max()
plt.figure(figsize=(10, 2))
plt.title("Mix-Wave-U-Net")
librosa.display.waveshow(mwun_mix.view(2,-1).numpy(), sr=sr, zorder=3)
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(mwun_mix.view(2,-1).numpy(), rate=sr, normalize=True))
torch.Size([1, 2, 262144]) torch.Size([1])
MedleyDB
Now we will run a DMC model that was trained on MedleyDB, which includes many types of instruments. This model was trained on all songs that have 16 or fewer tracks.
dmc_ckpt_path = "checkpoints/medleydb-16-dmc.ckpt"
# load pretrained model
medley_dmc_system = System.load_from_checkpoint(dmc_ckpt_path, pretrained_encoder=False, map_location="cpu").eval()
Lightning automatically upgraded your loaded checkpoint from v1.7.2 to v2.3.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint checkpoints/medleydb-16-dmc.ckpt`
/home/martinez/Documents/anaconda3/envs/dafx24/lib/python3.9/site-packages/torchaudio/functional/functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (257) may be set too low.
warnings.warn(
Load tracks
We will use the stems from the song that Gary mixed in the first part of the tutorial.
track_dir = "./flare-dry-stems"
track_ext = "wav"
start_sample = int(32 * 44100)
end_sample = start_sample + int(40 * 44100)
# load the input tracks
track_filepaths = glob.glob(os.path.join(track_dir, f"*.{track_ext}"))
track_filepaths = sorted(track_filepaths)
tracks = []
track_names = []
for idx, track_filepath in enumerate(track_filepaths):
    x, sr = torchaudio.load(track_filepath)
    if "Vocal" in track_filepath or "Bass" in track_filepath:
        x_L = x[0:1, start_sample:end_sample]
        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB
        tracks.append(x_L)
        track_names.append(os.path.basename(track_filepath))
    else:
        x_L = x[0:1, start_sample:end_sample]
        x_R = x[1:2, start_sample:end_sample]
        #x_L /= x_L.abs().max().clamp(1e-8) # peak normalize
        #x_L *= 10 ** (-12/20.0) # set peak to -12 dB
        #x_R /= x_R.abs().max().clamp(1e-8) # peak normalize
        #x_R *= 10 ** (-12/20.0) # set peak to -12 dB
        tracks.append(x_L)
        tracks.append(x_R)
        track_names.append(os.path.basename(track_filepath) + "-L")
        track_names.append(os.path.basename(track_filepath) + "-R")
    plt.figure(figsize=(10, 2))
    librosa.display.waveshow(x_L.view(-1).numpy(), sr=sr, zorder=3)
    plt.title(f"{idx+1} {os.path.basename(track_filepath)}")
    plt.ylim([-1,1])
    plt.grid(c="lightgray")
    plt.show()
    IPython.display.display(ipd.Audio(x_L.view(-1).numpy(), rate=sr, normalize=True))
# stack tracks into a tensor
tracks = torch.stack(tracks, dim=0)
tracks = tracks.permute(1, 0, 2)
# tracks have shape (1, num_tracks, seq_len)
# listen to the input (mono) before mixing
input_mix = tracks.sum(dim=1, keepdim=True).clamp(-1, 1)
plt.figure(figsize=(10, 2))
plt.title("Mono Mix")
librosa.display.waveshow(input_mix.view(-1).numpy(), sr=sr, zorder=3, color="tab:orange")
plt.ylim([-1,1])
plt.grid(c="lightgray")
plt.show()
IPython.display.display(ipd.Audio(input_mix.view(-1).numpy(), rate=sr, normalize=False))
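The excerpt used above starts 32 seconds into the song and lasts 40 seconds, with the sample rate written out as 44100. If the stems had a different sample rate, those indices would point at a different part of the song. A small sketch of deriving the bounds from the rate returned by the loader is shown below; segment_bounds is a hypothetical helper, not part of automix.

```python
def segment_bounds(start_seconds, duration_seconds, sample_rate):
    """Return (start_sample, end_sample) for an excerpt given in seconds."""
    start = int(round(start_seconds * sample_rate))
    end = start + int(round(duration_seconds * sample_rate))
    return start, end


# with sr = 44100 this reproduces the values used in the cell above
print(segment_bounds(32, 40, 44100))  # (1411200, 3175200)
```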