Evaluation#

In this notebook we will demonstrate how to evaluate a set of generated mixes via objective metrics.

We will use the mixes generated in the inference notebook and objectively compare them to the human-made ground truth mixes.

The objective evaluation of mixes can be carried out through audio features that relate to the most common audio effects used during mixing. Since audio effects generally manipulate audio characteristics such as frequency content, dynamics, spatialization, timbre, or pitch, we can use audio features that are associated with these audio characteristics as a way to numerically evaluate mixes.

We can use the following audio features:

- Spectral features for EQ and reverberation: centroid, bandwidth, contrast, flatness, and roll-off

- Spatialization features for panning: the Panning Root Mean Square (RMS)

- Dynamic features for dynamic range processors: RMS level, dynamic spread, and crest factor

- Loudness features: the integrated loudness level (LUFS) and peak loudness
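As a rough illustration of a few of the features listed above, the sketch below computes the RMS level, crest factor, and spectral centroid of a mono signal using NumPy only. The function names (`rms_level_db`, `crest_factor_db`, `spectral_centroid_hz`) and the test tone are assumptions made for this example. They are not the toolkit's implementation, which may frame, window, or weight the signal differently.

```python
import numpy as np


def rms_level_db(x, eps=1e-12):
    """RMS level of a signal in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms + eps)


def crest_factor_db(x, eps=1e-12):
    """Peak-to-RMS ratio in dB; higher values indicate more transient content."""
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10((peak + eps) / (rms + eps))


def spectral_centroid_hz(x, sr):
    """Magnitude-weighted mean frequency computed over the whole signal."""
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12))


# Hypothetical test signal: one second of a full-scale 1 kHz sine at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

print(f"RMS level:         {rms_level_db(tone):.2f} dB")
print(f"Crest factor:      {crest_factor_db(tone):.2f} dB")
print(f"Spectral centroid: {spectral_centroid_hz(tone, sr):.1f} Hz")
```

For a pure sine, the RMS level sits about 3 dB below the peak, so the crest factor is about 3 dB, and the centroid coincides with the tone frequency. Real mixes are usually analyzed frame by frame rather than over the whole signal at once.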

To capture how these features evolve over time, we can compute a running mean of each feature over a fixed number of past frames. We can then calculate the mean absolute percentage error (MAPE) between the target and output features to summarize the overall relative error.
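The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind `get_features`: the helper names `running_mean` and `mape`, the window length, and the feature values are all made up for the example, and MAPE is returned as a fraction (multiply by 100 for a percentage).

```python
import numpy as np


def running_mean(values, window):
    """Mean of each frame together with up to `window - 1` frames before it."""
    values = np.asarray(values, dtype=float)
    cumsum = np.cumsum(np.insert(values, 0, 0.0))
    out = np.empty_like(values)
    for i in range(len(values)):
        start = max(0, i - window + 1)  # fewer frames are available at the start
        out[i] = (cumsum[i + 1] - cumsum[start]) / (i + 1 - start)
    return out


def mape(target, output, eps=1e-12):
    """Mean absolute percentage error of `output` relative to `target`."""
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    return float(np.mean(np.abs(target - output) / (np.abs(target) + eps)))


# Hypothetical per-frame feature values for a target mix and a generated mix.
target_feature = [1.0, 2.0, 3.0, 4.0]
output_feature = [1.1, 1.8, 3.3, 4.4]

target_smooth = running_mean(target_feature, window=2)
output_smooth = running_mean(output_feature, window=2)
print(f"MAPE on smoothed features: {mape(target_smooth, output_smooth):.4f}")
```

Because the error is divided by the target value, MAPE is scale-free, which makes features with very different units comparable. It also becomes unstable when the target feature is close to zero, hence the small `eps` in the denominator.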

Note: This notebook requires the automix package. If it is not already installed, install it with:

!pip install git+https://github.com/csteinmetz1/automix-toolkit

import os
import glob
import torchaudio
import numpy as np

import IPython
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa.display

%matplotlib inline
%load_ext autoreload
%autoreload 2

plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['figure.dpi'] = 100 

from automix.evaluation.utils_evaluation import get_features

Drums mixing evaluation#

We will evaluate two different trained models with a test sample from the ENST-drums subset.

Models: the Differentiable Mixing Console (DMC), and the MixWaveUNet.

# Download and extract a drum multitrack from the test set
!wget https://huggingface.co/csteinmetz1/automix-toolkit/resolve/main/drums-test-rock.zip
!unzip -o drums-test-rock.zip

mix_target_path = "drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks.wav"
mix_auto_path_wun = "drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_MixWaveUNet.wav"
mix_auto_path_dmc = "drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_DMC.wav"

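# Global settings
SR = 44100  # sample rate in Hz
max_samples = 262144  # segment length in samples (2**18, about 5.9 s at 44.1 kHz)
start_sample = 0 * SR  # segment start, given as seconds * SR
end_sample = start_sample + max_samples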

Archive:  drums-test-rock.zip
  inflating: __MACOSX/._drums-test-rock  
  inflating: drums-test-rock/.DS_Store  
  inflating: __MACOSX/drums-test-rock/._.DS_Store  
  inflating: drums-test-rock/tracks/04_overhead_L_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/01_kick_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/03_hi-hat_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/02_snare_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/07_tom_2_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/06_tom_1_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/05_overhead_R_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/tracks/08_tom_3_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_DMC.wav  
  inflating: drums-test-rock/mix/dry_mix_066_phrase_rock_complex_fast_sticks_MixWaveUNet.wav  

Load the mixes#

fig, axs = plt.subplots(2, 1)

target_audio, sr = torchaudio.load(mix_target_path)
target_audio = target_audio[:, start_sample: end_sample]

librosa.display.waveshow(
    target_audio[0,:].numpy(), 
    axis='time', 
    sr=SR, 
    zorder=3, 
    label='human-made', 
    color='k', 
    ax=axs[0])

wun_audio, sr = torchaudio.load(mix_auto_path_wun)
wun_audio = wun_audio[:, start_sample: end_sample]
librosa.display.waveshow(
    wun_audio[0,:].view(-1).numpy(), 
    axis='time', 
    sr=SR, 
    zorder=3, 
    label='MixWaveUNet', 
    color='tab:blue', 
    ax=axs[0], alpha=0.7)
axs[0].grid(c="lightgray")
axs[0].legend()

librosa.display.waveshow(
    target_audio[0,:].numpy(), 
    axis='time', 
    sr=SR, 
    zorder=3, 
    label='human-made', 
    color='k', 
    ax=axs[1])

dmc_audio, sr = torchaudio.load(mix_auto_path_dmc)
dmc_audio = dmc_audio[:, start_sample: end_sample]
librosa.display.waveshow(
    dmc_audio[0,:].view(-1).numpy(),
    axis='time',
    sr=SR,
    zorder=3,
    label='DMC', 
    color='tab:orange', 
    ax=axs[1], 
    alpha=0.7)
axs[1].grid(c="lightgray")
axs[1].legend()
plt.show()

[Figure: waveform of the human-made mix overlaid with the MixWaveUNet mix (top) and with the DMC mix (bottom).]
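Note that slicing with `start_sample: end_sample` returns a shorter segment, without raising an error, when a file has fewer than `end_sample` samples, so the mixes may end up with different lengths. Whether `get_features` requires equal lengths is not confirmed here. As a precaution, a small guard such as the hypothetical `crop_to_common_length` helper below could trim all signals to the shortest one before comparing them.

```python
import numpy as np


def crop_to_common_length(*signals):
    """Trim (channels, samples) arrays to the shortest sample count."""
    n = min(s.shape[-1] for s in signals)
    return tuple(s[..., :n] for s in signals)


# Hypothetical stereo arrays standing in for the loaded mixes.
target = np.zeros((2, 262144))
output = np.zeros((2, 200000))

target, output = crop_to_common_length(target, output)
print(target.shape, output.shape)
```

The same slicing syntax works on the `torch.Tensor` objects returned by `torchaudio.load`, so the guard could be applied before or after the conversion to NumPy.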

Compute the loudness, spectral, panning and dynamic features#

target_audio = target_audio.numpy()
wun_audio = wun_audio.numpy()
dmc_audio = dmc_audio.numpy()

wun_features = get_features(target_audio, wun_audio)
dmc_features = get_features(target_audio, dmc_audio)

wun_features_mean = {k.split('_')[-1]: wun_features.pop(k) for k in list(wun_features.keys()) if k.startswith('mean_mape')}
dmc_features_mean = {k.split('_')[-1]: dmc_features.pop(k) for k in list(dmc_features.keys()) if k.startswith('mean_mape')}
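The two dictionary comprehensions above move every entry whose key starts with `mean_mape` out of the feature dictionary and into a separate one, keyed by the last underscore-separated token. The sketch below spells out the same pattern as a function. The helper name `split_by_prefix` and the keys and values in the example are invented for illustration; the actual keys returned by `get_features` may differ.

```python
def split_by_prefix(features, prefix):
    """Move entries whose key starts with `prefix` into a new dict.

    The new dict is keyed by the last underscore-separated token of the
    original key. `features` is modified in place.
    """
    selected = {}
    for key in list(features.keys()):  # copy the keys: the dict is mutated below
        if key.startswith(prefix):
            selected[key.split("_")[-1]] = features.pop(key)
    return selected


# Hypothetical keys and values, for illustration only.
features = {
    "mean_mape_spectral": 0.12,
    "mean_mape_dynamic": 0.30,
    "centroid_mean": 0.08,
    "rms_level": 0.25,
}

means = split_by_prefix(features, "mean_mape")
print("group means:", means)
print("remaining:  ", features)
```

After the call, the original dictionary holds only the individual features, which is why the later plot of all features does not repeat the group averages.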

Plot average features#

plt.bar(*zip(*wun_features_mean.items()), alpha=0.5, fill=True, color='tab:blue', label='MixWaveUNet')
plt.bar(*zip(*dmc_features_mean.items()), alpha=0.5, fill=True, color='tab:orange', label='DMC')
plt.xticks(rotation=-90)
plt.ylabel('MAPE')
plt.legend()
plt.show()

[Figure: bar chart of the average MAPE per feature group for MixWaveUNet and DMC.]

Plot all features#

plt.bar(*zip(*wun_features.items()), alpha=0.5, fill=True, color='tab:blue', label='MixWaveUNet')
plt.bar(*zip(*dmc_features.items()), alpha=0.5, fill=True, color='tab:orange', label='DMC')
plt.axvline(1.5, 0, 1, linestyle='--', alpha=0.5, color='k', linewidth=0.75)
plt.axvline(6.5, 0, 1, linestyle='--', alpha=0.5, color='k', linewidth=0.75)
plt.axvline(10.5, 0, 1, linestyle='--', alpha=0.5, color='k', linewidth=0.75)
plt.xticks(rotation=-90)
plt.ylabel('MAPE')
plt.legend()
plt.show()

[Figure: bar chart of the MAPE of each individual feature for MixWaveUNet and DMC, with dashed vertical lines separating the feature groups.]
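Overlapping semi-transparent bars can be hard to read when the two models score similarly. One way to complement the plot is to tabulate, per feature, which model has the lower MAPE and is therefore closer to the human-made mix. The sketch below assumes two dictionaries shaped like `wun_features` and `dmc_features`. The helper `compare_mape` is hypothetical, and the numbers are invented placeholders, not results from this notebook.

```python
def compare_mape(a, b, name_a="A", name_b="B"):
    """Return {feature: name of the model with the lower MAPE} for shared keys."""
    winners = {}
    for feature in a:
        if feature not in b:
            continue  # skip features that only one model reports
        if a[feature] < b[feature]:
            winners[feature] = name_a
        elif b[feature] < a[feature]:
            winners[feature] = name_b
        else:
            winners[feature] = "tie"
    return winners


# Placeholder values for illustration only.
wun = {"centroid": 0.10, "rms": 0.40, "lufs": 0.05}
dmc = {"centroid": 0.20, "rms": 0.15, "lufs": 0.05}

winners = compare_mape(wun, dmc, name_a="MixWaveUNet", name_b="DMC")
for feature, model in winners.items():
    print(f"{feature:>10}: {model}")
```

A lower MAPE only means the feature is numerically closer to the reference mix; it does not by itself indicate that a mix sounds better.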