References

A+15

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, and others. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. Software available from tensorflow.org.

BKK18

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.

BZSH21a

Dan Barry, Qijian Zhang, Pheobe Wenyi Sun, and Andrew Hines. Go Listen: an end-to-end online listening test platform. Journal of Open Research Software, 2021. URL: http://doi.org/10.5334/jors.361.

BZSH21b

Dan Barry, Qijian Zhang, Pheobe Wenyi Sun, and Andrew Hines. Go Listen: an end-to-end online listening test platform. Journal of Open Research Software, 2021.

BR17

Adán L Benito and Joshua D Reiss. Intelligent multitrack reverberation based on hinge-loss Markov random fields. In Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio. Audio Engineering Society, 2017.

Bil09

Stefan Bilbao. Numerical sound synthesis: finite difference schemes and simulation in musical acoustics. John Wiley and Sons, 2009.

BFH+18

James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs. 2018. URL: http://github.com/google/jax.

BHP17

Jean-Pierre Briot, Gaëtan Hadjeres, and François-David Pachet. Deep learning techniques for music generation–a survey. arXiv:1709.01620, 2017.

BMBF18

Gary Bromham, Dave Moffat, Mathieu Barthet, and György Fazekas. The impact of compressor ballistics on the perceived style of music. In Audio Engineering Society Convention 145. Audio Engineering Society, 2018.

BMR+20

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, and others. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.

BKBF+21

Nick Bryan-Kinns, Berker Banar, Corey Ford, C Reed, Yixiao Zhang, Simon Colton, Jack Armitage, and others. Exploring XAI for the arts: explaining latent space in generative music. In 1st Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging@NeurIPS2021). 2021.

CBS22

Jonah Casebeer, Nicholas J Bryan, and Paris Smaragdis. Meta-AF: meta-learning for adaptive filters. arXiv preprint arXiv:2204.11942, 2022.

Che84

Chi-Tsong Chen. Linear system theory and design. Saunders College Publishing, 1984.

CKNH20

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607. PMLR, 2020.

CR17

Emmanouil T Chourdakis and Joshua D Reiss. A machine-learning approach to application of intelligent artificial reverberation. Journal of the Audio Engineering Society, 65(1/2):56–65, 2017.

CR16

Emmanouil Theofanis Chourdakis and Joshua D Reiss. Automatic control of a digital reverberation effect using hybrid models. In Audio Engineering Society Conference: 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech). Audio Engineering Society, 2016.

CComunitaR22

Joseph T Colonel, Marco Comunità, and Joshua Reiss. Reverse engineering memoryless distortion effects with differentiable waveshapers. In 153rd Convention of the Audio Engineering Society. Audio Engineering Society, 2022.

CR21

Joseph T Colonel and Joshua Reiss. Reverse engineering of a recording mix with differentiable digital signal processing. The Journal of the Acoustical Society of America, 150(1):608–619, 2021.

CSMR22

Joseph T Colonel, Christian J Steinmetz, Marcus Michelen, and Joshua D Reiss. Direct design of biquad filter cascades with deep learning by sampling random polynomials. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3104–3108. IEEE, 2022.

CR+22

Joseph T. Colonel, Joshua D Reiss, and others. Approximating ballistics in a differentiable dynamic range compressor. In 153rd Convention of the Audio Engineering Society. Audio Engineering Society, 2022.

DamskaggJValimaki+19

Eero-Pekka Damskägg, Lauri Juvela, Vesa Välimäki, and others. Real-time modeling of audio distortion circuits with deep learning. In Proc. Int. Sound and Music Computing Conf. (SMC-19), Malaga, Spain, 332–339. 2019.

Dan18

Roger B. Dannenberg. Loudness concepts and panning laws. Introduction to Computer Music, 2018.

DM17

Brecht De Man. Towards a better understanding of mix engineering. PhD thesis, Queen Mary University of London, 2017.

DMR13

Brecht De Man and Joshua D Reiss. A knowledge-engineered autonomous mixing system. In 135th Audio Engineering Society Convention. Audio Engineering Society, 2013.

DMR14

Brecht De Man and Joshua D Reiss. APE: audio perceptual evaluation toolbox for MATLAB. In Audio Engineering Society Convention 136. 2014.

DMRS17

Brecht De Man, Joshua D Reiss, and Ryan Stables. Ten years of automatic mixing. In 3rd AES Workshop on Intelligent Music Production. September 2017.

DV22

Fotios Drakopoulos and Sarah Verhulst. A differentiable optimisation framework for the design of individualised DNN-based hearing-aid strategies. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 351–355. IEEE, 2022.

Dug75

Dan Dugan. Automatic microphone mixing. In 51st Convention of the Audio Engineering Society. Audio Engineering Society, 1975.

DefossezUBB19

Alexandre Défossez, Nicolas Usunier, Léon Bottou, and Francis Bach. Music source separation in the waveform domain. arXiv preprint arXiv:1911.13254, 2019.

EHGR21

Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, and Adam Roberts. DDSP: differentiable digital signal processing. ICLR, 2021.

Far00

Angelo Farina. Simultaneous measurement of impulse response and distortion with a swept-sine technique. In Audio Engineering Society Convention 108. 2000.

Fen18

Steven Fenton. Automatic mixing of multitrack material using modified loudness models. In Audio Engineering Society Convention 145. Audio Engineering Society, 2018.

Gay04

Patrick Gaydecki. Foundations of digital signal processing: theory, algorithms and hardware design. Volume 15. IET, 2004.

GR07

E Perez Gonzalez and Joshua D Reiss. Automatic mixing: live downmixing stereo panner. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx’07), 63–68. 2007.

GBC16

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

GPAM+20

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.

G+89

Andreas Griewank and others. On automatic differentiation. Mathematical Programming: recent developments and applications, 6(6):83–107, 1989.

HZRS15

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision, 1026–1034. 2015.

HCE+17

Shawn Hershey, Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke, Aren Jansen, R Channing Moore, Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, and others. CNN architectures for large-scale audio classification. In ICASSP, 131–135. IEEE, 2017.

HJA20

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

HKNR+20

Cheng-Zhi Anna Huang, Hendrik Vincent Koops, Ed Newton-Rex, Monica Dinculescu, and Carrie J Cai. AI song contest: human-AI co-creation in songwriting. arXiv preprint arXiv:2010.05388, 2020.

IR11

Rec ITU-R. ITU-R BS. 1770-2, algorithms to measure audio programme loudness and true-peak audio level. International Telecommunications Union, Geneva, 2011.

IR15

Rec ITU-R. ITU-R BS. 1534-3, method for the subjective assessment of intermediate quality level of audio systems. International Telecommunications Union, Geneva, 2015.

JMM+15

Nicholas Jillings, David Moffat, Brecht De Man, and Joshua D Reiss. Web Audio Evaluation Tool: a browser-based listening test environment. In 12th Sound and Music Computing Conference. 2015.

JS22

Nicolas Jonason and Bob L. T. Sturm. TimbreCLIP: connecting timbre to text and images. arXiv:2211.11225, 2022.

KZRS19

Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, and Matthew Sharifi. Fréchet audio distance: a reference-free metric for evaluating music enhancement algorithms. In INTERSPEECH, 2350–2354. 2019.

KW13

Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

KMartinezRamirezL+22

Junghyun Koo, Marco A Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, and Yuki Mitsufuji. Music mixing style transfer: a contrastive learning approach to disentangle audio effects. arXiv preprint arXiv:2211.02247, 2022.

KPL22

Junghyun Koo, Seungryeol Paik, and Kyogu Lee. End-to-end music remastering system using self-supervised and adversarial training. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4608–4612. IEEE, 2022.

Kuh58

Walter Kuhl. The acoustical and technological properties of the reverberation plate. EBU Review, Part A-Technical, 49:8–14, 1958.

KPE20

Boris Kuznetsov, Julian D Parker, and Fabián Esqueda. Differentiable IIR filters for machine learning applications. In Proc. Int. Conf. Digital Audio Effects (eDAFx-20), 297–303. 2020.

LCL22

Sungho Lee, Hyeong-Seok Choi, and Kyogu Lee. Differentiable artificial reverberation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:2541–2556, 2022.

LBFM21

M Nyssim Lefford, Gary Bromham, György Fazekas, and David Moffat. Context aware intelligent mixing systems. Journal of the Audio Engineering Society, 2021.

LE22

Søren Vøgg Lyster and Cumhur Erkut. A differentiable neural network approach to parameter estimation of reverberation. In 19th Sound and Music Computing Conference, SMC 2022, 358–364. Sound and Music Computing Network, 2022.

MDMP+15

Zheng Ma, Brecht De Man, Pedro DL Pestana, Dawn AA Black, and Joshua D Reiss. Intelligent multitrack dynamic range compression. Journal of the Audio Engineering Society, 63(6):412–426, 2015.

MJZF21

Pranay Manocha, Zeyu Jin, Richard Zhang, and Adam Finkelstein. CDPAM: contrastive learning for perceptual audio similarity. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 196–200. IEEE, 2021.

MFR12

Stuart Mansbridge, Saoirse Finn, and Joshua D Reiss. Implementation and evaluation of autonomous multi-track fader control. In Audio Engineering Society Convention 132. Audio Engineering Society, 2012.

MartinezRamirez20

Marco A Martínez-Ramírez. Deep learning for audio effects modeling. PhD thesis, Queen Mary University of London, 2020.

MartinezRamirezLF+22

Marco A Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, and Yuki Mitsufuji. Automatic music mixing with deep learning and out-of-domain data. In ISMIR. 2022.

MartinezRamirezSM21

Marco A Martínez-Ramírez, Daniel Stoller, and David Moffat. A deep learning approach to intelligent drum mixing with the Wave-U-Net. Journal of the Audio Engineering Society, 2021.

MartinezRamirezWSB21

Marco A Martínez-Ramírez, Oliver Wang, Paris Smaragdis, and Nicholas J Bryan. Differentiable signal processing with black-box audio effects. In ICASSP, 66–70. IEEE, 2021.

MS21

Naotake Masuda and Daisuke Saito. Synthesizer sound matching with differentiable DSP. In ISMIR, 428–434. 2021.

MBS20

Stylianos I Mimilakis, Nicholas J Bryan, and Paris Smaragdis. One-shot parametric audio production style transfer with application to frequency equalization. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 256–260. IEEE, 2020.

MS19a

Dave Moffat and Mark Sandler. Machine learning multitrack gain mixing of drums. In 147th Audio Engineering Society Convention. 2019.

MS19b

David Moffat and Mark B Sandler. Approaches in intelligent music production. Arts, 8(5):14, September 2019.

MS19c

David Moffat and Mark B Sandler. Approaches in intelligent music production. In Arts, volume 8, 125. MDPI, 2019.

Ner20

Shahan Nercessian. Neural parametric equalizer matching using differentiable biquads. In Proc. Int. Conf. Digital Audio Effects (eDAFx-20), 265–272. 2020.

PD00

François Pachet and Olivier Delerue. On-the-fly multi-track mixing. In 109th Convention of the Audio Engineering Society. Audio Engineering Society, 2000.

PB09

Julian Parker and Stefan Bilbao. Spring reverberation: a physical perspective. In 12th International Conference on Digital Audio Effects (DAFx-09). 2009.

PGM+19

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, and others. PyTorch: an imperative style, high-performance deep learning library. Advances in neural information processing systems, 2019.

Pee04

Geoffroy Peeters. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Analysis/Synthesis Team. IRCAM, Paris, France, 54(0):1–25, 2004.

PSDV+18

Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32. 2018.

PGR08

Enrique Perez Gonzalez and Joshua Reiss. Determination and correction of individual channel time offsets for signals involved in an audio mixture. In Audio Engineering Society Convention 125. Audio Engineering Society, 2008.

PGR09

Enrique Perez-Gonzalez and Joshua Reiss. Automatic gain and fader control for live mixing. In 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1–4. IEEE, 2009.

pgr09

Enrique Perez-Gonzalez and Joshua Reiss. Automatic equalization of multichannel audio using cross-adaptive methods. Journal of the Audio Engineering Society, October 2009.

PR14

Pedro D Pestana and Joshua D Reiss. A cross-adaptive dynamic spectral panning technique. In DAFx, 303–307. Erlangen, 2014.

PRB17

Pedro Duarte Pestana, Joshua D Reiss, and Álvaro Barbosa. User preference on artificial reverberation and delay time parameters. Journal of the Audio Engineering Society, 65(1/2):100–107, 2017.

RKH+21

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, and others. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763. PMLR, 2021.

RamirezR18

Marco A Martínez Ramírez and Joshua D Reiss. End-to-end equalization with convolutional neural networks. In 21st International Conference on Digital Audio Effects (DAFx-18). 2018.

RM14

Joshua D Reiss and Andrew McPherson. Audio effects: theory, implementation and application. CRC Press, 2014.

RM15

Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International conference on machine learning, 1530–1538. PMLR, 2015.

RFB15

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, 234–241. Springer, 2015.

SBStoter+18

Michael Schoeffler, Sarah Bartoschek, Fabian-Robert Stöter, Marlene Roess, Susanne Westphal, Bernd Edler, and Jürgen Herre. webMUSHRA – a comprehensive framework for web-based listening tests. Journal of Open Research Software, 2018.

SL61

Manfred R Schroeder and Benjamin F Logan. Colorless artificial reverberation. IRE Transactions on Audio, pages 209–214, 1961.

SPSK11

Jeffrey Scott, Matthew Prockup, Erik M Schmidt, and Youngmoo E Kim. Automatic multi-track mixing using linear dynamical systems. In Proceedings of the 8th Sound and Music Computing Conference, Padova, Italy, 12. Citeseer, 2011.

SerraPP21

Joan Serrà, Jordi Pons, and Santiago Pascual. SESQA: semi-supervised learning for speech quality assessment. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 381–385. IEEE, 2021.

SHC+22

Siyuan Shan, Lamtharn Hantrakul, Jitong Chen, Matt Avent, and David Trevelyan. Differentiable wavetable synthesis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4598–4602. IEEE, 2022.

SF19

Di Sheng and György Fazekas. A feature learning Siamese model for intelligent control of the dynamic range compressor. In 2019 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE, 2019.

Sko16

Esben Skovenborg. Development of semantic scales for music mastering. In Audio Engineering Society Convention 141. 2016.

SDWMG15

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, 2256–2265. PMLR, 2015.

SE19

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 2019.

Spa97

James C Spall. A one-measurement form of simultaneous perturbation stochastic approximation. Automatica, 33(1):109–112, 1997.

SB21

Janne Spijkervet and John Ashley Burgoyne. Contrastive learning of musical representations. arXiv preprint arXiv:2103.09410, 2021.

SRDM19

Ryan Stables, Joshua D. Reiss, and Brecht De Man. Intelligent Music Production. Focal Press, 2019.

SBR22

Christian J Steinmetz, Nicholas J Bryan, and Joshua D Reiss. Style transfer of audio effects with differentiable signal processing. arXiv preprint arXiv:2207.08759, 2022.

SIC21

Christian J Steinmetz, Vamsi Krishna Ithapu, and Paul Calamia. Filtered noise shaping for time domain room impulse response estimation from reverberant speech. In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 221–225. IEEE, 2021.

SPPSerra21

Christian J Steinmetz, Jordi Pons, Santiago Pascual, and Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. In ICASSP. IEEE, 2021.

SR20

Christian J Steinmetz and Joshua D Reiss. Auraloss: audio focused loss functions in PyTorch. In Digital Music Research Network One-day Workshop. 2020.

SPPS21

Christian J. Steinmetz, Jordi Pons, Santiago Pascual, and Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2021.

SED18

Daniel Stoller, Sebastian Ewert, and Simon Dixon. Wave-U-Net: a multi-scale neural network for end-to-end audio source separation. ISMIR, 2018.

StoterULM19

Fabian-Robert Stöter, Stefan Uhlich, Antoine Liutkus, and Yuki Mitsufuji. Open-Unmix – a reference implementation for music source separation. Journal of Open Source Software, 4(41):1667, 2019.

TMB21

Zehai Tu, Ning Ma, and Jon Barker. DHASP: differentiable hearing aid speech processing. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 296–300. IEEE, 2021.

TSK+22

Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W Schuller, Christian J Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, and others. HEAR: holistic evaluation of audio representations. In NeurIPS 2021 Competitions and Demonstrations Track, 125–145. PMLR, 2022.

TJM07

George Tzanetakis, Randy Jones, and Kirk McNally. Stereo panning features for classifying recording production style. In ISMIR, 441–444. 2007.

VZolzerA06

Vincent Verfaille, U. Zölzer, and Daniel Arfib. Adaptive digital audio effects (A-DAFx): a new class of sound transformations. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1817–1831, 2006.

ValimakiPS+12

Vesa Välimäki, Julian D Parker, Lauri Savioja, Julius O Smith, and Jonathan S Abel. Fifty years of artificial reverberation. IEEE Transactions on Audio, Speech, and Language Processing, 20(5):1421–1448, 2012.

ValimakiR16

Vesa Välimäki and Joshua D Reiss. All about audio equalization: solutions and frontiers. Applied Sciences, 6(5):129, 2016.

WRA12

Dominic Ward, Joshua D Reiss, and Cham Athwal. Multitrack mixing using a model of loudness and partial loudness. In Audio Engineering Society Convention 133. Audio Engineering Society, 2012.

WWM+17

Dominic Ward, Hagen Wierstorf, Russell Mason, Mark Plumbley, and Christopher Hummersone. Estimating the loudness balance of musical mixtures using audio source separation. In Proceedings of the 3rd Workshop on Intelligent Music Production (WIMP). 2017.

WMMS20

Thomas Wilmering, David Moffat, Alessia Milo, and Mark B Sandler. A history of audio effects. Applied Sciences, 10(3):791, 2020.

WFBS20

Minz Won, Andres Ferraro, Dmitry Bogdanov, and Xavier Serra. Evaluation of CNN-based automatic music tagging models. In Proc. of 17th Sound and Music Computing. 2020.

WValimaki+22

Alec Wright, Vesa Välimäki, and others. Grey-box modelling of dynamic range compression. In Proc. Int. Conf. Digital Audio Effects (DAFX), Vienna, Austria, 304–311. 2022.

WCZ+22

Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. arXiv preprint arXiv:2211.06687, 2022.

YSK20

Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Parallel WaveGAN: a fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6199–6203. IEEE, 2020.

Zolzer11

Udo Zölzer. DAFX: digital audio effects. John Wiley and Sons, 2011.

ZolzerAA+02

Udo Zölzer, Xavier Amatriain, Daniel Arfib, Jordi Bonada, Giovanni De Poli, Pierre Dutilleux, Gianpaolo Evangelista, Florian Keiler, Alex Loscos, Davide Rocchesso, and others. DAFX-Digital audio effects. John Wiley and Sons, 2002.