Vocal Separation
Vocal separation is the task of isolating the singing voice from a mixed music recording, typically with the instrumental accompaniment recovered as a second output. Introduced formally by Han et al. (2012), it supports music editing, remixing, karaoke generation, and music analysis. Modern deep learning approaches (Défossez et al., 2021) produce separations good enough for practical use in music production and streaming services. Vocal separation is a special case of source separation in which the target is the singing voice, usually the most perceptually salient source in a mix.
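As a rough illustration of the task, and not the method of any paper cited here, the sketch below mixes two synthetic tones that stand in for vocals and accompaniment, then splits the mixture with a soft time-frequency mask. The mask is an oracle: it is computed from the true sources, which a real system would have to estimate from the mixture alone. The `stft` helper, the tone frequencies, and the frame sizes are all illustrative assumptions.

```python
import numpy as np

def stft(x, frame=512, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins).
    Hypothetical helper written for this sketch."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Assumption: pure sine tones stand in for the two sources.
sr = 8000
t = np.arange(sr) / sr
vocals = 0.5 * np.sin(2 * np.pi * 440 * t)
accompaniment = 0.5 * np.sin(2 * np.pi * 1500 * t)
mixture = vocals + accompaniment  # additive mixing model

V, A, M = stft(vocals), stft(accompaniment), stft(mixture)

# Oracle soft mask: the share of each time-frequency bin's magnitude
# that belongs to the vocals. It needs the true sources, so it is only
# a reference point, not something available at inference time.
eps = 1e-8
mask = np.abs(V) / (np.abs(V) + np.abs(A) + eps)

vocal_estimate = mask * M
accomp_estimate = (1.0 - mask) * M

# Relative spectrogram error of the vocal estimate.
err = np.linalg.norm(vocal_estimate - V) / np.linalg.norm(V)
print(f"relative error of vocal estimate: {err:.2e}")
```

Because the two masks sum to one, the two estimates always add back up to the mixture spectrogram. Real recordings are much harder than this toy case, since voice and instruments overlap heavily in time and frequency.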
Sources
- Han, Y., Qin, Z., & Kang, Z. (2012). Singing voice separation using spectral floor filtered spectrograms. In Proceedings of the International Society for Music Information Retrieval Conference.
- Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2015). Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE Transactions on Audio, Speech, and Language Processing, 23(12), 2136-2147. DOI: 10.1109/TASL.2015.2468582
- Défossez, A., Usunier, N., Bottou, L., & Bach, F. (2021). Music source separation in the waveform domain. In International Conference on Learning Representations.