Vocal Separation

Vocal separation is the task of isolating the singing voice in a mixed music recording and separating it from the instrumental accompaniment. Introduced formally by Han et al. (2012), it supports music editing, remixing, karaoke generation, and music analysis. Modern deep learning approaches (Défossez et al., 2021) have substantially improved separation quality, making the technique practical in music production and streaming services. Vocal separation is a special case of source separation in which the goal is to isolate a single source, the singing voice, which is usually the most perceptually salient source in the mix.
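To make the task definition concrete, the sketch below treats a mixture as the sum of a vocal signal and an accompaniment signal and recovers both with a soft ratio mask in the frequency domain. It is only an illustration under stated assumptions, not the method of any cited paper: the additive mixture model, the synthetic sine-wave "sources", the function name `oracle_ratio_mask_separation`, and the use of a single whole-signal FFT instead of a short-time transform are all choices made for this example. The mask is an oracle, computed from the true sources, which a real system would not have; practical systems estimate the mask (or the waveform directly) with a learned model.

```python
import numpy as np


def oracle_ratio_mask_separation(vocals, accompaniment, eps=1e-12):
    """Separate a two-source mixture with an oracle soft ratio mask.

    Hypothetical helper for illustration: the mask is built from the true
    sources, so it shows an upper-bound style result, not a usable separator.
    """
    # Assumed signal model: the mixture is the sample-wise sum of the sources.
    mixture = vocals + accompaniment

    # Whole-signal spectra (a real system would use a short-time transform).
    vocal_spec = np.fft.rfft(vocals)
    accomp_spec = np.fft.rfft(accompaniment)
    mix_spec = np.fft.rfft(mixture)

    # Soft mask in [0, 1]: share of each frequency bin attributed to vocals.
    mask = np.abs(vocal_spec) / (np.abs(vocal_spec) + np.abs(accomp_spec) + eps)

    # Apply the mask and its complement to the mixture spectrum.
    vocals_est = np.fft.irfft(mask * mix_spec, n=len(mixture))
    accompaniment_est = np.fft.irfft((1.0 - mask) * mix_spec, n=len(mixture))
    return mixture, vocals_est, accompaniment_est


# Synthetic stand-ins for the two sources: one second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocals = 0.5 * np.sin(2 * np.pi * 440 * t)         # stand-in for the voice
accompaniment = 0.3 * np.sin(2 * np.pi * 110 * t)  # stand-in for the backing

mixture, vocals_est, accompaniment_est = oracle_ratio_mask_separation(
    vocals, accompaniment
)
```

Because the mask and its complement sum to one in every bin, the two estimates always add back up to the mixture. The sources here occupy disjoint frequency bins, so the oracle mask recovers them almost exactly; real voices and instruments overlap in frequency, which is what makes the task hard.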

Sources

  1. Han, Y., Qin, Z., & Kang, Z. (2012). Singing voice separation using spectral floor filtered spectrograms. In Proceedings of the International Society for Music Information Retrieval Conference.
  2. Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2015). Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12), 2136-2147. DOI: 10.1109/TASLP.2015.2468583
  3. Défossez, A., Usunier, N., Bottou, L., & Bach, F. (2021). Music source separation in the waveform domain. In International Conference on Learning Representations.

ScholarGate. Vocal Separation (Vocal Separation and Source Separation Algorithm). Retrieved 2026-06-04 from https://scholargate.app/en/music-information-retrieval/vocal-separation