We present an algorithm for separating multiple speakers from a single-channel mixed recording. The algorithm builds on a model proposed by Raj and Smaragdis (2005). The idea is to extract characteristic spectro-temporal basis functions from training data for each individual speaker and to decompose the mixed signal as a linear combination of these learned bases. In other words, their model extracts a compact code of basis functions that explains the space spanned by the spectral vectors of a speaker. In our model, we instead generate a sparse-distributed code with more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in the data and hence leads to better separation.
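The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the paper's sparse, overcomplete probabilistic model: as a stand-in it uses plain nonnegative matrix factorization with KL-divergence multiplicative updates and no sparsity term. The function names `learn_bases` and `separate`, the iteration counts, and the use of magnitude spectrograms as input are all assumptions made for illustration.

```python
import numpy as np


def learn_bases(V, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Learn nonnegative spectral bases W (freq x n_bases) from one speaker's
    magnitude spectrogram V (freq x time) via KL multiplicative updates.
    Stand-in for the paper's training step (assumption: no sparsity prior)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Normalize each basis so that it sums to one, like a spectral distribution.
    return W / (W.sum(axis=0, keepdims=True) + eps)


def separate(V_mix, W_list, n_iter=200, seed=0, eps=1e-12):
    """Keep every speaker's bases fixed, fit only the mixture weights H,
    then split V_mix between speakers with a soft (ratio) mask."""
    rng = np.random.default_rng(seed)
    W = np.hstack(W_list)
    H = rng.random((W.shape[1], V_mix.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V_mix / WH)) / (W.sum(axis=0)[:, None] + eps)
    WH = W @ H + eps
    estimates, start = [], 0
    for W_k in W_list:
        stop = start + W_k.shape[1]
        part = W_k @ H[start:stop]            # this speaker's share of the model
        estimates.append(V_mix * part / WH)   # soft mask applied to the mixture
        start = stop
    return estimates
```

In use, one would call `learn_bases` once per speaker on that speaker's training spectrogram and then pass the resulting list of basis matrices to `separate` together with the mixture spectrogram. Setting `n_bases` larger than the number of frequency bins gives an overcomplete dictionary, but without a sparsity constraint such as the one the paper proposes, this sketch has nothing to keep the resulting code sparse.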
Madhusudana Shashanka
Boston University
Bhiksha Raj
Carnegie Mellon University
Paris Smaragdis
Moscow Institute of Thermal Technology
Boston University
Mitsubishi Electric (United States)
Shashanka et al. studied this question.
synapsesocial.com/papers/6a20849d78c6e96e5b3e8de9 — DOI: https://doi.org/10.1109/icassp.2007.366317