We now review the DUET system [3,4,1] of Scott Rickard et al.
The DUET system separates $N$ sound sources from two channels,
where $N$ is in general greater than two. The DUET
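system assumes the following STFT-domain linear mixing model for
sources $S_i(\omega,t)$, $i = 1, \dots, N$, in left channel
$X_L(\omega,t)$ and right channel $X_R(\omega,t)$:
\[
X_L(\omega,t) = \sum_{i=1}^{N} S_i(\omega,t), \qquad
X_R(\omega,t) = \sum_{i=1}^{N} a_i \, e^{-j\omega\delta_i} \, S_i(\omega,t),
\]
with per-source mixing parameters $a_i$ (relative scale) and
$\delta_i$ (relative delay).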
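By assuming that at most one source is active at any given
time-frequency point (a realistic assumption for independent
speech sources), we may estimate the mixing parameters, scale $a$
and delay $\delta$, for a particular time-frequency point from the
ratio of the right-channel STFT $X_R$ to the left-channel STFT
$X_L$ via:
\[
\hat{a}(\omega,t) = \left| \frac{X_R(\omega,t)}{X_L(\omega,t)} \right|, \qquad
\hat{\delta}(\omega,t) = -\frac{1}{\omega} \, \angle \frac{X_R(\omega,t)}{X_L(\omega,t)}.
\]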
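In the usual DUET formulation, the scale estimate is the magnitude of the inter-channel STFT ratio and the delay estimate is its negative phase divided by frequency. A minimal NumPy sketch of that step follows; the function name `estimate_mixing_params`, its argument names, and the `eps` threshold are assumptions made for illustration, not part of the DUET papers.

```python
import numpy as np


def estimate_mixing_params(X_left, X_right, omega, eps=1e-12):
    """Estimate per-bin relative scale and delay from two-channel STFTs.

    X_left, X_right : complex arrays, shape (n_freq, n_frames).
    omega           : angular frequency of each row in radians per sample,
                      shape (n_freq,).
    Returns (a_hat, delta_hat), each of shape (n_freq, n_frames).
    Bins with negligible left-channel magnitude are NaN in both outputs;
    rows with omega == 0 are NaN in delta_hat.
    """
    X_left = np.asarray(X_left, dtype=complex)
    X_right = np.asarray(X_right, dtype=complex)
    omega = np.asarray(omega, dtype=float)

    # Avoid dividing by (near-)zero left-channel bins.
    usable = np.abs(X_left) > eps
    ratio = np.where(usable, X_right / np.where(usable, X_left, 1.0), np.nan)

    # Scale estimate: magnitude of the inter-channel ratio.
    a_hat = np.abs(ratio)

    # Delay estimate: negative phase of the ratio divided by frequency.
    # The phase is wrapped to (-pi, pi], so the result is only unambiguous
    # while |omega * delay| < pi.
    omega_col = np.where(omega == 0.0, np.nan, omega)[:, None]
    delta_hat = -np.angle(ratio) / omega_col
    return a_hat, delta_hat
```

When a single source with scale 0.7 and delay 0.5 samples occupies every bin, the function should return those two values everywhere, up to rounding, as long as the phase does not wrap.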
After collecting many such estimates, the DUET system prepares a
two-dimensional histogram whose peaks in $(a, \delta)$ space
should reveal the mixing parameters for each of the $N$ sources.
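The histogram step can be sketched with `numpy.histogram2d`. The greedy peak picker below (take the fullest bin, suppress its neighbourhood, repeat) is a simple stand-in chosen for illustration and is not claimed to be the peak detector used in the DUET papers; the function name, bin counts, ranges, and suppression radius are likewise assumptions.

```python
import numpy as np


def histogram_peaks(a_hat, delta_hat, n_sources, bins=(40, 40),
                    a_range=(0.0, 2.0), delta_range=(-2.0, 2.0), suppress=2):
    """Locate the n_sources largest, well-separated peaks in (a, delta) space.

    a_hat, delta_hat : arrays of per-bin scale and delay estimates (any shape).
    Returns (hist, peaks) where peaks is a list of (a, delta) bin centres.
    """
    a = np.ravel(a_hat)
    d = np.ravel(delta_hat)

    # Drop time-frequency points for which no estimate was available.
    keep = np.isfinite(a) & np.isfinite(d)
    hist, a_edges, d_edges = np.histogram2d(
        a[keep], d[keep], bins=bins, range=(a_range, delta_range)
    )

    a_centres = 0.5 * (a_edges[:-1] + a_edges[1:])
    d_centres = 0.5 * (d_edges[:-1] + d_edges[1:])

    work = hist.copy()
    peaks = []
    for _ in range(n_sources):
        # Fullest remaining bin is taken as the next peak.
        i, j = np.unravel_index(np.argmax(work), work.shape)
        peaks.append((float(a_centres[i]), float(d_centres[j])))
        # Zero out the surrounding bins so the same cluster is not picked twice.
        work[max(i - suppress, 0):i + suppress + 1,
             max(j - suppress, 0):j + suppress + 1] = 0.0
    return hist, peaks
```

The peak locations are only as precise as the bin width, so in practice one might refine each peak, for example by averaging the estimates that fall near it.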
To demix the sources, we consider the set of parameter estimates a
second time after the source mixing parameters are estimated from
the histogram. We assign each point in time-frequency space to the
source whose mixing parameters are closest to those estimated for
the time-frequency point. To do this, a variety of matching
schemes may be used. We have presented delay and scale
subtraction scoring (DASSS) [2], which is similar to a method
presented recently by the original DUET authors in [1]. In DASSS,
we define a set of functions such that:
We may then score the hypothesis that source $i$ is active via: