
DUET and DASSS Review

We begin with a brief review of the DUET system [3,4,1] of Scott Rickard et al. DUET separates $N$ sound sources from two channels, where $N$ is in general greater than two. It assumes the following linear mixing model in the STFT domain for sources $S_i$ in the left channel $X_1$ and the right channel $X_2$:

$\displaystyle X_1 = S_1 + S_2 + \cdots + S_N$ (1)
$\displaystyle X_2 = a_1 e^{-j\omega\delta_1}S_1 + a_2 e^{-j\omega\delta_2}S_2 + \cdots + a_N e^{-j\omega\delta_N}S_N$ (2)

where $a_i$ is the scale parameter and $\delta_i$ is the delay parameter of source $i$, each describing the right channel relative to the left. We refer to $a_i$ and $\delta_i$ together as the mixing parameters of source $i$.
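As a minimal sketch of equations (1) and (2) at a single time-frequency point, the following Python function forms the two channel values from a list of source values. The function name `mix`, the example parameter values, and the choice of radians per sample for `omega` are illustrative assumptions and are not part of DUET itself.

```python
import cmath

def mix(sources, params, omega):
    """Return (X1, X2) at one time-frequency point per equations (1)-(2).

    sources: complex STFT values S_i at this point
    params:  (a_i, delta_i) pairs, one per source
    omega:   angular frequency of the bin (assumed radians per sample)
    """
    x1 = sum(sources)
    x2 = sum(a * cmath.exp(-1j * omega * d) * s
             for s, (a, d) in zip(sources, params))
    return x1, x2

# Two hypothetical sources; only the first is active at this point.
x1, x2 = mix([1 + 2j, 0j], [(0.5, 1.0), (2.0, -1.0)], omega=0.5)
```

With only the first source active, `x1` equals that source's value and `x2` is the same value scaled by $a_1$ and phase-shifted by $-\omega\delta_1$.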

By assuming that only one source is active at any given point in time-frequency space (a realistic assumption for independent speech sources), we may estimate the mixing parameters at a particular time-frequency point $(\omega_k,\tau)$ via:

$\displaystyle (a_i,\delta_i)=\left(\frac{\vert X_2(\omega_k,\tau)\vert}{\vert X_1(\omega_k,\tau)\vert},\ \mathrm{Im}\left\{\log\left(\frac{X_1(\omega_k,\tau)}{X_2(\omega_k,\tau)}\right)\right\}/\omega_k\right).$ (3)
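The estimate can be checked numerically. When a single source is active, the mixing model gives $X_2/X_1 = a_i e^{-j\omega\delta_i}$, so the magnitude ratio yields $a_i$ and the phase of $X_1/X_2$ divided by $\omega_k$ yields $\delta_i$. The sketch below assumes nonzero $X_1$ and $X_2$ and $\vert\omega_k\delta_i\vert < \pi$, since the complex logarithm returns only the principal phase; the function name and test values are hypothetical.

```python
import cmath

def estimate_mixing_params(x1, x2, omega):
    """Estimate (a, delta) at one time-frequency point.

    Assumes x1 and x2 are nonzero and that a single source dominates.
    """
    a = abs(x2) / abs(x1)
    # Imaginary part of the complex log is the principal phase of x1 / x2.
    delta = cmath.log(x1 / x2).imag / omega
    return a, delta

# Hypothetical single-source point with a = 0.5, delta = 1.0, omega = 0.5.
s = 1 + 2j
x1 = s
x2 = 0.5 * cmath.exp(-1j * 0.5 * 1.0) * s
a_hat, delta_hat = estimate_mixing_params(x1, x2, omega=0.5)
```

Here `a_hat` and `delta_hat` recover the scale and delay used to build `x2`, up to floating-point error.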

After collecting many such estimates, the DUET system builds a two-dimensional histogram whose peaks in $(a_i,\delta_i)$ space should reveal the mixing parameters of each of the $N$ sources. To demix the sources, we revisit the per-point parameter estimates once the source mixing parameters have been read off the histogram, and assign each point in time-frequency space to the source whose mixing parameters are closest to those estimated for that point. A variety of matching schemes may be used for this assignment. We have presented delay and scale subtraction scoring (DASSS) [2], which is similar to a method presented recently by the original DUET authors in [1]. In DASSS, we define a set of functions $Y_i$ such that:

$\displaystyle Y_i \equiv X_1 - \frac{1}{a_i}e^{+j\omega\delta_i} X_2.$ (4)

If in fact exactly one source, $S_g$, is active at a given frequency bin in a given frame (so that $X_1 = S_g$), it may be shown that our model predicts:

$\displaystyle \hat{Y}_{i=g} = 0$ (5)
$\displaystyle \hat{Y}_{i\neq g} = \alpha_{i,g} S_g$ (6)
$\displaystyle \phantom{\hat{Y}_{i\neq g}} = \alpha_{i,g} X_1,$ (7)

where
$\displaystyle \alpha_{u,v} \equiv 1-\frac{a_v}{a_u}e^{j\omega(\delta_u - \delta_v)}.$ (8)
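The prediction follows by direct substitution. As a short worked step, assume only $S_g$ is active, so that equations (1) and (2) reduce to $X_1 = S_g$ and $X_2 = a_g e^{-j\omega\delta_g} S_g$. Inserting these into equation (4) gives:

$\displaystyle Y_i = S_g - \frac{1}{a_i}e^{+j\omega\delta_i}\, a_g e^{-j\omega\delta_g} S_g = \left(1 - \frac{a_g}{a_i}e^{j\omega(\delta_i - \delta_g)}\right) S_g.$

For $i = g$ the factor in parentheses is $1 - 1 = 0$. For $i \neq g$ it has the form of definition (8) with $u = i$ and $v = g$, and it multiplies $S_g = X_1$.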

We may then score the hypothesis that source $g$ is active via:

$\displaystyle f(g) = \frac{\sum_{i=1}^N \vert\hat{Y}_i^g - Y_i \vert}{\sum_{i=1}^N \vert Y_i\vert }$ (9)

where $\hat{Y}_i^g$ denotes the predicted value of the $i$th $Y$ function under the assumption that only source $g$ is active. ($\hat{Y}_g^g$ corresponds to $\hat{Y}_{i=g}$ and is always zero.) Because $f(g)$ is a normalized prediction error, smaller values indicate closer agreement with the hypothesis. We have found that this approach works well when exactly one source is in fact active. In the next section, we consider what happens in DUET and DASSS when more than one source is active.
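The scoring step can be sketched in a few lines of Python. This is an illustrative reading of equations (4) through (9) and not the authors' implementation: the function name `dasss_scores` and the test values are hypothetical. The predicted $\hat{Y}_i^g$ is computed by substituting the single-source model ($X_1 = S_g$) into equation (4), and the sketch assumes that the observed $Y_i$ are not all zero.

```python
import cmath

def dasss_scores(x1, x2, params, omega):
    """Return the score f(g) for each single-source hypothesis g.

    params: list of (a_i, delta_i) pairs, e.g. read from histogram peaks
    """
    # Observed Y_i from equation (4).
    y = [x1 - cmath.exp(1j * omega * d) * x2 / a for a, d in params]
    denom = sum(abs(v) for v in y)  # assumed nonzero
    scores = []
    for a_g, d_g in params:
        # Predicted Y_i if only source g is active, so that X1 = S_g.
        y_hat = [(1 - (a_g / a_i) * cmath.exp(1j * omega * (d_i - d_g))) * x1
                 for a_i, d_i in params]
        scores.append(sum(abs(p - o) for p, o in zip(y_hat, y)) / denom)
    return scores

# Hypothetical point where only source 0 (a = 0.5, delta = 1.0) is active.
params = [(0.5, 1.0), (2.0, -1.0)]
omega = 0.5
s = 1 + 2j
x1 = s
x2 = 0.5 * cmath.exp(-1j * omega * 1.0) * s
scores = dasss_scores(x1, x2, params, omega)
best = min(range(len(scores)), key=scores.__getitem__)
```

In this example the score for hypothesis 0 is zero up to rounding, the score for hypothesis 1 exceeds one, and the point is therefore assigned to source 0.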


Aaron S. Master 2003-11-01