Next: Scoring Functions Up: SOUND SOURCE SEPARATION OF Previous: Problems with the Existing


New Demixing Approach: DASSS

The current system provides a solution to the problems above. It uses the same histogram approach as the DUET system to estimate the mixing parameters $a_i$ and $\delta_i$, but uses a demixing algorithm that does not suffer from the problems inherent in the nearest neighbor method. The current system uses a technique we term delay and scale subtraction to create $N$ new signals, each of which lacks exactly one source. It then compares these signals to predictions of them generated under the assumption that only one particular source is present. By scoring the fit of the predictions, the current system makes judgments about the presence of a particular source at a particular point in time-frequency space.

To explain the technique, we begin by noting that, with reliable estimates for the mixing parameters, it is possible to create $N$ new signals $Y_i$, each of which entirely eliminates a particular $S_i$. To do this, we choose:
\begin{displaymath}
Y_i \equiv X_1 - \frac{1}{a_i}e^{+j\omega\delta_i} X_2. \qquad (2)
\end{displaymath}

We notice that the multiplicative factor applied to $X_2$ corresponds to scaling and delay in the time domain. Hence, we may call this source-eliminating technique delay and scale subtraction scoring or DASSS. We may also write any given $Y_i$ in the form:

\begin{displaymath}
Y_i = \alpha_{i,1}S_1 + \alpha_{i,2}S_2 + \cdots +
\alpha_{i,N}S_N,
\end{displaymath}

where

\begin{displaymath}
\alpha_{u,v} \equiv (1-\ensuremath{\frac{a_v}{a_u}}e^{j\omega(\delta_u -
\delta_v)})
\end{displaymath}

and clearly

\begin{displaymath}
\alpha_{u,u} \equiv (1-\ensuremath{\frac{a_u}{a_u}}e^{j\omega(\delta_u -
\delta_u)}) = 0.
\end{displaymath}
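
As a check on these coefficients, the expansion can be worked out explicitly. The sketch below assumes that the mixing model of the earlier section has the usual DUET form, $X_1 = \sum_v S_v$ and $X_2 = \sum_v a_v e^{-j\omega\delta_v} S_v$; that form is not restated in this section, so it is an assumption here. Substituting it into equation 2 gives:

\begin{displaymath}
Y_u = \sum_{v=1}^{N} S_v - \frac{1}{a_u}e^{+j\omega\delta_u} \sum_{v=1}^{N} a_v e^{-j\omega\delta_v} S_v
= \sum_{v=1}^{N} \left(1-\frac{a_v}{a_u}e^{j\omega(\delta_u - \delta_v)}\right) S_v
= \sum_{v=1}^{N} \alpha_{u,v} S_v.
\end{displaymath}

The term with $v = u$ vanishes, which is how $Y_u$ comes to lack source $S_u$.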

In matrix form, we then have:

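\begin{displaymath}
\left[ \begin{array}{cccc}
0 & \alpha_{1,2} & \cdots & \alpha_{1,N} \\
\alpha_{2,1} & 0 & \cdots & \alpha_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{N,1} & \alpha_{N,2} & \cdots & 0
\end{array} \right]
\left[ \begin{array}{c} S_1 \\ S_2 \\ \vdots \\ S_N \end{array} \right]
=
\left[ \begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{array} \right].
\end{displaymath}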
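
The matrix relation can be checked numerically for a single time-frequency bin. The following is a minimal sketch, not the system's implementation: the function names (`alpha`, `mix`, `dasss_signals`), the parameter values, and the source values are all made up for illustration, and the two-channel mixing model inside `mix` is assumed to have the DUET form $X_1 = \sum_v S_v$, $X_2 = \sum_v a_v e^{-j\omega\delta_v} S_v$. Indices are 0-based in the code, whereas the text counts sources from 1.

```python
import cmath


def alpha(u, v, a, delta, omega):
    """alpha_{u,v} = 1 - (a_v / a_u) * exp(j * omega * (delta_u - delta_v))."""
    return 1 - (a[v] / a[u]) * cmath.exp(1j * omega * (delta[u] - delta[v]))


def mix(S, a, delta, omega):
    """Assumed mixing model for one bin (hypothetical helper, not from the text)."""
    X1 = sum(S)
    X2 = sum(a[k] * cmath.exp(-1j * omega * delta[k]) * S[k] for k in range(len(S)))
    return X1, X2


def dasss_signals(X1, X2, a, delta, omega):
    """Equation 2: Y_i = X1 - (1 / a_i) * exp(+j * omega * delta_i) * X2."""
    return [X1 - (1 / a[i]) * cmath.exp(1j * omega * delta[i]) * X2
            for i in range(len(a))]


# Illustrative values only: three sources in one frequency bin of one frame.
a = [0.8, 1.0, 1.3]          # relative amplitudes a_i
delta = [-1.0, 0.0, 2.0]     # relative delays delta_i
omega = 0.7                  # bin frequency
S = [1.0 + 0.5j, -0.3 + 2.0j, 0.9 - 1.1j]

X1, X2 = mix(S, a, delta, omega)
Y = dasss_signals(X1, X2, a, delta, omega)

# Build the alpha matrix and form Y again as the matrix-vector product.
n = len(S)
A = [[alpha(u, v, a, delta, omega) for v in range(n)] for u in range(n)]
Y_from_matrix = [sum(A[u][v] * S[v] for v in range(n)) for u in range(n)]

# Largest disagreement between the two ways of computing Y.
max_error = max(abs(y - ym) for y, ym in zip(Y, Y_from_matrix))
```

Under these assumptions the diagonal of `A` is zero and `max_error` stays at the level of floating-point rounding, so each $Y_i$ obtained by delay and scale subtraction contains no contribution from $S_i$.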

If in fact exactly one source, $S_i$, is present at a given frequency bin in a given frame, our model dictates that we will have:

\begin{displaymath}
\begin{array}{rcll}
\hat{Y}_i & = & 0 & \qquad (3) \\
\hat{Y}_{j\neq i} & = & \alpha_{j,i} S_i & \qquad (4) \\
 & = & \alpha_{j,i} X_1, & \qquad (5)
\end{array}
\end{displaymath}

where the last step holds because $X_1 = S_i$ when $S_i$ is the only active source.

What equation 5 reveals is that if only one source is active, we may predict the $N$ values in the set of $Y_i$ for a given bin in a given frame, using only the known $\alpha$ values and the given mixture $X_1$. In fact, we may make $N$ sets of such predictions, each assuming one guessed active source $g$. (We will use $\hat{Y}_i^g$ to denote the prediction of the $i^{th}$ $Y$ function value when assuming only source $g$ is active. $\hat{Y}_g^g$ will clearly always be zero.) We may then compare each of these $N$ sets of hypotheses to the actual observed set of $Y_i$. If exactly one source is active, only its corresponding prediction will fit the observed $Y_i$.
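
The hypothesis comparison can be sketched for one bin as follows. This is an illustration under stated assumptions rather than the system's method: the names `predictions`, `squared_error`, and `best_hypothesis` are hypothetical, the numerical values are made up, the second mixture is generated with an assumed DUET-style model, and the summed squared error is only a placeholder for the scoring of fit. Indices are 0-based in the code.

```python
import cmath


def alpha(u, v, a, delta, omega):
    """alpha_{u,v} = 1 - (a_v / a_u) * exp(j * omega * (delta_u - delta_v))."""
    return 1 - (a[v] / a[u]) * cmath.exp(1j * omega * (delta[u] - delta[v]))


def predictions(g, X1, a, delta, omega):
    """Y_hat_i^g = alpha_{i,g} * X1 for every i; the entry i == g is zero."""
    return [alpha(i, g, a, delta, omega) * X1 for i in range(len(a))]


def squared_error(Y, Y_hat):
    """Placeholder measure of fit (an assumption, not the text's scoring function)."""
    return sum(abs(y - yh) ** 2 for y, yh in zip(Y, Y_hat))


def best_hypothesis(Y, X1, a, delta, omega):
    """Return the guessed source g whose predictions best fit the observed Y."""
    errors = [squared_error(Y, predictions(g, X1, a, delta, omega))
              for g in range(len(a))]
    g_best = min(range(len(errors)), key=errors.__getitem__)
    return g_best, errors


# Illustrative bin in which only the source with index 1 is active.
a = [0.8, 1.0, 1.3]
delta = [-1.0, 0.0, 2.0]
omega = 0.7
active = 1
S_active = 0.4 - 1.2j

# Mixtures under the assumed model with a single active source.
X1 = S_active
X2 = a[active] * cmath.exp(-1j * omega * delta[active]) * S_active

# Observed Y_i from delay and scale subtraction (equation 2).
Y = [X1 - (1 / a[i]) * cmath.exp(1j * omega * delta[i]) * X2
     for i in range(len(a))]

g_best, errors = best_hypothesis(Y, X1, a, delta, omega)
```

With these values the hypothesis for the truly active source reproduces the observed $Y_i$ up to rounding, while the other two hypotheses leave a clearly nonzero error, so `g_best` matches `active`.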

Aaron S. Master 2003-03-27