

Algorithm

Given the model and scoring above, we now describe an algorithm for demixing the sources, noting practical implementation concerns.
  1. Create a two-dimensional histogram, as in the DUET system, to estimate the mixing parameters for each source.
  2. Create $Y_i$ as per equation 2.
  3. For each frame in the input mixtures:
    1. For each bin, calculate the fractional model error $f(g)$ for each potential source $g$.
    2. If the error is sufficiently small for a particular $g$ in a given bin, assign the corresponding bin value of $X_1$ to the left channel of $\hat{S}_g$ and $X_2$ to the right channel of $\hat{S}_g$.
    3. If the error is too large, note the bin and consider the multi-source demixing algorithms described in Section 5 below.
    4. Once all the bins in the frame have been assigned, perform an IFFT on each $\hat{S}_g$ to obtain a set of separated time-domain signals.
  4. Overlap-add the time-domain signals produced for each frame to create the separated output signals.
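
The per-frame assignment and overlap-add in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the implementation used for the results reported here. It assumes that the fractional model errors $f(g)$ have already been computed and stored in a hypothetical array `err`, that a single global `threshold` is used, and that the analysis window and hop size `hop` are chosen so that plain overlap-add reconstructs the signal. The function name and all parameter names are illustrative.

```python
import numpy as np

def demix_frames(X1, X2, err, threshold, hop):
    """Assign each time-frequency bin to at most one source, then resynthesize.

    X1, X2    : complex arrays (n_frames, n_bins), one-sided FFT frames of the
                left and right mixtures.
    err       : real array (n_sources, n_frames, n_bins) holding the fractional
                model error f(g) per source and bin (assumed precomputed).
    threshold : scalar; a bin is assigned only if its best error is below it.
    hop       : hop size in samples between successive frames.

    Returns (out, unassigned), where out has shape (n_sources, 2, n_samples)
    and unassigned marks the bins left for multi-source demixing.
    """
    n_sources, n_frames, n_bins = err.shape
    frame_len = 2 * (n_bins - 1)            # length of each time-domain frame
    n_samples = hop * (n_frames - 1) + frame_len
    out = np.zeros((n_sources, 2, n_samples))

    best = err.argmin(axis=0)               # winning source per bin
    confident = err.min(axis=0) < threshold  # "sufficiently small" error

    for t in range(n_frames):
        start = t * hop
        for g in range(n_sources):
            mask = confident[t] & (best[t] == g)
            # Keep only the bins won by source g, then return to the time domain.
            left = np.fft.irfft(np.where(mask, X1[t], 0), n=frame_len)
            right = np.fft.irfft(np.where(mask, X2[t], 0), n=frame_len)
            # Overlap-add into the stereo output for source g.
            out[g, 0, start:start + frame_len] += left
            out[g, 1, start:start + frame_len] += right

    return out, ~confident
```

Bins flagged in `unassigned` correspond to step 3.3: they are not written to any output and are left for the multi-source methods.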
We note that two slight enhancements to the algorithm produce better results.

First, we consider the data used in reconstructing the synthesized signals. Specifically, we used $X_1$ and $X_2$ above to produce a separated stereo output signal. In practice, such a signal tends to be corrupted by sources nearby in the mixing parameter space. This occurs because the $Y_i$ values corresponding to two such sources are very similar. The similarity causes the predicted values $\hat{Y}_i^g$ to be nearly identical and leads to similar values of the scoring function, which in turn allows the genuine winner to be misidentified occasionally. The effect is that one output signal is often polluted by artifacts from another signal nearby in the mixing parameter space. To solve the problem, we may simply use the $Y_i$ value corresponding to the interfering source $i$, since $Y_i$ completely lacks source $i$. This yields a mono rather than stereo signal, and a filtered one at that. Nonetheless, we may undo the filtering implicitly applied by each $Y_i$ by dividing by the appropriate $\alpha$ value. On the test example, the results of this method are subjectively superior to the stereo results.

Second, we consider the test used to determine whether a score is ``good enough'' to assign a bin to a single source. Because of the sources' proximity to each other in the parameter space and their relative overall loudness, some sources may tend to score better than others in general. Thus it is beneficial to analyze the overall data to see the error in cases where one particular source wins and another particular source finishes in second place. At least for the test case used here, the errors differ drastically depending on which model fits best and which finishes second. These data often show a bimodal distribution of error, suggesting that a ``good enough'' decision threshold should be placed just after the first large clump of data. The threshold used in the algorithm's test therefore depends on which source scores best and which scores second-best.
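
The winner and runner-up dependent threshold can be sketched as a table lookup. This is only an illustration under stated assumptions: the thresholds are taken to be stored in a hypothetical square array `pair_threshold`, indexed by winning source and second-place source, and the values are taken to have been chosen beforehand by inspecting the error distributions. How the thresholds are chosen is not shown, and the function name is illustrative.

```python
import numpy as np

def confident_bins(err, pair_threshold):
    """Decide, for one frame, which bins may be assigned to a single source.

    err            : real array (n_sources, n_bins) of fractional model errors,
                     with n_sources >= 2.
    pair_threshold : real array (n_sources, n_sources); entry [w, r] is the
                     threshold applied when source w scores best and source r
                     scores second best (hypothetical, chosen from the data).

    Returns (winner, accept): the best-scoring source per bin and a boolean
    array marking bins whose best error falls below the pair's threshold.
    """
    order = np.argsort(err, axis=0)          # sources sorted by error, per bin
    winner, runner_up = order[0], order[1]
    best_err = np.take_along_axis(err, winner[None, :], axis=0)[0]
    accept = best_err < pair_threshold[winner, runner_up]
    return winner, accept
```

Bins for which `accept` is false would be handled as in step 3.3 of the algorithm.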
Aaron S. Master 2003-03-27