Next: Multi-Source Approaches
Up: SOUND SOURCE SEPARATION OF
Previous: Advantages Over the DUET
Algorithm
Given the model and scoring above, we now describe an algorithm
for demixing the sources, noting practical implementation
concerns.
- Create a two-dimensional histogram, as in the DUET system,
to estimate the mixing parameters for each source.
- Create
as per equation 2.
- For each frame in the input mixtures:
- For each bin, calculate the fractional model error
for each potential source
.
- If the error is sufficiently small for a particular
in a given bin,
assign the corresponding bin value of
to the left channel of
and
to
the right channel of
.
- If the error is too large, make a note of the bin and consider
the multi-source demixing algorithms described in
section 5 below.
- Once all the bins in the frame have been assigned,
perform an IFFT on each
to obtain a set of separated time domain
signals.
- Overlap-add the time domain signals produced for
each frame to create the separated output signals.
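The per-frame assignment and resynthesis steps above can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the function name `demix_frames`, the array layouts, the use of one-sided (`rfft`) frames, the single scalar `threshold`, and the absence of a synthesis window are all hypothetical choices. The fractional model errors are assumed to have been computed already, and only one channel is processed; a stereo output would call the function once per channel.

```python
import numpy as np

def demix_frames(frames, errors, threshold, hop):
    """Assign each bin to its best-fitting source, then IFFT and overlap-add.

    frames:    complex array (n_frames, n_bins), one-sided FFT of one channel
    errors:    float array (n_sources, n_frames, n_bins) of fractional model
               errors, assumed to be precomputed elsewhere
    threshold: hypothetical scalar "sufficiently small" error limit
    hop:       hop size in samples between successive frames
    Returns (outputs, unassigned): separated signals (n_sources, n_samples)
    and a boolean mask of bins whose error was too large.
    """
    n_sources, n_frames, n_bins = errors.shape
    frame_len = 2 * (n_bins - 1)           # length of an even-sized real frame

    best = np.argmin(errors, axis=0)       # winning source per bin
    best_err = np.min(errors, axis=0)      # error of the winning source
    confident = best_err < threshold       # bins that pass the error test

    outputs = np.zeros((n_sources, hop * (n_frames - 1) + frame_len))
    for j in range(n_sources):
        # Keep only the bins assigned to source j; zero everything else.
        spec = np.where(confident & (best == j), frames, 0)
        for t in range(n_frames):
            segment = np.fft.irfft(spec[t], n=frame_len)
            outputs[j, t * hop : t * hop + frame_len] += segment  # overlap-add
    return outputs, ~confident
```

The returned `unassigned` mask marks the bins whose error was too large; these are the bins that would be handed to a multi-source demixing stage.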
We note that two slight enhancements to the algorithm produce better
results. First, we consider the data used in reconstructing the
synthesized signals. Specifically, we used
and
above
to produce a separated stereo output signal. In fact, such a
signal will tend to be corrupted by signals nearby in the mixing
parameter space. This occurs because of a great similarity
between the
values corresponding to two such sources. The
similarity causes the predicted values
to be nearly
identical and leads to similar values in the scoring function,
which in turn occasionally lets the wrong source win. As a result,
one output signal is often polluted by artifacts from another signal
nearby in the mixing parameter space. To solve the problem, we may
simply use the
value
corresponding to the interfering source
, since
completely lacks source
. This yields a mono rather than
stereo signal, and a filtered one at that. Nonetheless, we may
undo the filtering implicitly applied by each
by dividing by
the appropriate
value. On the test example, the results
of this method are subjectively superior to the stereo results.
Second, we consider the test used to determine whether a score is
"good enough" to assign a bin to a single source. Because of the
sources' proximity to each other in the parameter space and their
relative overall loudness, some sources may tend to score better
than others in general. It is therefore beneficial to analyze the
data as a whole, examining the error in cases where one particular
source wins and another particular source finishes in second place.
For the test case used here, at least, drastically different errors
tend to occur depending on which model fits best and which model
finishes second. These data often show a bimodal distribution of
error, suggesting that a "good enough" decision threshold should
be placed just after the first large clump of data. The threshold
used in the algorithm's test therefore differs depending on which
source scores best and which scores second-best.
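A pair-dependent threshold of this kind might be applied as in the sketch below. The function name `accept_bins`, the array layouts, and the threshold table are assumptions made for illustration; in practice each table entry would be chosen by inspecting the error distribution for that (winner, runner-up) pair, and no particular values are implied by the text.

```python
import numpy as np

def accept_bins(errors, thresholds):
    """Decide per bin whether the best-scoring source is "good enough".

    errors:     float array (n_sources, n_bins) of model errors for one frame
    thresholds: float array (n_sources, n_sources); thresholds[w, r] is the
                hypothetical limit used when source w wins and r is second
    Returns (winner, accepted): winning source index per bin and a boolean
    mask of bins whose winning error is below the pair-specific threshold.
    """
    order = np.argsort(errors, axis=0)     # sources sorted by error, per bin
    winner = order[0]                      # smallest error
    runner_up = order[1]                   # second-smallest error
    win_err = np.take_along_axis(errors, winner[None, :], axis=0)[0]
    accepted = win_err < thresholds[winner, runner_up]
    return winner, accepted
```

Bins for which `accepted` is false are the ones that would be deferred to multi-source demixing.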
Aaron S. Master
2003-03-27