When considering the DUET system [3,4,1] for estimating
sources from a two-channel (stereo) input
signal, we note that the system works as intended only when
the sources are in fact distinct in time-frequency space. This
condition is referred to as ``source sparsity,'' although source
independence is also required, because sparse sources that co-occur
in time and frequency cannot be separated. In performance of tonal Western music, sources
are in general sparse because instrumental ranges are finite and
most compositions do not require constant playing or singing
throughout time. The sources, however, are not in general
independent, unless the ensemble is without skill or the
music requires that players sound notes in a deliberately random
fashion. The harmonic nature of Western music exacerbates the
problem, because the harmonics of notes whose fundamental frequencies
are in (possibly imperfectly) consonant relations will overlap. Even in
the case of dissonant or deliberately random music, pitches are in
general discretized to the 12-tone Western scale, guaranteeing
overlap of some harmonics.
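
The overlap of harmonics described above can be illustrated numerically. The following is a minimal sketch, not part of the DUET system: it assumes 12-tone equal temperament with A4 = 440 Hz, ideal integer-multiple harmonics, and a hypothetical tolerance in cents below which two harmonics are counted as overlapping. The function names and the tolerance value are illustrative choices only.

```python
import math


def equal_tempered_freq(midi_note, a4=440.0):
    """Frequency in Hz of a MIDI note number in 12-tone equal temperament."""
    return a4 * 2.0 ** ((midi_note - 69) / 12.0)


def overlapping_harmonics(f0_a, f0_b, n_harmonics=10, tol_cents=20.0):
    """List (k_a, k_b, cents) for harmonic pairs that nearly coincide.

    k_a and k_b are harmonic numbers of the two notes; cents is the signed
    distance from harmonic k_a of note A to harmonic k_b of note B.
    The tolerance is an assumed value, not one taken from the text.
    """
    pairs = []
    for k_a in range(1, n_harmonics + 1):
        for k_b in range(1, n_harmonics + 1):
            cents = 1200.0 * math.log2((k_b * f0_b) / (k_a * f0_a))
            if abs(cents) <= tol_cents:
                pairs.append((k_a, k_b, cents))
    return pairs


c4 = equal_tempered_freq(60)        # C4
g4 = equal_tempered_freq(67)        # G4, a perfect fifth above C4
f_sharp4 = equal_tempered_freq(66)  # F#4, a tritone above C4

# Consonant interval: harmonics coincide almost exactly (imperfect consonance).
fifth_pairs = overlapping_harmonics(c4, g4)

# Dissonant interval: some harmonics still fall close together.
tritone_pairs = overlapping_harmonics(c4, f_sharp4)

for name, pairs in (("fifth", fifth_pairs), ("tritone", tritone_pairs)):
    for k_a, k_b, cents in pairs:
        print(f"{name}: harmonic {k_a} of C4 vs harmonic {k_b} of upper note, "
              f"{cents:+.2f} cents apart")
```

For the equal-tempered fifth, the third harmonic of the lower note and the second harmonic of the upper note differ by roughly two cents, which is the ``imperfectly consonant'' case. Whether a given pair of harmonics actually collides in a time-frequency representation depends on the analysis resolution, which this sketch does not model.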
Given these facts, the DUET system must be modified if it is to deal with non-independent sources such as those seen in music. Presently, we consider a method for the case when exactly two unknown sources are present. This means that two instruments or voices are sounding, though we do not know a priori whether they are, for example, the bass and cello or the cello and flute. Clearly, this case is only an incremental improvement over the current one-source-at-a-time system. However, in the cases of musical trios or four-speaker examples, the two-source assumption is of great benefit.
To consider the benefit of the current approach, we first review the DUET system and the related delay and scale subtraction scoring (DASSS) [2]. We then explore how these models are affected when two sources are present at the same point in time-frequency space. We present this material in a Bayesian context, showing how we interpret the data as distributions. We show the important result that the DUET and DASSS data have particular distributions when two particular sources are present, which may reveal which two sources are active. To exploit this result, we use the Bayesian framework to determine the most likely sources given the magnitude of DASS scores. We conclude with an example showing the efficacy of using DASS data for determining and demixing two active sources.