The authors proposed a new missing data recognition approach in which reduced marginalization intervals are computed for each possible mask. The set of all possible masks and their intervals is obtained by clustering a stereo training corpus of parallel clean and noisy speech. The main principle of the approach is to train marginalization intervals that are as small as possible while remaining accurate, in order to improve the precision of marginalization.
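The clustering step can be illustrated with a small sketch. It is hypothetical and is not the authors' exact procedure. The following choices are all assumptions:

- plain k-means as the clustering algorithm;
- per-feature ratio vectors r = x/y from the stereo corpus as its input;
- a mean-ratio threshold of 0.5 for deriving each cluster's mask;
- the per-dimension [min, max] of member ratios as each cluster's interval.

The function names are illustrative. Under these assumptions, a noisy feature y assigned to a cluster with interval [lo, hi] gets clean-feature bounds [lo·y, hi·y]. These bounds are narrower than the classical bounded-marginalization interval [0, y].

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Interval = Tuple[float, float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two ratio vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0):
    """Plain k-means (an assumed clustering method) on ratio vectors r = x / y."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign each ratio vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in data]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_masks(centroids: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical mask per cluster: 1 (reliable) where the mean ratio reaches the threshold."""
    return [[1 if r >= threshold else 0 for r in cen] for cen in centroids]


def cluster_intervals(data: List[Vector], labels: List[int],
                      k: int) -> List[Optional[List[Interval]]]:
    """Per cluster and per dimension, the [min, max] ratio observed among its members."""
    intervals: List[Optional[List[Interval]]] = []
    for c in range(k):
        members = [v for v, lab in zip(data, labels) if lab == c]
        if not members:
            intervals.append(None)
            continue
        intervals.append([(min(col), max(col)) for col in zip(*members)])
    return intervals


def marginalization_bounds(y: Vector, interval: List[Interval]) -> List[Interval]:
    """Reduced bounds [lo * y, hi * y] on the clean feature, instead of [0, y]."""
    return [(lo * yi, hi * yi) for yi, (lo, hi) in zip(y, interval)]
```

The intervals shrink as clusters become more homogeneous, which is the intuition behind "as small as possible while remaining accurate". The number of clusters k trades interval tightness against how reliably a test frame can be assigned to its cluster.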
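The spectral ratio X/Y between clean and noisy speech is computed on the stereo training corpus. This yields a time-frequency representation that gives, for every noisy spectral feature, the relative contribution of the clean speech energy. Assuming that the noise N is additive in the power spectral domain, so that Y = X + N, this ratio is related to the local SNR as follows:

$$\frac{X}{Y} = \frac{X}{X+N} = \frac{\mathrm{SNR}}{1+\mathrm{SNR}}, \qquad \mathrm{SNR} = \frac{X}{N}$$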
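The sketch below shows the ratio computation and its conversion to local SNR. It assumes additive noise in the power domain (Y = X + N), so that r = X/Y = SNR/(1 + SNR) and, conversely, SNR = r/(1 - r). The function names are hypothetical.

Clipping r to [0, 1] is an added assumption. On real frames, cross terms between speech and noise can make the measured X/Y exceed 1, because additivity of powers only holds on average.

```python
import math
from typing import List, Sequence


def ratio_mask(clean_power: Sequence[float], noisy_power: Sequence[float],
               eps: float = 1e-12) -> List[float]:
    """Per-feature ratio r = X / Y from stereo data, clipped to [0, 1]."""
    return [min(1.0, max(0.0, x / max(y, eps))) for x, y in zip(clean_power, noisy_power)]


def snr_to_ratio(snr: float) -> float:
    """r = SNR / (1 + SNR), valid when Y = X + N."""
    return snr / (1.0 + snr)


def ratio_to_snr(r: float, eps: float = 1e-12) -> float:
    """Inverse relation SNR = r / (1 - r); eps avoids division by zero at r = 1."""
    return r / max(1.0 - r, eps)


def ratio_to_snr_db(r: float, eps: float = 1e-12) -> float:
    """Local SNR in dB; r = 0.5 corresponds to 0 dB."""
    return 10.0 * math.log10(max(ratio_to_snr(r, eps), eps))
```

For example, a clean power of 1 mixed with a noise power of 3 gives Y = 4, r = 0.25, and a local SNR of 1/3 (about -4.8 dB).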
The feature domain, which is also the marginalization domain for missing data, is the 12-band Mel spectral domain with cube-root compression of the speech power. Temporal derivatives are then appended, yielding a 24-dimensional feature vector.
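The feature pipeline can be sketched as below. It assumes that the 12 Mel band powers per frame are already available, so the filterbank itself is omitted. The text does not state the following, so they are assumptions:

- the delta window of ±2 frames;
- the usual regression formula for deltas;
- edge padding by repeating the first and last frames.

The function names are hypothetical.

```python
from typing import List

Frames = List[List[float]]


def cube_root_compress(mel_powers: Frames) -> Frames:
    """Cube-root compression of each Mel band power."""
    return [[p ** (1.0 / 3.0) for p in frame] for frame in mel_powers]


def deltas(frames: Frames, n: int = 2) -> Frames:
    """Regression deltas d_t = sum_i i * (c_{t+i} - c_{t-i}) / (2 * sum_i i^2), edge-padded."""
    num_frames = len(frames)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out: Frames = []
    for t in range(num_frames):
        d = []
        for b in range(len(frames[t])):
            acc = 0.0
            for i in range(1, n + 1):
                nxt = frames[min(t + i, num_frames - 1)][b]  # repeat last frame at the edge
                prv = frames[max(t - i, 0)][b]               # repeat first frame at the edge
                acc += i * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def features(mel_powers: Frames, n: int = 2) -> Frames:
    """12 compressed static bands + 12 deltas = 24-dimensional vectors."""
    static = cube_root_compress(mel_powers)
    dynamic = deltas(static, n)
    return [s + d for s, d in zip(static, dynamic)]
```

Marginalization then operates on these compressed values. Because the cube root is monotonic, any interval on the compressed static feature corresponds to an interval on the underlying band power.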