Joint training procedure: learn a generic adaptation NN from the whole training set together with many small speaker codes, one per speaker, each estimated using only that speaker's data.
Training parameters: the original NN weights, the adaptation weights, and the training speakers' codes.
Training method: standard back-propagation with the cross-entropy objective function. Adaptation weights and speaker codes are randomly initialized.
Testing: supervised adaptation; only the speaker code is learned, using back-propagation, while all network weights stay fixed.
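The code-only adaptation step can be sketched in NumPy as below. This is a toy illustration under stated assumptions, not the setup from the experiments: the adaptation NN is cut down to one sigmoid hidden layer plus a linear output layer, the original NN is reduced to a single softmax layer, biases are omitted, the speaker code is assumed to feed both adaptation layers, and the new speaker's code is assumed to start at zero. All names (`loss_and_code_grad`, `adapt_speaker_code`, the weight keys) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_code_grad(w, x, labels, code):
    """Cross-entropy loss and its gradient w.r.t. the speaker code only."""
    # Forward: adaptation NN (sigmoid hidden layer, linear output layer).
    h = sigmoid(x @ w["W1"] + code @ w["B1"])
    y = h @ w["W2"] + code @ w["B2"]
    # Forward: frozen original NN, reduced here to one softmax layer.
    logits = y @ w["Wo"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), labels]).mean()
    # Backward: the code receives gradient from every layer it feeds.
    d_logits = p.copy()
    d_logits[np.arange(n), labels] -= 1.0
    d_logits /= n
    d_y = d_logits @ w["Wo"].T
    d_z1 = (d_y @ w["W2"].T) * h * (1.0 - h)
    d_code = (d_y @ w["B2"].T).sum(axis=0) + (d_z1 @ w["B1"].T).sum(axis=0)
    return loss, d_code

def adapt_speaker_code(w, x, labels, code_dim, lr=0.05, steps=200):
    """Gradient descent on the code; the weights in `w` are never updated."""
    code = np.zeros(code_dim)  # assumed starting point for a new speaker
    for _ in range(steps):
        _, grad = loss_and_code_grad(w, x, labels, code)
        code -= lr * grad
    return code

# Toy usage with made-up sizes and random data (not the real dimensions).
rng = np.random.default_rng(0)
D, H, C, K = 6, 8, 4, 3  # feature, hidden, code, and class counts
w = {
    "W1": rng.normal(0, 0.3, (D, H)), "B1": rng.normal(0, 0.3, (C, H)),
    "W2": rng.normal(0, 0.3, (H, D)), "B2": rng.normal(0, 0.3, (C, D)),
    "Wo": rng.normal(0, 0.3, (D, K)),
}
x = rng.normal(size=(20, D))
labels = rng.integers(0, K, size=20)
code = adapt_speaker_code(w, x, labels, C)
```

Because only `code` is updated, the number of free parameters at test time equals the code dimension, which is what makes adaptation from a handful of utterances plausible.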
Experiments: 40-dimensional Mel filter-bank features plus energy, with 1st and 2nd temporal derivatives. Global CMVN. Bigram LM. Input window of 15 frames.
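The network input size implied by this feature setup can be worked out as follows. The sketch assumes the derivatives are taken over the energy term as well as the 40 filter-bank channels; the function name `input_dims` is hypothetical.

```python
# Per-frame and stacked input dimensions implied by the feature setup.
N_MEL = 40      # Mel filter-bank channels
N_ENERGY = 1    # energy term
N_STREAMS = 3   # static + 1st + 2nd temporal derivatives (assumed to cover energy too)
WINDOW = 15     # frames in the input window

def input_dims(n_mel=N_MEL, n_energy=N_ENERGY, n_streams=N_STREAMS, window=WINDOW):
    """Return (features per frame, features per stacked input window)."""
    per_frame = (n_mel + n_energy) * n_streams
    return per_frame, per_frame * window

per_frame, stacked = input_dims()
print(per_frame, stacked)  # 123 1845
```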
Testing is conducted per speaker using cross-validation. In each run, n utterances from a given speaker are used for supervised adaptation and the remaining 8-n for testing. There are 8 runs per speaker, and performance averaged over all runs is reported.
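One way to generate the 8 runs per speaker is sketched below. The notes do not say how the n adaptation utterances are chosen in each run, so the circular rotation used here (run r adapts on utterances r, r+1, ..., r+n-1, wrapping around) is an assumption, as are the function name `rotation_splits` and the utterance labels.

```python
def rotation_splits(utterances, n):
    """Yield (adapt, test) splits, one per starting position (assumed scheme)."""
    total = len(utterances)
    if not 0 < n < total:
        raise ValueError("n must leave at least one utterance for testing")
    for start in range(total):
        # Take n consecutive utterances, wrapping around the end of the list.
        adapt_idx = {(start + k) % total for k in range(n)}
        adapt = [u for i, u in enumerate(utterances) if i in adapt_idx]
        test = [u for i, u in enumerate(utterances) if i not in adapt_idx]
        yield adapt, test

# Usage: 8 utterances for one speaker, n = 2 used for adaptation.
utts = [f"utt{i}" for i in range(8)]
splits = list(rotation_splits(utts, 2))
print(len(splits))  # 8
```

With this scheme every utterance serves as adaptation data in exactly n of the 8 runs, so the average over runs weights all utterances equally.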
Adaptation NN has two 1000D sigmoid hidden layers and a linear output layer. 50D speaker code.
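A forward pass through an adaptation NN of this shape might look like the sketch below. Several details are assumptions not stated in these notes: the speaker code is fed into every layer through its own weight matrix, the linear output layer has the same dimension as the input so the transformed features can be passed to the original NN, and the input dimension is 1845, taken as (40 + 1) × 3 features per frame over a 15-frame window. All names are hypothetical.

```python
import numpy as np

FEAT_DIM = 1845  # assumed: (40 + 1) * 3 features per frame, 15-frame window
HID_DIM = 1000   # two sigmoid hidden layers of this size
CODE_DIM = 50    # speaker code size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_adaptation_nn(rng, feat_dim=FEAT_DIM, hid_dim=HID_DIM,
                       code_dim=CODE_DIM, scale=0.01):
    """Randomly initialize two hidden layers and one linear output layer."""
    sizes = [(feat_dim, hid_dim), (hid_dim, hid_dim), (hid_dim, feat_dim)]
    params = []
    for n_in, n_out in sizes:
        params.append({
            "W": rng.normal(0, scale, (n_in, n_out)),      # from previous layer
            "B": rng.normal(0, scale, (code_dim, n_out)),  # from speaker code (assumed)
            "b": np.zeros(n_out),
        })
    return params

def adapt_forward(params, x, code):
    """Map input frames x (batch, feat_dim) to speaker-adapted features."""
    h = x
    last = len(params) - 1
    for i, layer in enumerate(params):
        z = h @ layer["W"] + code @ layer["B"] + layer["b"]
        h = z if i == last else sigmoid(z)  # linear output, sigmoid elsewhere
    return h

# Usage with random inputs and a randomly initialized speaker code.
rng = np.random.default_rng(0)
params = init_adaptation_nn(rng)
x = rng.normal(size=(4, FEAT_DIM))
code = rng.normal(0, 0.01, CODE_DIM)
y = adapt_forward(params, x, code)
print(y.shape)  # (4, 1845)
```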
dummy: no speaker codes; 0: speaker codes are all 0s; oracle: same data for adaptation and testing.