Speaker Diarization Boosts Automatic Speaker Recognition In Audio Recordings
Sunday, October 20, 2013 - 10:40
in Mathematics & Economics
An important goal in spoken-language-systems research is speaker diarization: computationally determining how many speakers feature in a recording and which of them speaks when. To date, the best diarization systems have used supervised machine learning; they're trained on sample recordings that a human has indexed, indicating which speaker enters when. In a new paper, MIT researchers show how they can improve speaker diarization so that it can automatically annotate audio or video recordings without supervision: no prior indexing is necessary. They also discuss a compact way to represent the differences between individual speakers' voices, which could be of use in other spoken-language computational tasks.