SPEAKER: Dr. Dong Yu
Time: 10:00am-11:00am, Mar. 21, 2012
Address: Conference Room 215, Department of Information Science and Electronic Engineering
ABSTRACT: Large vocabulary speech recognition (LVSR) has been an active research area for several decades. However, we are still far from the goal of having machines understand conversational speech spoken by any speaker in any environment. Recently, great progress has been made in both acoustic models (AMs) and language models (LMs), and we are now able to cut the error rate by up to one third on challenging conversational speech recognition tasks. In this talk I will give a short summary of the current state of the art and the trends in LVSR, illustrate the key techniques that contribute to the recent advances, namely the deep neural network (DNN) and the recurrent neural network (RNN), and describe how the DNN-hidden-Markov-model (DNN-HMM) hybrid system significantly reduces error rates on both phone recognition and LVSR tasks and how the RNN can reduce LM perplexities.
BIO:
Dr. Dong Yu joined Microsoft Corporation in 1998 and the Microsoft speech research group in 2002, where he is a researcher. He holds a Ph.D. degree in computer science from the University of Idaho, an MS degree in computer science from Indiana University at Bloomington, an MS degree in electrical engineering from the Chinese Academy of Sciences, and a BS degree (with honors) in electrical engineering from Zhejiang University. His current research interests include speech processing, robust speech recognition, discriminative training, spoken dialog systems, machine learning, and pattern recognition. He has published more than 90 papers in these areas and is the inventor or co-inventor of more than 40 granted or pending patents.
Dr. Dong Yu is a senior member of the IEEE, a member of the ACM, and a member of ISCA. He is currently serving as an associate editor of IEEE Transactions on Audio, Speech, and Language Processing (2011-) and has served as an associate editor of IEEE Signal Processing Magazine (2008-2011) and as the lead guest editor of the IEEE Transactions on Audio, Speech, and Language Processing special issue on deep learning for speech and language processing (2010-2011).