Speech Processing Lab
  
Computer Engineering Dep.
Sharif University of Technology



Projects of Speech Processing Laboratory (SPL)

Some of the current projects in the SPL are listed below. Complete information about each project and its development activities will be added later. The publications of these projects are available on the Publication page.

 
 Speech Recognition: Dictation system
      NEVISA is a Persian dictation system developed in this project. It is the first product of SPL's recognition engine, which uses the most popular algorithms and methods in the speech recognition field. The engine uses HMM-based modeling and MFCC feature extraction with some modifications. For more information about this application, see ASR-Gooyesh's official web site.
     
   Robust Speech Recognition
      This project started two years ago and has since implemented many approaches to noise and speaker robustness. Since our goal is to develop speech recognition applications that work in real environments, many of these methods have been finalized and integrated into the recognition engine. Some of these methods are:
  • On robust features: CMS, PCA, RASTA-PLP, RCC, Liftering
  • On speech enhancement: Spectral Subtraction, Microphone array and beam-forming
  • On model adaptation: MLLR and MAP
  • On model prediction: PMC
  • On speaker normalization: VTLN 
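
As an illustration of the simplest robust-feature method in the list, cepstral mean subtraction (CMS) removes the per-coefficient mean over an utterance, which cancels a stationary channel effect in the cepstral domain. The following is a minimal sketch, not the code of SPL's engine; the function name and the toy feature values are assumptions made for this example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean computed over all frames of one utterance.

    `frames` is a list of cepstral vectors (lists of floats) of equal length.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy two-frame "utterance" with two coefficients per frame (made-up values).
frames = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_subtraction(frames)
```

After normalization, each coefficient has zero mean across the utterance.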
     
   Language modeling and Natural Language Processing
      Linguistic information is a necessary part of any spoken language system. SPL has prepared and developed statistical and grammatical language models for the Persian language for the first time. Equivalent research is under way for English and our English speech recognition engine. Some of the prepared language models are:
  • N-Grams (N=1,2,3) for Persian and English
  • Word clustering-based N-grams
  • Grammatical rules using GPSG for Persian
  • Probabilistic grammars
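
To make the N-gram item concrete, the sketch below estimates an unsmoothed maximum-likelihood bigram model (N=2) from a tiny corpus. It only illustrates the general technique; the function names, the `<s>` and `</s>` boundary tokens, and the English toy corpus are assumptions of this example, and a practical model would add smoothing for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        # Every token except the last one serves as a history.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts


def bigram_prob(history_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


# Hypothetical toy corpus of command-like sentences.
corpus = [
    ["open", "the", "file"],
    ["close", "the", "file"],
    ["open", "the", "window"],
]
history_counts, bigram_counts = train_bigram(corpus)
p_file_given_the = bigram_prob(history_counts, bigram_counts, "the", "file")
```

Here "the" occurs three times as a history and is followed by "file" twice, so the estimate is 2/3.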

More information is available on ASR-Gooyesh.com

     
Telephony Speech Recognition
      NEWSHA is the first speech-enabled Persian computer-telephony system. It is the result of more than three years of research in SPL. For information about this project and the related applications, please see ASR-Gooyesh.com.
     
Embedded Speech Recognition
      One of our missions in SPL is to develop an embedded speech recognition engine for low-resource devices such as smart phones and PDAs. A voice translator and an application launcher are our two primary applications in this area. More details are available on ASR-Gooyesh's web site.
     
  Keyword Spotting
      Keyword spotting, that is, finding specific words in an audio stream, is another research field in SPL. The first version of this software, which finds the names of 25 countries in a stream, is now available on ASR-Gooyesh.com.
  Confidence measure and Out Of Vocabulary
      Ranking the recognized word or word sequence is a necessary ability in speech recognition and word spotting systems. For a practical system, this ranking and the detection of out-of-vocabulary words are vital, especially in command recognition systems. This project has been active for two years.
     
  Speech Enhancement
      Another field of our research is speech quality enhancement. Two classic methods, Spectral Subtraction and the Wiener filter, have been tested, and other approaches such as signal subspace methods and array-processing beam-forming are in progress.
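
The basic idea of magnitude spectral subtraction can be sketched as follows: estimate the noise spectrum from frames assumed to contain no speech, subtract it from every frame, and apply a spectral floor so that no bin becomes negative. This is a minimal textbook-style sketch under stated assumptions, not SPL's implementation; the function names, the `floor` parameter, and the assumption that the first frames are noise-only are all hypothetical, and the input is assumed to be already-computed magnitude spectra.

```python
def estimate_noise(magnitude_frames, n_noise_frames):
    """Average the first `n_noise_frames` frames, assumed to be noise-only."""
    head = magnitude_frames[:n_noise_frames]
    n_bins = len(head[0])
    return [sum(frame[k] for frame in head) / len(head) for k in range(n_bins)]


def spectral_subtraction(magnitude_frames, noise, floor=0.01):
    """Subtract the noise magnitude per bin, keeping at least `floor` times the input."""
    cleaned = []
    for frame in magnitude_frames:
        cleaned.append([max(m - n, floor * m) for m, n in zip(frame, noise)])
    return cleaned


# Made-up magnitude spectra: two noise-only frames, then one frame with signal in bin 0.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
noise = estimate_noise(frames, n_noise_frames=2)
cleaned = spectral_subtraction(frames, noise)
```

The floor is a common way to limit the "musical noise" that hard clipping to zero tends to produce; the value used here is arbitrary.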
     
Voice Activity Detection (VAD)
      Every speech-based system, especially recognition and enhancement systems, needs a VAD block to separate voice signals from non-voice ones. We have worked on the standard VADs, ETSI's AMR and ITU-T's G.722 VAD, and have also developed two new ones. These VADs are now incorporated in our recognition engine.
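
For readers unfamiliar with the idea, the sketch below shows the simplest possible VAD: a per-frame decision based on a fixed energy threshold. It is neither one of the standard VADs named above nor one of the lab's own detectors, which rely on considerably richer features; the frame length, the threshold, and the function names are assumptions chosen for this example.

```python
import math


def frame_energy_db(frame):
    """Mean energy of one frame in dB; a small offset avoids log(0)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


def energy_vad(samples, frame_len=160, threshold_db=-40.0):
    """Return one True/False decision per complete, non-overlapping frame."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions


# One frame of silence followed by one frame of a 200 Hz tone at 8 kHz sampling.
silence = [0.0] * 160
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(160)]
decisions = energy_vad(silence + tone)
```

A fixed threshold like this fails in changing noise conditions, which is one reason practical detectors adapt their noise estimate over time.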
     
  Distance talking and microphone array
      Distance talking and speaker localization are the main topics of this project.
     
  Speech synthesis: Text-To-Speech (TTS)
      SPL also conducts research on TTS methods and aims to develop a practical synthesis system that can be incorporated into other applications such as speech-enabled telephony systems.
     
     
  Native and non-native pronunciation ranking
      Ranking the pronunciation of an utterance is one of the major parts of language learning applications. We have used different approaches, especially HMM-based ranking, to achieve this goal.
     
Fast likelihood computation
      Likelihood computation is one of the main obstacles to moving a speech recognition engine to low-resource computers and devices. Hence, various fast likelihood computation approaches have been implemented to decrease the computational load in real-time applications.
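
The page does not say which fast approaches were implemented, so the following only illustrates one widely used trick for HMM systems with diagonal-covariance Gaussians: precompute every term that does not depend on the observation, so that each evaluation needs no logarithm or division. The class name and the example values are assumptions of this sketch.

```python
import math


class DiagGaussian:
    """Diagonal-covariance Gaussian with observation-independent terms precomputed."""

    def __init__(self, mean, var):
        self.mean = list(mean)
        # -1 / (2 * variance) per dimension, computed once.
        self.neg_half_inv_var = [-0.5 / v for v in var]
        # Log normalization constant: -0.5 * sum(log(2 * pi * variance)).
        self.log_const = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)

    def log_likelihood(self, x):
        """Log density of x: only subtractions, multiplications and additions."""
        total = self.log_const
        for xi, mi, wi in zip(x, self.mean, self.neg_half_inv_var):
            d = xi - mi
            total += wi * d * d
        return total


# Standard two-dimensional Gaussian evaluated at its mean.
gaussian = DiagGaussian(mean=[0.0, 0.0], var=[1.0, 1.0])
ll_at_mean = gaussian.log_likelihood([0.0, 0.0])
```

At the mean of a standard two-dimensional Gaussian, the log density equals -log(2π).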
     
       
     

© 2006 Speech Processing Lab. All rights reserved.
For any questions or comments, please contact spl [at] ce.sharif.edu