Some of the current projects in the SPL are listed below. Complete information about each project and its development activities will be added later. The publications of these projects are available on the Publication page.

Speech Recognition: Dictation System

NEVISA, a Persian dictation system, was developed as a result of this project. NEVISA is the first product built on SPL's recognition engine, which uses the most widely adopted algorithms and methods in the speech recognition field. The engine uses HMM-based modeling and MFCC feature extraction, with some modifications. For more information about this application, see ASR-Gooyesh's official web site.
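
The page does not describe the engine's internals. As a generic illustration of HMM-based modeling, here is a minimal sketch, assuming a small discrete-observation HMM, of the forward algorithm that computes the likelihood of an observation sequence in the log domain. The function names and the toy model are hypothetical; a real recognizer would use continuous emission densities over MFCC vectors rather than discrete symbols.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping a zero probability to -inf."""
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, init, trans, emit):
    """Return log P(obs | model) with the forward algorithm.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of a transition from state i to state j
    emit[i][o]  : probability that state i emits discrete symbol o
    """
    n = len(init)
    # alpha[j] = log P(o_1, ..., o_t, state_t = j)
    alpha = [safe_log(init[j]) + safe_log(emit[j][obs[0]]) for j in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


# Hypothetical 2-state left-to-right model over a 2-symbol alphabet.
init = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = [[0.7, 0.3], [0.2, 0.8]]
print(math.exp(forward_log_likelihood([0, 0, 1], init, trans, emit)))  # ≈ 0.1918
```

Working in the log domain avoids the numerical underflow that multiplying many small probabilities would cause on utterances of realistic length.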

Robust Speech Recognition

This project started two years ago and has since produced many approaches to noise and speaker robustness. Because our goal is to build speech recognition applications that work in real environments, many of these methods have been implemented and finalized in the recognition engine. Some of these methods are:
- On robust features: CMS, PCA, RASTA-PLP, RCC, Liftering
- On speech enhancement: Spectral Subtraction, microphone array and beam-forming
- On model adaptation: MLLR and MAP
- On model prediction: PMC
- On speaker normalization: VTLN
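
CMS (cepstral mean subtraction), the first robust-feature item above, is simple enough to show directly. A fixed linear channel (microphone, telephone line) multiplies the spectrum, which becomes an additive offset in the cepstral domain, so subtracting each coefficient's mean over the utterance cancels it. The sketch below is a minimal batch version, assuming the cepstral vectors of the whole utterance are already available; the function name and data are hypothetical, and a real-time engine would more likely use a running mean.

```python
def cepstral_mean_subtraction(frames):
    """Subtract each coefficient's mean over all frames of one utterance.

    frames: list of cepstral vectors (one list of floats per frame).
    """
    if not frames:
        return []
    n, dim = len(frames), len(frames[0])
    means = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


# Hypothetical 2-coefficient cepstra and a constant channel offset.
clean = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]
channel = [0.8, -0.3]
observed = [[c + h for c, h in zip(frame, channel)] for frame in clean]

# After CMS the channel offset is gone: both results agree up to rounding.
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(observed))
```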

Language Modeling and Natural Language Processing

Linguistic information is an essential part of any spoken-language system. SPL has prepared and developed statistical and grammatical language models for Persian, the first such models for the language. Equivalent research is also being carried out for English and for our English speech recognition engine. Some of the prepared language models are:
- N-grams (N = 1, 2, 3) for Persian and English
- Word-clustering-based N-grams
- Grammatical rules using GPSG for Persian
- Probabilistic grammars

More information is available on ASR-Gooyesh.com.
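
As an illustration of the N-gram item, here is a minimal bigram (N = 2) sketch. The page does not say which smoothing SPL uses; add-one (Laplace) smoothing is assumed here only because it is the shortest to write, and the function names and corpus are hypothetical. Practical models usually rely on stronger smoothing such as Katz back-off or Kneser-Ney.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories, padding sentences with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        histories.update(tokens[:-1])  # every token that is followed by another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams


def bigram_prob(prev, word, histories, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)


# Hypothetical toy corpus of command phrases.
corpus = [["call", "home"], ["call", "office"], ["open", "mail"]]
vocab = {w for sentence in corpus for w in sentence} | {"</s>"}  # possible next tokens
histories, bigrams = train_bigram(corpus)
print(bigram_prob("call", "home", histories, bigrams, len(vocab)))  # (1 + 1) / (2 + 6) = 0.25
```

Because the vocabulary contains every token that can follow a history, the smoothed probabilities for any history sum to one.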

Telephony Speech Recognition

NEWSHA is the first speech-enabled Persian computer-telephony system. It is the result of more than three years of research in SPL. For information about this project and related applications, please see ASR-Gooyesh.com.

Embedded Speech Recognition

One of our missions in SPL is to develop an embedded speech recognition engine for low-resource computers such as smartphones and PDAs. A voice translator and an application launcher are our two primary applications in this area. More details are available on ASR-Gooyesh's web site.

Keyword Spotting

Keyword spotting, i.e. finding specific words in an audio stream, is another research field in SPL. The first version of this software, which finds the names of 25 countries in an audio stream, is now available on ASR-Gooyesh.com.

Confidence Measure and Out-of-Vocabulary Detection

Ranking recognized words or word sequences by confidence is a necessary capability in speech recognition and word spotting systems. For a practical system, especially a command recognition system, this ranking and the detection of out-of-vocabulary (OOV) words are vital. This project has been active for two years.
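
The page does not say which confidence measure is used. One common family derives a word's confidence from the posterior probabilities of an N-best list; the sketch below assumes that approach. The N-best list, scores, scale factor and rejection threshold are all hypothetical, and a word rejected this way is only a candidate OOV word or misrecognition, not a confirmed one.

```python
import math


def nbest_posteriors(log_scores, scale=1.0):
    """Turn N-best log scores into normalized posterior probabilities."""
    scaled = [scale * s for s in log_scores]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def word_confidence(word, nbest, log_scores, scale=1.0):
    """Sum the posteriors of the hypotheses containing `word` (position ignored)."""
    posts = nbest_posteriors(log_scores, scale)
    return sum(p for hyp, p in zip(nbest, posts) if word in hyp)


def accept(word, nbest, log_scores, threshold=0.5, scale=1.0):
    """Keep a word only if its confidence reaches the (hypothetical) threshold."""
    return word_confidence(word, nbest, log_scores, scale) >= threshold


# Hypothetical 3-best list for a spoken command and its total log scores.
nbest = [["open", "mail"], ["open", "male"], ["over", "mail"]]
scores = [-100.0, -101.0, -103.0]
for w in ("open", "mail", "male"):
    print(w, round(word_confidence(w, nbest, scores), 3), accept(w, nbest, scores))
```

Here "open" appears in most of the probability mass and is accepted, while "male" is supported only by the second hypothesis and is rejected.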

Speech Enhancement

Speech quality enhancement is another of our research fields. Two classic methods, spectral subtraction and the Wiener filter, have been tested, and other approaches such as signal subspace methods and array-processing beam-forming are in progress.
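
Both classic methods can be written as per-frequency-bin rules on one frame's power spectrum. The sketch below assumes the noisy power spectrum |Y(k)|^2 and a noise estimate |N(k)|^2 (for example, averaged over frames a VAD marks as non-speech) are already computed; the FFT, the overlap-add resynthesis and the values of the hypothetical parameters `alpha` and `beta` are not from the page. The enhanced magnitude would be the square root of the returned power, combined with the noisy phase.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame.

    noisy_power : |Y(k)|^2 per frequency bin
    noise_power : estimated |N(k)|^2 per bin
    alpha       : over-subtraction factor (1.0 = plain subtraction)
    beta        : spectral floor as a fraction of the noisy power; it keeps
                  the result positive and limits "musical noise"
    """
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy_power, noise_power)]


def wiener_gains(noisy_power, noise_power):
    """Per-bin Wiener gain S / (S + N), with S estimated as max(Y - N, 0)."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        s = max(y - n, 0.0)
        gains.append(s / (s + n) if s + n > 0 else 0.0)
    return gains


# Hypothetical 3-bin frame: strong speech, weak speech, noise only.
noisy = [10.0, 2.0, 0.5]
noise = [1.0, 1.0, 1.0]
print(spectral_subtraction(noisy, noise))  # [9.0, 1.0, 0.005]
print(wiener_gains(noisy, noise))          # [0.9, 0.5, 0.0]
```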

Voice Activity Detection (VAD)

Every speech-based system, especially recognition and enhancement systems, needs a VAD block to distinguish speech from non-speech signals. We have worked on the standard VADs (ETSI's AMR VAD and ITU-T's G.722 VAD) and have also developed two new ones. These VADs are now incorporated in our recognition engine.
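
The standard VADs named above are far more elaborate than can be shown here. For orientation only, the sketch below is a much simpler energy-threshold VAD with a hangover period; it is not the ETSI AMR algorithm or either of SPL's own VADs. The frame length, the fixed threshold (which assumes samples scaled to [-1, 1]) and the hangover length are hypothetical; practical VADs adapt the threshold to the background noise level.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # avoid log(0)
    return energies


def energy_vad(samples, frame_len=160, threshold_db=-30.0, hangover=2):
    """Return one 0/1 decision per frame; stay at 1 for `hangover` frames after speech."""
    decisions, hang = [], 0
    for e in frame_energies_db(samples, frame_len):
        if e > threshold_db:
            hang = hangover
            decisions.append(1)
        elif hang > 0:
            hang -= 1
            decisions.append(1)
        else:
            decisions.append(0)
    return decisions


# Hypothetical 8 kHz test signal: silence, a 440 Hz tone, silence.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(320)]
signal = [0.0] * 320 + tone + [0.0] * 640
print(energy_vad(signal))  # [0, 0, 1, 1, 1, 1, 0, 0]
```

The hangover keeps the detector from cutting off low-energy word endings that follow a loud segment.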

Distant Talking and Microphone Array

Distant talking and speaker localization are the main topics of this project.
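
Both topics rest on the time differences of arrival between microphones. The sketch below, assuming two microphones, integer-sample delays and a noise-free toy signal, estimates the delay by maximizing the cross-correlation and then aligns and averages the channels (a delay-and-sum beamformer). The function names and signals are hypothetical; turning delays into a speaker position would additionally need the array geometry and the speed of sound, and practical systems often use generalized cross-correlation (e.g. GCC-PHAT) and fractional delays.

```python
def estimate_delay(ref, other, max_lag):
    """Lag (in samples) maximizing sum_t ref[t] * other[t + lag]."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(ref[t] * other[t + lag]
                  for t in range(len(ref)) if 0 <= t + lag < len(other))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def delay_and_sum(channels, delays):
    """Advance each channel by its delay (in samples) and average the channels.

    Samples shifted past the end of a channel contribute nothing.
    """
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n):
            if 0 <= t + d < n:
                out[t] += signal[t + d]
    return [x / len(channels) for x in out]


# Hypothetical source seen by two microphones.
source = [0.0, 0.0, 1.0, -0.5, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0]
mic1 = source
mic2 = [0.0, 0.0] + source[:-2]  # the same wave arrives 2 samples later
lag = estimate_delay(mic1, mic2, max_lag=3)
print(lag)                                    # 2
print(delay_and_sum([mic1, mic2], [0, lag]))  # equals source
```

With real, noisy recordings the aligned speech adds coherently while uncorrelated noise partly averages out, which is the point of beam-forming for distant talking.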

Speech Synthesis: Text-to-Speech (TTS)

SPL also researches TTS methods and is working to develop a practical synthesis system for incorporation into other applications, such as speech-enabled telephony systems.

Native and Non-native Pronunciation Ranking

Ranking the pronunciation of an utterance is a major part of language learning applications. We have used different approaches, especially HMM-based ranking, to achieve this goal.
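
The page does not detail its HMM-based ranking. One widely used HMM-based measure is the goodness of pronunciation (GOP) score: the frame-normalized log-likelihood gap between the expected (canonical) phone, scored by forced alignment, and the best-scoring phone from an unconstrained phone recognizer. The sketch below assumes those per-segment log-likelihoods are already available; the numbers, the function names and the threshold are hypothetical, and it is not necessarily the method SPL uses.

```python
def gop_score(target_loglik, competitor_logliks, num_frames):
    """Goodness of pronunciation for one phone segment.

    target_loglik      : log P(O | canonical phone) from forced alignment
    competitor_logliks : log P(O | q) for every phone q in the inventory
    num_frames         : segment duration in frames
    0 means no phone explains the audio better than the canonical one;
    larger values suggest a mispronunciation.
    """
    best = max(competitor_logliks)
    return abs(target_loglik - best) / num_frames


def rank_utterance(segments, threshold=1.0):
    """Return the mean GOP and the phones whose GOP exceeds `threshold`.

    segments: list of (phone, target_loglik, competitor_logliks, num_frames)
    """
    scores = [gop_score(t, c, n) for _, t, c, n in segments]
    flagged = [seg[0] for seg, s in zip(segments, scores) if s > threshold]
    return sum(scores) / len(scores), flagged


# Hypothetical scores for a two-phone utterance.
segments = [
    ("s", -50.0, [-50.0, -55.0, -60.0], 10),
    ("a", -80.0, [-60.0, -80.0, -85.0], 10),
]
print(rank_utterance(segments))  # (1.0, ['a'])
```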

Fast Likelihood Computation

Likelihood computation is one of the main obstacles to moving a speech recognition engine to low-resource computers and devices. Hence, various fast likelihood computation approaches have been implemented to reduce the computational load in real-time applications.
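
The page does not list which fast methods were implemented. As one illustration of the idea, the sketch below combines two well-known tricks for diagonal-covariance Gaussian mixtures: approximating the mixture log-likelihood by its best component (which underestimates the exact log-sum by at most log M for M components), and partial distance elimination, which stops accumulating a component's distance as soon as it can no longer beat the best score so far. The mixture parameters and function names are hypothetical.

```python
import math


def make_component(weight, mean, var):
    """Precompute (log weight + Gaussian normalizer, mean, 1/variance)."""
    const = math.log(weight) - 0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    return const, mean, [1.0 / v for v in var]


def best_component_loglik(x, components):
    """max_m log(w_m * N(x; mean_m, diag(var_m))) with partial distance elimination."""
    best = -math.inf
    for const, mean, inv_var in components:
        score = const
        for xi, mi, iv in zip(x, mean, inv_var):
            d = xi - mi
            score -= 0.5 * d * d * iv
            if score <= best:  # the score can only decrease further: stop early
                break
        else:  # all dimensions processed without pruning
            best = score
    return best


def full_best_component_loglik(x, components):
    """Reference result computed without any pruning."""
    return max(
        const - 0.5 * sum((xi - mi) ** 2 * iv for xi, mi, iv in zip(x, mean, inv_var))
        for const, mean, inv_var in components
    )


# Hypothetical 3-component mixture over 2-dimensional features.
gmm = [
    make_component(0.5, [0.0, 0.0], [1.0, 1.0]),
    make_component(0.3, [2.0, 2.0], [0.5, 1.5]),
    make_component(0.2, [-1.0, 3.0], [2.0, 0.8]),
]
x = [1.8, 2.1]
print(best_component_loglik(x, gmm), full_best_component_loglik(x, gmm))
```

The savings grow with the feature dimension and the number of components, and they are largest when the best component tends to be evaluated early, for example by first trying the component that won on the previous frame.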