Bangla Text to Speech
This research aims to develop a natural-sounding Bangla Text-to-Speech (TTS) synthesis system. The project is funded by the subproject 'Development of Multi-Platform Speech & Language Processing Software for Bangla' (CP-3888) under the 'Higher Education Quality Enhancement Project (HEQEP)' of the UGC.

The current system performs concatenative speech synthesis with poly-phones (a combination of di-phones and tri-phones) as its phonetic units. Around 4,500 poly-phones were required to build the complete system. It currently contains recordings of one female voice, and two more voices are under development.

A good TTS system requires a grapheme-to-phoneme (G2P) module that converts the input text into a pronounceable form. We developed a state-of-the-art encoder-decoder G2P module trained on a large lexicon of 135,000 words. Both the encoder and the decoder are Gated Recurrent Unit Recurrent Neural Networks (GRU-RNNs). We have released the lexicon in a public GitHub repository for the Bangla research community.

The next version of the system will contain two more voices, generated by a Statistical Parametric Speech Synthesis (SPSS) model. For this purpose, we developed a duration model and an acoustic model, both of which are being trained as Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs).
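To illustrate how a mixed inventory of di-phones and tri-phones can cover an utterance, the sketch below greedily selects a tri-phone wherever one is available and falls back to a di-phone otherwise. This is an assumption for illustration only: the description above does not say how the system chooses units, and the function name `select_units`, the `sil` padding symbol and the romanized phones for আমি ("ami") are hypothetical.

```python
def select_units(phones, inventory):
    """Greedily cover the phone-to-phone transitions of `phones` with units.

    Hypothetical strategy: prefer a tri-phone when it exists in `inventory`,
    otherwise use the di-phone. Consecutive units share one boundary phone,
    which is where their recordings would be joined.
    """
    units = []
    i = 0
    last = len(phones) - 1
    while i < last:
        tri = tuple(phones[i:i + 3])
        if len(tri) == 3 and tri in inventory:
            units.append(tri)
            i += 2  # a tri-phone spans two transitions
        else:
            di = tuple(phones[i:i + 2])
            if di not in inventory:
                raise KeyError(f"no recorded unit covers {di}")
            units.append(di)
            i += 1  # a di-phone spans one transition
    return units


# Hypothetical inventory and phone sequence for the word "ami" (I).
inventory = {("sil", "a"), ("a", "m", "i"), ("a", "m"), ("m", "i"), ("i", "sil")}
print(select_units(["sil", "a", "m", "i", "sil"], inventory))
# [('sil', 'a'), ('a', 'm', 'i'), ('i', 'sil')]
```

Preferring the longer unit reduces the number of joins, which is one common reason to include tri-phones alongside di-phones; a production system would typically also score join costs rather than rely on a purely greedy rule.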
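The data flow of a GRU-based encoder-decoder G2P model can be sketched in plain Python: the encoder GRU reads the graphemes of a word into a hidden state, and the decoder GRU starts from that state and emits one phoneme per step until it predicts an end symbol. This is a minimal, untrained sketch under stated assumptions: the weights are random, there is no attention mechanism, and the class names, symbol set (`<s>`, `</s>`, `a`, `m`, `i`) and layer sizes are hypothetical rather than taken from the system described above. Its output only becomes meaningful after training on a lexicon.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vs):
    return [sum(t) for t in zip(*vs)]


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


class GRUCell:
    """One GRU step: h' = (1 - z) * h + z * h_tilde (one common convention)."""

    def __init__(self, n_in, n_hid, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.W = {g: mat(n_hid, n_in) for g in "zrh"}   # input weights
        self.U = {g: mat(n_hid, n_hid) for g in "zrh"}  # recurrent weights
        self.b = {g: [0.0] * n_hid for g in "zrh"}

    def step(self, x, h):
        sig = lambda v: [1.0 / (1.0 + math.exp(-a)) for a in v]
        z = sig(add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"]))  # update gate
        r = sig(add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"]))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in add(matvec(self.W["h"], x), matvec(self.U["h"], rh), self.b["h"])]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


class Seq2SeqG2P:
    """Hypothetical untrained encoder-decoder; index 0 is <s>, index 1 is </s>."""

    def __init__(self, graphemes, phonemes, n_hid=16, seed=0):
        rng = random.Random(seed)
        self.g_idx = {g: i for i, g in enumerate(graphemes)}
        self.phonemes = ["<s>", "</s>"] + list(phonemes)
        self.n_hid = n_hid
        self.encoder = GRUCell(len(graphemes), n_hid, rng)
        self.decoder = GRUCell(len(self.phonemes), n_hid, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in self.phonemes]

    def encode(self, word):
        h = [0.0] * self.n_hid
        for ch in word:  # one GRU step per grapheme (Unicode code point)
            h = self.encoder.step(one_hot(self.g_idx[ch], len(self.g_idx)), h)
        return h

    def greedy_decode(self, word, max_len=20):
        h = self.encode(word)  # decoder starts from the encoder's final state
        prev, out = 0, []
        for _ in range(max_len):
            h = self.decoder.step(one_hot(prev, len(self.phonemes)), h)
            scores = matvec(self.W_out, h)
            prev = max(range(1, len(scores)), key=scores.__getitem__)  # never re-emit <s>
            if prev == 1:  # </s> ends the pronunciation
                break
            out.append(self.phonemes[prev])
        return out


model = Seq2SeqG2P(graphemes=list("আমি"), phonemes=["a", "m", "i"])
print(model.greedy_decode("আমি"))  # arbitrary phonemes until the model is trained
```

Initializing the decoder from the encoder's final hidden state is the simplest way to pass the word's information across; larger G2P systems often add attention, but whether this system does is not stated here.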
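In SPSS pipelines that pair a duration model with an acoustic model, a common arrangement is for the duration model to predict how many frames each phone lasts, after which the phone-level linguistic features are repeated frame by frame as input to the acoustic model. The sketch below shows only that expansion step; it assumes this conventional arrangement (the description above does not detail it), and the function name, the appended relative-position feature and the example durations standing in for BLSTM predictions are hypothetical.

```python
def expand_to_frames(phone_features, durations):
    """Upsample phone-level feature vectors to frame level.

    `durations` are (hypothetical) duration-model outputs in frames; each is
    rounded and clamped to at least one frame. Every frame gets a copy of its
    phone's features plus the frame's relative position within the phone.
    """
    if len(phone_features) != len(durations):
        raise ValueError("one duration per phone is required")
    frames = []
    for feats, d in zip(phone_features, durations):
        n = max(1, int(round(d)))
        for k in range(n):
            frames.append(list(feats) + [(k + 0.5) / n])  # centre of frame k, in (0, 1)
    return frames


# Three phones with toy one-hot features and predicted durations.
frames = expand_to_frames([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [2.2, 2.7, 0.4])
print(len(frames))  # 6
```

The resulting frame-level matrix is what an acoustic model would map to vocoder parameters, one output vector per frame.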