    Bangla Optical Character Recognition

    In this research, a practical Bangla OCR system has been designed and developed from scratch. It recognizes printed Bangla script with high accuracy, covering basic and compound characters, numerals, and frequently used punctuation marks. The current version consists of preprocessing, segmentation, and recognition steps. Character segmentation uses a two-zone approach based on connected component analysis, and a Convolutional Neural Network (CNN) is used in the recognition step. The system is trained on seven fonts (Adorsholipi, AponaLohit, Kalpurush, Siyam Rupali, SolaimanLipi, Sutonny, Shonar Bangla). This version works well on good-quality documents (standard scan resolution of 300 dpi). The system has been tuned without a post-processing step. In the next version, a post-processing step will be implemented, and the next target is to make the system more robust to poor-quality documents. The project is funded by the subproject 'Development of Multi-Platform Speech & Language Processing Software for Bangla' (CP-3888) under the 'Higher Education Quality Enhancement Project (HEQEP)' of UGC.
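
    The page gives no implementation details for the segmentation step, so the following is only a minimal sketch of generic connected component analysis on a binarized image, written in plain Python. The function name `connected_components`, the list-of-lists input format, and the use of 8-connectivity are assumptions made for illustration. The project's two-zone logic and its CNN recognizer are not reproduced here.

```python
from collections import deque


def connected_components(binary):
    """Find 8-connected foreground regions and return their bounding boxes.

    `binary` is a list of rows of 0/1 ints (1 = ink pixel). Each box is
    (left, top, right, bottom) with inclusive coordinates; the result is
    sorted so that boxes run left to right.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all eight neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))

    return sorted(boxes)


# Toy 3x5 "scan" with three separate blobs of ink.
image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
]
boxes = connected_components(image)
```

    In a real pipeline each bounding box would be cropped from the page image and passed to the recognizer. Bangla glyphs in a word are often joined by the headline stroke, so connected components alone would not separate characters, which is presumably why the system pairs this analysis with a zone-based approach.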

List of Project Contributors

Name (Role): Email
Tasnim Zahan Tithy (PhD Researcher): [email protected]
Dr Md Zafar Iqbal (Project Supervisor): [email protected]
Dr Mohammad Reza Selim (Project Co-supervisor): [email protected]
Copyright © 2019 Sust Bangla Research
Developed by Technext Limited