Bangla Optical Character Recognition
In this research, a practical Bangla OCR system has been designed and developed from scratch that recognizes printed Bangla script with very good accuracy. It recognizes Bangla basic and compound characters, numerals, and frequently used punctuation marks. The current version consists of preprocessing, segmentation, and recognition steps. Character segmentation uses a two-zone approach based on connected component analysis, and a Convolutional Neural Network (CNN) is used in the recognition step. The system is trained on seven fonts (Adorsholipi, AponaLohit, Kalpurush, Siyam Rupali, SolaimanLipi, Sutonny, Shonar Bangla). This version works well on good-quality documents (standard scan resolution of 300 dpi). We have tuned the system without any post-processing. In the next version, a post-processing step will be implemented, and our next target is to make the system more robust to poor-quality documents. The project is funded by the subproject 'Development of Multi-Platform Speech & Language Processing Software for Bangla' (CP-3888) under the 'Higher Education Quality Enhancement Project (HEQEP)' of UGC.
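The segmentation step is described only briefly, so the following is a minimal sketch of the general idea rather than the project's actual implementation. It assumes a binarized text-line image (1 = ink, 0 = background), assumes that the two zones are separated at the headline (matra), and estimates the headline as the row with the most ink. The function names (label_components, headline_row, split_zones) and the toy image SAMPLE are hypothetical and chosen for illustration.

```python
from collections import deque

def label_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def headline_row(img):
    """Assumption: the headline (matra) is the row containing the most ink pixels."""
    counts = [sum(row) for row in img]
    return counts.index(max(counts))

def split_zones(img):
    """Assign each component to the upper zone (entirely above the headline)
    or to the main zone (touching or below the headline)."""
    head = headline_row(img)
    upper, main = [], []
    for box in label_components(img):
        bottom = box[2]
        (upper if bottom < head else main).append(box)
    return head, upper, main

# Toy 5x8 "line image": a small mark above a headline with two vertical strokes.
SAMPLE = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
]

head, upper, main = split_zones(SAMPLE)
print(head, upper, main)  # 2 [(0, 3, 0, 3)] [(2, 0, 4, 7)]
```

In this toy image the headline joins both strokes into a single main-zone component, which is why real Bangla segmentation needs further steps to separate characters within a word; those steps are not described in the text and are not sketched here.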