Indic OCR refers to the process of converting images of text written in Indic scripts into e-text using optical character recognition (OCR) techniques. More broadly, the term can cover OCR for any Brahmic script used for the languages of South and Southeast Asia, not only the scripts of the Indian subcontinent. All of these scripts are abugidas.
OCR for the Latin script is still not 100% accurate, but it has reached a relatively high degree of accuracy. OCR for Indic scripts has not yet reached a comparable level. This is partly due to the writing systems of Indic languages and partly due to a lack of standard representation and encoding, and of support in operating systems and keyboards.
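One way the writing system complicates recognition can be shown with a single Devanagari syllable. This is an illustrative example chosen for this purpose, not one taken from the sources above. A reader sees क्षि ("kṣi") as one unit: the conjunct क्ष with the vowel sign ि drawn to its left. Unicode stores it as four code points, and the vowel sign is stored last even though it is printed first. An OCR system therefore cannot simply output one code point per glyph in left-to-right visual order. The minimal Python sketch below uses only the standard `unicodedata` module to list the stored code points. The syllable and the helper name `describe` are assumptions made for the demonstration.

```python
import unicodedata

# Example syllable chosen for illustration only: Devanagari "kṣi" (क्षि).
# Visually it is one conjunct (ka + ssa) with the vowel sign i drawn on the LEFT,
# but in Unicode the vowel sign is stored AFTER the consonants.
SYLLABLE = "\u0915\u094D\u0937\u093F"

def describe(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode name) for each code point in logical (storage) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

if __name__ == "__main__":
    for code, name in describe(SYLLABLE):
        print(code, name)
    # Contrast with Latin "ki": two letters, two code points, same visual and stored order.
    print(len("ki"), len(SYLLABLE))
```

Running it prints KA (U+0915), VIRAMA (U+094D), SSA (U+0937) and VOWEL SIGN I (U+093F) in that order, followed by `2 4`. The pre-base vowel sign appears last in storage, so a recognizer has to reorder what it sees before emitting text.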
The Centre for Development of Advanced Computing (C-DAC), the premier R&D organisation of India's Ministry of Electronics and Information Technology (MeitY), and the ministry's Technology Development for Indian Languages (TDIL) programme have carried out many OCR projects. These include OCR for Malayalam, Odia, Punjabi and Telugu, and for the Devanagari script.