Abstract
This paper proposes using a formal grammar to verify mathematical formulae in a practical mathematical OCR system. Just as a C compiler detects syntax errors in a source file, a verification mechanism should detect errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as the formal framework for defining "well-formed" mathematical formulae, and a cubic-time parsing algorithm for LM-CFTGs is presented. For practical evaluation, a verification system for mathematical OCR is developed, and its effectiveness is demonstrated using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study.
Original language | English |
---|---|
Pages (from-to) | 279-298 |
Number of pages | 20 |
Journal | Mathematics in Computer Science |
Volume | 3 |
Issue number | 3 |
Publication status | Published - May 2010 |
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics