Could scene context be beneficial for scene text detection?

Anna Zhu, Renwu Gao, Seiichi Uchida

Research output: Contribution to journalArticlepeer-review

39 Citations (Scopus)


Scene text detection and scene segmentation are meaningful tasks in the computer vision field. Could the semantic scene segmentation assist scene text detection in any degree? For example, can we expect the probability of a region being text is low if its surrounding segment, i.e., its context, is labeled as sky? In this paper, we have a positive answer by constructing a scene context-based text detection model. In this model, we use texton features and a fully-connected conditional random field (CRF) to estimate pixel-level scene classs probability to be considered as images context feature. Meanwhile, maximally stable extremal regions (MSERs) are extracted, integrated and extended as image patches of character candidates. Then, each image patch is fed to a simple two-layer convolutional neural network (CNN) to automatically extract its character feature. The averaged context feature of the corresponding patch is considered as the patchs context feature. The character feature and context feature are fused as the input into a support vector machine for text/non-text determination. Finally, as a post-processing, neighboring text regions are grouped hierarchically. The performance evaluation on ICDAR2013 and SVT databases, as well as a preliminary evaluation on a patch-level database, proves that the scene context can improve the performance of scene text detection. Moreover, the comparative study with state-of-the-art methods shows the top-level performance of our method.

Original languageEnglish
Pages (from-to)204-215
Number of pages12
JournalPattern Recognition
Publication statusPublished - Oct 1 2016

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence


Dive into the research topics of 'Could scene context be beneficial for scene text detection?'. Together they form a unique fingerprint.

Cite this