TY - GEN
T1 - Layout-tree-based approach for identifying visually similar blocks in a web page
AU - Zeng, Jun
AU - Flanagan, Brendan
AU - Hirokawa, Sachio
N1 - Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - When extracting information from a web page, information extraction (IE) systems usually need to perform pattern recognition to identify elements that have similar patterns. However, most of them rely on analyzing the HTML source code, DOM tree, tag tree, or XPath of web pages. These methods are language-dependent or, more precisely, HTML-dependent, and they have some inherent limitations. To overcome these limitations, we propose the notion of a layout tree and a pattern recognition method that uses layout trees to identify visual blocks with similar visual patterns. In this paper, we call a visible rectangular region in a web page a visual block, or block for short. We define two blocks as visually similar if their elements are displayed in a similar layout. We first transform the layout of each block into a layout tree. By calculating the similarity of the layout trees of two blocks, we can determine whether the two blocks are visually similar. The experimental results show that the layout tree is an effective means of identifying visually similar blocks.
AB - When extracting information from a web page, information extraction (IE) systems usually need to perform pattern recognition to identify elements that have similar patterns. However, most of them rely on analyzing the HTML source code, DOM tree, tag tree, or XPath of web pages. These methods are language-dependent or, more precisely, HTML-dependent, and they have some inherent limitations. To overcome these limitations, we propose the notion of a layout tree and a pattern recognition method that uses layout trees to identify visual blocks with similar visual patterns. In this paper, we call a visible rectangular region in a web page a visual block, or block for short. We define two blocks as visually similar if their elements are displayed in a similar layout. We first transform the layout of each block into a layout tree. By calculating the similarity of the layout trees of two blocks, we can determine whether the two blocks are visually similar. The experimental results show that the layout tree is an effective means of identifying visually similar blocks.
UR - http://www.scopus.com/inward/record.url?scp=84886519649&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886519649&partnerID=8YFLogxK
U2 - 10.1109/ICIS.2013.6607818
DO - 10.1109/ICIS.2013.6607818
M3 - Conference contribution
AN - SCOPUS:84886519649
SN - 9781479901746
T3 - 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings
SP - 65
EP - 70
BT - 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings
T2 - 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013
Y2 - 16 June 2013 through 20 June 2013
ER -