Extracting URLs from JavaScript via program analysis

Qi Wang, Jingyu Zhou, Yuting Chen, Yizhou Zhang, Jianjun Zhao

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

6 Citations (Scopus)

Abstract

With the extensive use of client-side JavaScript in web applications, web content is becoming more dynamic than ever before. This poses significant challenges for search engines: more URLs are now embedded or hidden inside JavaScript code, and most web crawlers are script-agnostic, which significantly reduces search engine coverage. We present a hybrid approach that combines static analysis with dynamic execution, overcoming the weaknesses of a purely static approach, which lacks accuracy, and of a purely dynamic approach, which suffers from a huge execution cost. We also propose integrating program analysis techniques such as statement coverage and program slicing to improve the performance of URL mining.
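The gap between static and dynamic extraction that the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the sample script, the `navigate` stub, and the helper names `staticUrls` and `dynamicUrls` are all hypothetical, and the "static analysis" here is only a regular-expression scan over the source text.

```javascript
// Hypothetical page script: one URL is assembled at run time, so it never
// appears as a complete string literal in the source.
const pageScript = `
  var base = "http://example.com/";
  var page = "item" + (1 + 2) + ".html";
  navigate(base + page);
  navigate("http://example.com/about.html");
`;

// Static pass (assumption: a plain regex scan over the source text).
// Cheap, but it only sees URL fragments that are written out literally.
function staticUrls(source) {
  return source.match(/https?:\/\/[^\s"'`)]+/g) || [];
}

// Dynamic pass (assumption: the script reports URLs through a `navigate`
// callback, which is stubbed here to record its argument).
// Accurate for the executed path, but it requires running the whole script.
function dynamicUrls(source) {
  const seen = [];
  const run = new Function("navigate", source);
  run((url) => seen.push(String(url)));
  return seen;
}

const staticResult = staticUrls(pageScript);
const dynamicResult = dynamicUrls(pageScript);

console.log("static: ", staticResult);
console.log("dynamic:", dynamicResult);
```

In this sketch the static scan reports only the base fragment and the literal `about.html` URL, while execution also recovers `http://example.com/item3.html`. A hybrid scheme of the kind the abstract proposes would aim to execute only the statements relevant to URL construction rather than the entire script.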

Original language: English
Title of host publication: 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013 - Proceedings
Pages: 627-630
Number of pages: 4
Publication status: Published - 2013
Externally published: Yes
Event: 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013 - Saint Petersburg, Russian Federation
Duration: Aug 18 2013 to Aug 26 2013

Publication series

Name: 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013 - Proceedings

Other

Other: 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 8/18/13 to 8/26/13

All Science Journal Classification (ASJC) codes

  • Software
