Abstract
The volume of data available on the WWW has become enormous, and finding information on it is difficult for a novice user, even with standard search engines. One solution to this problem is to build a user-specific search engine whose database contains a large number of the web documents that the user needs. In this paper, we present a method of building a crawler that searches the subset of the WWW consisting of on-topic pages. We show an effective strategy for leading the crawler to on-topic pages by using a naive Bayes text classifier trained on an evaluation of the pages gathered by the crawler.
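
The general idea of a classifier-guided crawler can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the scoring or training details, so the sketch assumes a multinomial naive Bayes model with add-one smoothing, a best-first frontier in which each out-link inherits the on-topic probability of the page that linked to it, and a small in-memory `TOY_WEB` graph in place of real HTTP fetching. All names (`NaiveBayes`, `focused_crawl`, `TOY_WEB`) are hypothetical.

```python
import heapq
import math
from collections import Counter


class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"on": Counter(), "off": Counter()}
        self.doc_counts = {"on": 0, "off": 0}

    def train(self, text, label):
        """Add one labelled page ("on" or "off" topic) to the model."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def prob_on_topic(self, text):
        """Return the estimated probability that `text` is on-topic."""
        vocab = set(self.word_counts["on"]) | set(self.word_counts["off"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("on", "off"):
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word in vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (total_words + len(vocab)))
            log_score[label] = score
        # Normalise in log space to avoid underflow.
        top = max(log_score.values())
        on = math.exp(log_score["on"] - top)
        off = math.exp(log_score["off"] - top)
        return on / (on + off)


def focused_crawl(seed, fetch, classifier, max_pages):
    """Best-first crawl: always expand the most promising known URL.

    `fetch(url)` must return `(page_text, list_of_out_links)`.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = classifier.prob_on_topic(text)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page it came from.
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical toy web: url -> (page text, out-links).
TOY_WEB = {
    "seed": ("python crawler tutorial", ["a", "b"]),
    "a": ("football match results", ["c"]),
    "b": ("web crawler and search engine", ["d"]),
    "c": ("football league table", []),
    "d": ("focused crawler classifier", []),
}

clf = NaiveBayes()
clf.train("web crawler search engine", "on")
clf.train("crawler classifier search", "on")
clf.train("football match league", "off")
clf.train("football results table", "off")

visited = focused_crawl("seed", lambda url: TOY_WEB[url], clf, max_pages=4)
for url, score in visited:
    print(f"{url}: {score:.2f}")
```

With a budget of four pages, the crawler in this toy setting reaches the on-topic page `d` through `b` and never spends a fetch on `c`, whose only in-link comes from an off-topic page. A real crawler would also need to retrain the classifier as newly gathered pages are evaluated, which the sketch omits.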
Original language | English |
---|---|
Pages (from-to) | 25-29 |
Number of pages | 5 |
Journal | Research Reports on Information Science and Electrical Engineering of Kyushu University |
Volume | 9 |
Issue number | 1 |
Publication status | Published - Mar 2004 |
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
- Hardware and Architecture
- Engineering (miscellaneous)