Abstract
Disambiguation between multiple translation choices is very important in dictionary-based cross-language information retrieval. In prior work, disambiguation techniques have used term co-occurrence statistics from the collection being searched. Experimentally, these techniques have worked well, but they are based upon heuristic assumptions. In this paper, a theoretically grounded alternative is proposed, one which uses sense disambiguation based upon context terms within the source text. Specifically, this paper introduces the concept of translation probabilities incorporating a context term and extends IBM Model 1 to estimate context-based translation probabilities from a sentence-aligned bilingual corpus. Experimental results in English-to-Italian bilingual searches show that the context-based translation probabilities significantly improve performance over the case without any disambiguation.
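For readers unfamiliar with the baseline, the following is a minimal sketch of standard IBM Model 1 training by expectation-maximization on a sentence-aligned corpus. It illustrates only the unextended model: the context-term extension proposed in the paper is not described in this abstract and is not reproduced here. The function name `train_ibm_model1`, the toy English-Italian sentence pairs, and the omission of the NULL source word are all assumptions made for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM from (source_tokens, target_tokens) pairs.

    Standard IBM Model 1 only, without a NULL source word (a simplification).
    Returns a dict-like object keyed by (target_word, source_word).
    """
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all
        for src, tgt in pairs:
            for f in tgt:
                # E-step: spread one unit of mass for f over the source words.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus (English source, Italian target), not from the paper.
toy_pairs = [
    (["the", "house"], ["la", "casa"]),
    (["the", "book"], ["il", "libro"]),
    (["a", "house"], ["una", "casa"]),
]
t = train_ibm_model1(toy_pairs)
print(t[("casa", "house")], t[("la", "house")])
```

On this toy data "casa" co-occurs with "house" more often than "la" or "una" do, so EM concentrates probability on t(casa | house). A context-based variant would condition the translation probability on an additional source-side context term, as the abstract describes.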
Original language | English |
---|---|
Pages (from-to) | 481-495 |
Number of pages | 15 |
Journal | Journal of Information Science |
Volume | 35 |
Issue number | 4 |
Publication status | Published - Aug 2009 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Information Systems
- Library and Information Sciences