Abstract
Distributed word representations are used in many natural language processing tasks. When dealing with ambiguous words, it is desirable to generate multi-sense embeddings, i.e., multiple representations per word. Therefore, several methods have been proposed to generate different word representations based on parts of speech or topic, but these methods tend to be too coarse to deal with ambiguity. In this paper, we propose methods to generate multiple word representations for each word based on dependency structure relations. In order to deal with the data sparseness problem caused by the increase in vocabulary size, the initial value of each word representation is determined using pre-trained word representations. The representations of low-frequency words are expected to remain in the vicinity of their initial values, which in turn reduces the negative effects of data sparseness. Extensive evaluation results confirm the effectiveness of our methods, which significantly outperformed state-of-the-art methods for multi-sense embeddings. Detailed analysis of our method shows that the data sparseness problem is resolved by the pre-training.
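The abstract describes initializing each relation-specific word representation from a pre-trained vector so that low-frequency senses stay near a reasonable starting point. The following is a minimal sketch of that idea only, under assumed names (`pretrained`, `relations`, `init_multi_sense` are illustrative, not from the paper), and is not the authors' actual implementation.

```python
import numpy as np

def init_multi_sense(pretrained: dict, relations: list, noise: float = 0.0):
    """Return a dict mapping (word, relation) -> initial embedding vector.

    Each relation-specific vector starts at the word's pre-trained vector
    (plus optional small noise), so representations of low-frequency
    (word, relation) pairs remain near a sensible point despite sparse data.
    """
    multi = {}
    for word, vec in pretrained.items():
        for rel in relations:
            multi[(word, rel)] = vec + noise * np.random.randn(*vec.shape)
    return multi

# Toy usage with 3-dimensional vectors and two hypothetical dependency relations.
pretrained = {"bank": np.array([0.1, -0.2, 0.3])}
senses = init_multi_sense(pretrained, relations=["nsubj", "dobj"], noise=0.01)
```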
Original language | English |
---|---|
Pages | 28-36 |
Number of pages | 9 |
Publication status | Published - 2018 |
Event | 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018 - Hong Kong, Hong Kong |
Duration | Dec 1 2018 → Dec 3 2018 |
Conference
Conference | 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 12/1/18 → 12/3/18 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science (miscellaneous)