Lecture: Efficient Manual Word Sense Clustering on Historical Corpora
Traditionally, word sense annotation in lexical semantics relied on a fixed sense inventory, assigning a single best sense to each use of a word. More recent studies often take a graded view of word meaning: a use may be assigned to multiple senses on a graded scale, or pairs of uses may be annotated for their semantic proximity and then clustered. The latter approach avoids defining a word sense inventory and thus, by itself, gives no information about the quality of a sense cluster (what sense a cluster represents); it nevertheless makes it possible to measure important lexical properties such as polysemy or vagueness.
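To make the "annotate use pairs, then cluster" idea concrete, here is a minimal sketch. It assumes proximity judgments on a 1 to 4 scale and clusters uses as connected components of the graph that keeps only pairs judged at or above a threshold. The scale, the threshold, the use identifiers, and the function name `cluster_uses` are all illustrative assumptions, and connected components is a deliberately simple stand-in for whatever clustering procedure is actually used.

```python
from collections import defaultdict


def cluster_uses(uses, judgments, threshold=2.5):
    """Cluster uses as connected components of the thresholded proximity graph.

    uses:      list of use identifiers (hypothetical names)
    judgments: dict mapping (use_a, use_b) -> proximity score (assumed 1-4 scale)
    threshold: assumed cut-off; pairs scoring at or above it are linked
    """
    parent = {u: u for u in uses}

    def find(u):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Merge the components of every pair judged sufficiently similar.
    for (a, b), score in judgments.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for u in uses:
        clusters[find(u)].add(u)
    # Return clusters in a deterministic order.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy data: five uses of one word and five annotated pairs (invented values).
uses = ["u1", "u2", "u3", "u4", "u5"]
judgments = {
    ("u1", "u2"): 4,
    ("u2", "u3"): 3,
    ("u3", "u4"): 1,
    ("u4", "u5"): 4,
    ("u1", "u5"): 1,
}
clusters = cluster_uses(uses, judgments)
print(clusters)
```

On this toy input the uses fall into two clusters. The number of clusters could serve as a crude indicator of polysemy, although nothing in the clusters themselves says which sense each one represents.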
Use pair annotation is attractive because it requires no manual preparation beyond sampling uses from a corpus, while yielding high inter-annotator agreement. We extend this approach to annotate use pairs sampled from different corpora, which allows us to measure differences in a word’s corpus-specific sense distributions. Measuring such differences is important, for example, for identifying words undergoing lexical semantic change or word sense differences between language varieties.
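One way to picture a difference between corpus-specific sense distributions is sketched below. It assumes each use has already been given a cluster label, turns the labels from each corpus into a relative-frequency distribution over clusters, and compares the two distributions with the Jensen-Shannon distance. The choice of Jensen-Shannon distance, the function names, and the toy labels are assumptions made for illustration; the abstract does not specify which measure is used.

```python
import math
from collections import Counter


def sense_distribution(labels, n_clusters):
    """Relative frequency of each cluster label (0 .. n_clusters-1) in one corpus."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[k] / total for k in range(n_clusters)]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2), ranging from 0 (identical) to 1 (disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)


# Invented cluster labels for uses of one word sampled from two corpora.
labels_corpus_1 = [0, 0, 0, 1]  # mostly cluster 0
labels_corpus_2 = [0, 1, 1, 1]  # mostly cluster 1

p = sense_distribution(labels_corpus_1, n_clusters=2)
q = sense_distribution(labels_corpus_2, n_clusters=2)
print(p, q, jensen_shannon_distance(p, q))
```

A distance near 0 would suggest that the word's clusters are similarly frequent in both corpora, while a distance near 1 would suggest that the corpora use the word in largely different senses, as might happen under lexical semantic change.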