Lecture: Automating the lexicographic process of finding semantic change in contemporary language
A constant aim for lexicographers is to identify and record changes to the vocabulary of a language. Changes in meaning are particularly hard to find, as they do not necessarily imply a change of form, and lexicographic approaches to the detection of emerging semantically changed lexical units (words, phrases, etc.) are still limited when it comes to identifying infrequent new meanings. On the one hand, currently applied corpus-linguistic methods prove too dependent on frequency measures and on the size and composition of the corpora. On the other hand, the lexicographic identification process itself is usually only weakly automated: only a small number of high-frequency candidates are inspected, on the basis of non-randomized, manually drawn samples, so decisions on changes of meaning remain rather subjective.
In order to overcome these issues, we want to integrate recent advances in computational linguistics into the lexicographic process. We will build a fully automated system that combines controlled human annotation with computational methods for lexical semantic change detection in a human-in-the-loop manner to find changing words. The system will predict candidate words on a regular basis from recent language samples (e.g. DeReKo) and present random samples of uses of each candidate word to several annotators in a controlled annotation environment. The annotation process is fully automated, with an underlying algorithm testing statistical confidence and deciding when to stop the annotation. Once converged, the annotated data yield a change score and can be inspected visually. In our talk we will show how the system may be used to find lexemes whose meaning changed between 1990 and 2010.
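To make the stopping criterion concrete, the loop described above can be sketched as follows. This is a minimal illustration, not the system's actual algorithm: it assumes binary per-use judgments (novel sense vs. established sense), uses a Wilson score interval as the statistical confidence test, and stops annotating a candidate once the interval around its change score is narrow enough. All function names and the interval-width threshold are hypothetical.

```python
import math
import random

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)

def annotate_until_confident(uses, annotate, max_width=0.2, batch=10):
    """Draw random batches of uses and collect binary judgments
    (1 = novel sense, 0 = established sense); stop once the confidence
    interval on the change score is narrower than max_width."""
    pool = list(uses)
    random.shuffle(pool)            # randomized sample order
    judged = novel = 0
    lo, hi = 0.0, 1.0
    while pool and hi - lo > max_width:
        for use in pool[:batch]:    # present one batch to the annotators
            novel += annotate(use)
            judged += 1
        pool = pool[batch:]
        lo, hi = wilson_interval(novel, judged)
    score = novel / judged if judged else 0.0
    return score, (lo, hi)          # change score and its interval
```

In a real deployment, `annotate` would route each use to several human annotators and aggregate their judgments; here it stands in for that whole step.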
Start time: 11:10