Lecture: Polish Morphosyntactic Tagging with a LatticeLSTM Architecture
Morphosyntactic tagging is the task of choosing the correct morphosyntactic analysis of a text. The input to a morphosyntactic tagger is the set of all possible analyses of the given text, produced, for example, by a rule-based morphological analyzer. Morfeusz 2 (Woliński, 2014), a morphosyntactic analyzer for Polish, splits text into segments and assigns lemmas and morphosyntactic tags to all possible (sub)word segmentations. Each morphosyntactic tag consists of a coarse POS tag and more fine-grained information, such as number, case, and person. A text often has more than one morphological analysis: it can be split into segments in different ways, and a single atomic segment can have more than one analysis. Morfeusz 2 captures these ambiguities in a directed acyclic graph.
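The lattice can be made concrete with a small Python sketch. It assumes Morfeusz-style output in which each analysis is an edge (start, end, segment, lemma, tag) between positions in the text. The word "Miałem" and its tags are hand-written and simplified for illustration, not actual Morfeusz 2 output: read as one segment it is a noun ("with coal dust"), read as "Miał" + "em" it is a past-tense verb with an agglutinate ("I had"). The helpers split_tag and all_paths are hypothetical: the first separates the coarse POS from the fine-grained features, the second enumerates every complete path, i.e. every candidate analysis of the text.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    start: int    # graph node (text position) where the segment begins
    end: int      # graph node (text position) where the segment ends
    segment: str  # orthographic form of the segment
    lemma: str
    tag: str      # colon-separated tag: coarse POS first, then features


# Hand-written, simplified analyses of "Miałem" (illustrative, not real
# Morfeusz 2 output): one unsplit noun reading, and a verb reading split
# into the past-tense form "Miał" plus the agglutinate "em".
EDGES = [
    Edge(0, 2, "Miałem", "miał", "subst:sg:inst:m3"),
    Edge(0, 1, "Miał", "mieć", "praet:sg:m1:imperf"),
    Edge(1, 2, "em", "być", "aglt:sg:pri:imperf:nwok"),
]


def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a tag into its coarse POS and its fine-grained features."""
    pos, *features = tag.split(":")
    return pos, features


def all_paths(edges: list[Edge], source: int, sink: int) -> list[list[Edge]]:
    """Enumerate every path from source to sink by depth-first search."""
    outgoing: dict[int, list[Edge]] = {}
    for e in edges:
        outgoing.setdefault(e.start, []).append(e)
    paths: list[list[Edge]] = []

    def walk(node: int, prefix: list[Edge]) -> None:
        if node == sink:
            paths.append(prefix)
            return
        for e in outgoing.get(node, []):
            walk(e.end, prefix + [e])

    walk(source, [])
    return paths


for path in all_paths(EDGES, source=0, sink=2):
    print(" + ".join(f"{e.segment}/{split_tag(e.tag)[0]}" for e in path))
```

This prints one line per path: "Miałem/subst" for the noun reading and "Miał/praet + em/aglt" for the verb reading. Exhaustive enumeration is only practical for short inputs, because the number of paths can grow exponentially with the length of the text.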
Each graph node pairs an input segment with the information assigned to that segment. The morphological analyzer makes no claim about which analysis is correct; identifying the correct one is the task of the model presented here. Put differently, the model is trained to find the correct path through the graph.
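Under this node-centric view, the analyzer's edge list can be turned into a graph whose nodes are analyses, and the gold analysis into one label per node. The sketch below shows one possible encoding, not necessarily the one used in this work: build_node_graph and gold_labels are hypothetical helpers, and the data is the same hand-written, simplified "Miałem" example. Node a precedes node b whenever b starts at the text position where a ends, so a path through this graph is a sequence of consecutive analyses.

```python
# Each analysis: (start position, end position, segment, lemma, tag).
# Hand-written, simplified example; not real Morfeusz 2 output.
ANALYSES = [
    (0, 2, "Miałem", "miał", "subst:sg:inst:m3"),
    (0, 1, "Miał", "mieć", "praet:sg:m1:imperf"),
    (1, 2, "em", "być", "aglt:sg:pri:imperf:nwok"),
]
START, END = 0, 1  # tuple positions of the start and end offsets


def build_node_graph(analyses):
    """Treat each analysis as a node; link a -> b when b starts where a ends."""
    successors = {i: [] for i in range(len(analyses))}
    predecessors = {i: [] for i in range(len(analyses))}
    for i, a in enumerate(analyses):
        for j, b in enumerate(analyses):
            if b[START] == a[END]:
                successors[i].append(j)
                predecessors[j].append(i)
    return successors, predecessors


def gold_labels(analyses, gold_path):
    """Binary target per node: 1 if the analysis lies on the correct path."""
    gold = set(gold_path)
    return [1 if a in gold else 0 for a in analyses]


succ, pred = build_node_graph(ANALYSES)
print(succ)  # {0: [], 1: [2], 2: []}
print(gold_labels(ANALYSES, [ANALYSES[1], ANALYSES[2]]))  # [0, 1, 1]
```

Framing training as per-node binary classification is an assumption made for this sketch; the quadratic pairing loop is acceptable for sentence-sized lattices.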
In this talk, I describe how I tackled this task using a LatticeLSTM architecture (Sperber et al., 2017). This specialized LSTM architecture computes a representation for each graph node that depends on the representations of its preceding and succeeding nodes. The talk focuses on the design of the neural architecture and also covers the practical side: implementing the architecture in PyTorch and conducting experiments.
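The node update can be sketched in PyTorch. This sketch uses the child-sum Tree-LSTM style update on which the lattice LSTM of Sperber et al. (2017) builds: a node sums the hidden states of its predecessors and applies a separate forget gate to each predecessor's cell state. It omits their lattice-score weighting and runs one pass over predecessors and one over successors, so each node's final state reflects both directions. The class names, sizes, random stand-in node embeddings, and per-node binary loss are illustrative assumptions, not the implementation described in the talk.

```python
import torch
import torch.nn as nn


class LatticeLSTMCell(nn.Module):
    """Child-sum LSTM update over a DAG (simplified: no lattice scores)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # input, output and candidate gates share one projection
        self.iou_x = nn.Linear(input_size, 3 * hidden_size)
        self.iou_h = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        # forget gate, computed separately for each neighbouring node
        self.f_x = nn.Linear(input_size, hidden_size)
        self.f_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, prev_h, prev_c):
        # x: (input_size,); prev_h, prev_c: (num_neighbours, hidden_size)
        h_sum = prev_h.sum(dim=0)  # zeros if the node has no neighbours
        i, o, u = (self.iou_x(x) + self.iou_h(h_sum)).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(prev_h))
        c = i * u + (f * prev_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c


class BiLatticeEncoder(nn.Module):
    """Forward pass over predecessors, backward pass over successors."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.fwd = LatticeLSTMCell(input_size, hidden_size)
        self.bwd = LatticeLSTMCell(input_size, hidden_size)

    @staticmethod
    def _run(cell, xs, order, neighbours):
        n = xs.size(0)
        h, c = [None] * n, [None] * n
        empty = xs.new_zeros((0, cell.hidden_size))
        for j in order:  # neighbours of j are always computed before j
            ks = neighbours[j]
            prev_h = torch.stack([h[k] for k in ks]) if ks else empty
            prev_c = torch.stack([c[k] for k in ks]) if ks else empty
            h[j], c[j] = cell(xs[j], prev_h, prev_c)
        return torch.stack(h)

    def forward(self, xs, topo_order, preds, succs):
        # xs: (num_nodes, input_size); topo_order lists node ids topologically
        fwd = self._run(self.fwd, xs, topo_order, preds)
        bwd = self._run(self.bwd, xs, list(reversed(topo_order)), succs)
        return torch.cat([fwd, bwd], dim=-1)


torch.manual_seed(0)
encoder = BiLatticeEncoder(input_size=8, hidden_size=16)
scorer = nn.Linear(2 * 16, 1)  # one logit per node: on the gold path or not
xs = torch.randn(3, 8)  # stand-in node embeddings (hypothetical features)
preds = {0: [], 1: [], 2: [1]}
succs = {0: [], 1: [2], 2: []}
states = encoder(xs, [0, 1, 2], preds, succs)
logits = scorer(states).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.tensor([0.0, 1.0, 1.0]))
loss.backward()
print(states.shape)  # torch.Size([3, 32])
```

Processing nodes one at a time in topological order keeps the code close to the equations but is slow; a practical implementation would typically batch all nodes whose neighbours have already been computed.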
Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel. 2017. Neural lattice-to-sequence models for uncertain inputs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1380–1389, Copenhagen, Denmark. Association for Computational Linguistics.
Marcin Woliński. 2014. Morfeusz reloaded. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), pages 1106–1111, Reykjavik, Iceland. European Language Resources Association (ELRA).