2009 – AALL workshop
Increasing the reliability of a part-of-speech tagging tool for use with learner language
Abstract
Since errors in part-of-speech tagging result in larger errors in the analysis of incorrect grammatical or lexical forms, it is essential to encode all components in a text with robust and consistent tags. Following a description of the creation and annotation of a learner language corpus, this presentation explains how the tagging accuracy of TreeTagger was increased by (a) identifying the lemmas that are unknown to the tagger, (b) checking the automatically obtained part-of-speech tags against an extended set of common-sense rules based on recurrent tagging errors, and (c) cross-referencing the part-of-speech tags with the error-encoded tags.
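The three checks can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the presentation: it assumes the tagger output is available as tab-separated lines of word form, tag and lemma, with `<unknown>` in the lemma column for words missing from the tagger lexicon. The sample tokens, the tag labels, the single rule and the shape of the error annotation are all invented for the example.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    form: str
    pos: str
    lemma: str


def parse_tagger_output(text: str) -> List[Token]:
    """Parse lines of the assumed shape 'form<TAB>tag<TAB>lemma'."""
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:  # skip blank or malformed lines
            tokens.append(Token(*parts))
    return tokens


def unknown_lemmas(tokens: List[Token]) -> List[str]:
    """(a) Collect word forms whose lemma the tagger could not resolve."""
    return [t.form for t in tokens if t.lemma == "<unknown>"]


def rule_violations(tokens: List[Token]) -> List[int]:
    """(b) Apply one hypothetical common-sense rule: a determiner directly
    followed by a verb tag is suspicious. Returns indices of the flagged
    second token in each such pair."""
    flagged = []
    for i in range(len(tokens) - 1):
        if tokens[i].pos.startswith("DET") and tokens[i + 1].pos.startswith("VER"):
            flagged.append(i + 1)
    return flagged


def cross_reference(tokens: List[Token], error_tags: Dict[int, str]) -> List[int]:
    """(c) Compare tagger output with a hypothetical error annotation that
    maps a token index to the part-of-speech category the annotator implied.
    Returns indices where the two disagree."""
    return [
        i for i, expected in sorted(error_tags.items())
        if not tokens[i].pos.startswith(expected)
    ]


# Invented sample: three tokens, the last one unknown to the tagger.
sample = "le\tDET:ART\tle\nmange\tVER:pres\tmanger\npomm\tNOM\t<unknown>"
tokens = parse_tagger_output(sample)

print(unknown_lemmas(tokens))                # ['pomm']
print(rule_violations(tokens))               # [1]
print(cross_reference(tokens, {2: "ADJ"}))   # [2]
```

Each function only flags candidates for review; deciding how a flagged tag should be corrected is left open here.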
Thouësny, S. (2009). “Increasing the Reliability of a Part-of-speech Tagging Tool for Use with Learner Language”. Workshop presentation. Automatic Analysis of Learner Language (AALL’09): From a Better Understanding of Annotation Needs to the Development and Standardization of Annotation Schemes, 10-MAR-09 – 11-MAR-09, Arizona State University, Tempe, AZ, US.
Thouësny, S. (2011). Increasing the reliability of a part-of-speech tagging tool for use with learner language. Proceedings of the pre-conference (AALL’09) workshop on automatic analysis of learner language: from a better understanding of annotation needs to the development and standardization of annotation schemes, 10-MAR-09 – 11-MAR-09, Arizona State University, Tempe, AZ.
Tags: AALL, corpus, learner language, part-of-speech, reliability, tagger, TreeTagger


