2009 – AALL workshop

Increasing the reliability of a part-of-speech tagging tool for use with learner language

Abstract

Because errors in part-of-speech tagging lead to larger errors in the analysis of incorrect grammatical or lexical forms, it is essential to encode all components of a text with robust and consistent tags. Following a description of the creation and annotation of a learner language corpus, this presentation explains how the tagging accuracy of TreeTagger was increased by (a) identifying the lemmas that are unknown to the tagger, (b) checking the automatically obtained part-of-speech tags against an extended set of common-sense rules based on recurrent tagging errors, and (c) cross-referencing the part-of-speech tags with the error-encoded tags.
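The three checks named in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the presentation. It assumes tagger output in TreeTagger's tab-separated form/tag/lemma layout, where out-of-lexicon tokens receive the lemma `<unknown>`. The tag names, the single rule, the `error_tags` mapping and every function name are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

UNKNOWN = "<unknown>"  # lemma TreeTagger prints for out-of-lexicon tokens


class Token(NamedTuple):
    form: str
    pos: str
    lemma: str


def parse_tagged(text: str) -> list[Token]:
    """Parse tab-separated 'form<TAB>pos<TAB>lemma' lines, skipping malformed ones."""
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            tokens.append(Token(*parts))
    return tokens


def unknown_lemmas(tokens: list[Token]) -> list[int]:
    """(a) Indices of tokens whose lemma the tagger did not know."""
    return [i for i, t in enumerate(tokens) if t.lemma == UNKNOWN]


def rule_violations(tokens: list[Token]) -> list[int]:
    """(b) Hypothetical rule: a verb tag directly after a determiner is suspicious."""
    return [
        i + 1
        for i in range(len(tokens) - 1)
        if tokens[i].pos == "DET" and tokens[i + 1].pos == "VERB"
    ]


def error_conflicts(tokens: list[Token], error_tags: dict[int, str]) -> list[int]:
    """(c) Indices where the tag disagrees with the POS implied by an error annotation.

    error_tags maps a token index to the expected POS (hypothetical format).
    """
    return [
        i
        for i, expected in sorted(error_tags.items())
        if i < len(tokens) and tokens[i].pos != expected
    ]


def tokens_to_review(tokens: list[Token], error_tags: dict[int, str]) -> list[int]:
    """Union of the three checks, as sorted token indices."""
    flagged = set(unknown_lemmas(tokens))
    flagged.update(rule_violations(tokens))
    flagged.update(error_conflicts(tokens, error_tags))
    return sorted(flagged)


# Illustrative input; the tags are placeholders, not a real TreeTagger tagset.
sample = "the\tDET\tthe\nwalk\tVERB\twalk\nis\tVERB\tbe\nlonng\tADJ\t<unknown>\n"
sample_tokens = parse_tagged(sample)
sample_errors = {2: "VERB", 3: "ADV"}
review = tokens_to_review(sample_tokens, sample_errors)
```

In this sketch each check only flags token positions for review. How a flagged tag should then be corrected is left open, since the abstract does not specify it.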

Workshop presentation

Thouësny, S. (2009). “Increasing the Reliability of a Part-of-speech Tagging Tool for Use with Learner Language”. Workshop presentation. Automatic Analysis of Learner Language (AALL’09): From a Better Understanding of Annotation Needs to the Development and Standardization of Annotation Schemes, 10-MAR-09 – 11-MAR-09, Arizona State University, Tempe, AZ, US.

Publication

Thouësny, S. (2011). Increasing the reliability of a part-of-speech tagging tool for use with learner language. Proceedings of the pre-conference (AALL’09) workshop on automatic analysis of learner language: from a better understanding of annotation needs to the development and standardization of annotation schemes, 10-MAR-09 – 11-MAR-09, Arizona State University, Tempe, AZ.