Part-of-Speech Tagging for Social Media Texts

Authors

M. Neunerdt, B. Trevisan, M. Reyer, R. Mathar

Abstract

For many years, work on Part-of-Speech (POS) tagging has concentrated mainly on standardized texts. However, interest in the automatic evaluation of social media texts is growing considerably. Because social media texts differ clearly in nature from standardized texts, Natural Language Processing methods need to be adapted to process them reliably. The basis for such an adaptation is a reliably tagged social media text training corpus. In this paper, we introduce a new social media text corpus and evaluate different state-of-the-art POS taggers that are retrained on that corpus. In particular, we study whether a tagger trained on one specific social media text type can be applied to other types, such as chat messages or blog comments. We show that retraining the taggers on in-domain training data increases tagging accuracy by more than five percentage points.

BibTeX Reference Entry

@inproceedings{NeTrReMa13,
	author = {Melanie Neunerdt and Bianka Trevisan and Michael Reyer and Rudolf Mathar},
	title = "Part-of-Speech Tagging for Social Media Texts",
	pages = "139--150",
	booktitle = "International Conference of the German Society for Computational Linguistics and Language Technology (GSCL)",
	address = {Darmstadt, Germany},
	doi = {10.1007/978-3-642-40722-2_15},
	month = Sep,
	year = 2013,
	hsb = {hsb999910313881},
	}
