Detecting Irregularities in Blog Comment Language Affecting POS Tagging Accuracy
Authors
Melanie Neunerdt, Bianka Trevisan, Rudolf Mathar, Eva-Maria Jakobs
Abstract
Studying technology acceptance requires surveying and analyzing user opinions to identify acceptance-relevant factors. Beyond surveys, Web 2.0 offers a vast collection of user comments on different technologies. Extracting acceptance-relevant factors and user opinions from these comments requires Natural Language Processing (NLP) methods, particularly part-of-speech (POS) tagging. Because of the irregular language used in blogs, POS tagging results suffer from high error rates. In this paper, we present a user-specific study of blog comments that analyzes the relation between blog language and the performance of NLP methods. Applying the proposed approach improves the quality of POS tagging and lemmatization. We also present an ontology-based corpus generation tool that improves the identification of topic- and user-specific blog comments. As an example, we use this tool to create a corpus on mobile communication systems (MCS). Finally, we analyze the identified comments and transform them into structured datasets.
BibTeX Reference Entry
@article{NeTrMaJa12,
  author  = {Melanie Neunerdt and Bianka Trevisan and Rudolf Mathar and Eva-Maria Jakobs},
  title   = {Detecting Irregularities in Blog Comment Language Affecting {POS} Tagging Accuracy},
  journal = {International Journal of Computational Linguistics and Applications},
  volume  = {3},
  number  = {1},
  pages   = {71--88},
  month   = jun,
  year    = {2012},
  hsb     = {hsb999910272430},
}
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.