Lookahead Part-Of-Speech Tagger
Overview
This is a C++ implementation of the part-of-speech (POS) tagging algorithm described in [1]. The tagger is fast (more than 500 sentences per second), accurate (97.22% on the WSJ corpus), and trainable on your own POS-annotated corpus. The distribution includes model files trained for English.
How to use the tagger
1. Download the latest version of the tagger
2. Expand the archive
> tar xvzf lapos-X.X.tar.gz
3. Compile
> cd lapos-X.X/
> make
4. Tag sentences
Prepare the input in one-sentence-per-line format, then run the "lapos" command:
> echo "He opened the window." | ./lapos -t -m ./model_wsj02-21
He/PRP opened/VBD the/DT window/NN ./.
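The output above puts one sentence on each line, with each token written as word/TAG. If you want to consume that output in a script, the following is a minimal Python sketch of one way to split a line into (word, tag) pairs. It is not part of the tagger: the function name parse_tagged_line is hypothetical, and the sketch assumes that tokens are separated by whitespace and that the tag follows the last slash in each token, as in the sample output.

```python
def parse_tagged_line(line):
    """Split one line of word/TAG output into a list of (word, tag) pairs.

    Assumption (not confirmed by the tagger's documentation): tokens are
    whitespace-separated and the tag is whatever follows the last "/".
    """
    pairs = []
    for token in line.split():
        # Split on the last slash so that a word containing "/" stays intact.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Sample line copied from the example output above.
sample = "He/PRP opened/VBD the/DT window/NN ./."
for word, tag in parse_tagged_line(sample):
    print(word, tag)
```

Splitting on the last slash rather than the first keeps the final token "./." readable as the word "." with the tag ".", and it also copes with words that themselves contain a slash.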
How to build a tagging model with your own annotated corpus
Please see the README file.
References
[1] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Kazama. 2011. Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models? In Proceedings of CoNLL, pp. 238-246.
This page is maintained by Yoshimasa Tsuruoka