LingPipe is a suite of Java libraries for the linguistic analysis of natural-language text.
LingPipe's information extraction and data mining tools:
* track mentions of entities (e.g. people or proteins);
* link entity mentions to database entries;
* uncover relations between entities and actions;
* classify text passages by language, character encoding, genre, topic, or sentiment;
* correct spelling with respect to a text collection;
* cluster documents by implicit topic and discover significant trends over time; and
* provide part-of-speech tagging and phrase chunking.
LingPipe's architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:
* Java API with source code and unit tests;
* multi-lingual, multi-domain, multi-genre models;
* training with new data for new tasks;
* n-best output with statistical confidence estimates;
* online training (learn-a-little, tag-a-little);
* thread-safe models and decoders for concurrent-read exclusive-write (CREW) synchronization; and
* character-encoding-sensitive I/O.
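
To make the concurrent-read exclusive-write (CREW) and online-training highlights concrete, here is a minimal sketch of the locking pattern using only the standard Java library. The class `CrewCountModel` and its methods `train` and `count` are hypothetical names invented for illustration; they are not part of the LingPipe API, and LingPipe's own models may implement synchronization differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical toy model (not a LingPipe class) that counts training tokens. */
class CrewCountModel {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** "Learn a little": a writer takes the exclusive write lock. */
    void train(String token) {
        lock.writeLock().lock();
        try {
            counts.merge(token, 1, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** "Tag a little": any number of readers may hold the read lock at once. */
    int count(String token) {
        lock.readLock().lock();
        try {
            return counts.getOrDefault(token, 0);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this pattern, calls to `count` from many threads proceed in parallel, while a call to `train` waits for active readers to finish and blocks new ones until the update is complete. That is what allows training and decoding to be interleaved safely on a single shared model.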