LingPipe

LingPipe is an open-source Java toolkit for natural language processing. It already offers a rich set of features, with APIs for topic classification, named entity recognition, part-of-speech tagging, sentence detection, query spell checking, interesting phrase detection, clustering, character language modeling, MEDLINE (medical literature) download, parsing and indexing, database text mining, Chinese word segmentation, sentiment analysis, and language identification.
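Character language modeling and language identification, both listed above, fit together naturally. One common approach trains a separate character n-gram model for each language. New text is then labeled with the language whose model assigns it the highest probability.

The block below is a minimal, standard-library-only sketch of that idea. It is not LingPipe's own API. The following are all illustrative assumptions:

- the class and method names CharLangId, train and classify;
- the trigram order;
- the add-one smoothing;
- the assumed 256-symbol alphabet.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of character n-gram language identification; not LingPipe's API. */
public class CharLangId {

    /** Order-n character language model with add-one smoothing. */
    static final class CharLM {
        // Assumed alphabet size, used only for smoothing; an illustrative choice.
        private static final int ALPHABET_SIZE = 256;
        private final int n;
        private final Map<String, Integer> ngramCounts = new HashMap<>();
        private final Map<String, Integer> contextCounts = new HashMap<>();

        CharLM(int n) {
            this.n = n;
        }

        /** Counts every n-gram and its (n-1)-character context. */
        void train(String text) {
            String padded = pad(text);
            for (int i = n - 1; i < padded.length(); i++) {
                ngramCounts.merge(padded.substring(i - n + 1, i + 1), 1, Integer::sum);
                contextCounts.merge(padded.substring(i - n + 1, i), 1, Integer::sum);
            }
        }

        /** Natural-log probability of the text under this model. */
        double logProb(String text) {
            String padded = pad(text);
            double total = 0.0;
            for (int i = n - 1; i < padded.length(); i++) {
                double count = ngramCounts.getOrDefault(padded.substring(i - n + 1, i + 1), 0);
                double contextCount = contextCounts.getOrDefault(padded.substring(i - n + 1, i), 0);
                total += Math.log((count + 1.0) / (contextCount + ALPHABET_SIZE));
            }
            return total;
        }

        /** Prefixes n-1 spaces so the first characters also get a full context. */
        private String pad(String text) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n - 1; i++) {
                sb.append(' ');
            }
            return sb.append(text).toString();
        }
    }

    private final int n;
    private final Map<String, CharLM> models = new HashMap<>();

    public CharLangId(int n) {
        this.n = n;
    }

    /** Adds training text for one language; the model is created on first use. */
    public void train(String language, String text) {
        models.computeIfAbsent(language, k -> new CharLM(n)).train(text);
    }

    /** Returns the language whose model scores the text highest, or null if untrained. */
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, CharLM> e : models.entrySet()) {
            double score = e.getValue().logProb(text);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CharLangId id = new CharLangId(3);
        id.train("en", "the quick brown fox jumps over the lazy dog and the cat");
        id.train("de", "der schnelle braune fuchs springt ueber den faulen hund und die katze");
        System.out.println(id.classify("the dog and the fox"));
        System.out.println(id.classify("der hund und die katze"));
    }
}
```

With toy training texts this small, the result is only indicative. Real language identification needs far more training text per language.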

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Feature Overview

LingPipe's information extraction and data mining tools:

    * track mentions of entities (e.g. people or proteins);
    * link entity mentions to database entries;
    * uncover relations between entities and actions;
    * classify text passages by language, character encoding, genre, topic, or sentiment;
    * correct spelling with respect to a text collection;
    * cluster documents by implicit topic and discover significant trends over time; and
    * provide part-of-speech tagging and phrase chunking.
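Correcting spelling "with respect to a text collection" means the collection itself defines which words are valid and how common they are. The block below is a minimal sketch of that idea. It assumes plain Levenshtein edit distance and raw word frequencies from the collection.

This is a simple baseline, not LingPipe's spell checker. The names CorpusSpeller, correct and editDistance are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of corpus-based spelling correction; not LingPipe's API. */
public class CorpusSpeller {
    private final Map<String, Integer> freq = new HashMap<>();

    /** Builds lower-cased word counts from the reference text collection. */
    public CorpusSpeller(String corpus) {
        for (String w : corpus.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) {
                freq.merge(w, 1, Integer::sum);
            }
        }
    }

    /** Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the known word nearest to the input (at most maxDistance edits away),
     * preferring the more frequent word on ties. Returns the lower-cased input
     * when it is already known or when no candidate is close enough.
     */
    public String correct(String word, int maxDistance) {
        String query = word.toLowerCase();
        if (freq.containsKey(query)) {
            return query;
        }
        String best = query;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int d = editDistance(query, e.getKey());
            if (d > maxDistance) {
                continue;
            }
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CorpusSpeller speller = new CorpusSpeller(
            "natural language processing with language models, language identification and spelling");
        System.out.println(speller.correct("langauge", 2));
        System.out.println(speller.correct("procesing", 2));
    }
}
```

This sketch scans the whole vocabulary for every query, so the cost is linear in vocabulary size. A production speller would index candidate words and typically weight different kinds of edits unequally.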

Architecture

LingPipe's architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:

    * Java API with source code and unit tests;
    * multi-lingual, multi-domain, multi-genre models;
    * training with new data for new tasks;
    * n-best output with statistical confidence estimates;
    * online training (learn-a-little, tag-a-little);
    * thread-safe models and decoders for concurrent-read exclusive-write (CREW) synchronization; and
    * character encoding-sensitive I/O.
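Several of the architecture points above can be illustrated together: online training, n-best output with confidence estimates, and concurrent-read exclusive-write (CREW) synchronization. The sketch below assumes a multinomial naive Bayes text classifier, chosen only for brevity. It is guarded by java.util.concurrent's ReentrantReadWriteLock:

- classification takes the shared read lock, so many threads may classify at once;
- each training call takes the exclusive write lock.

The names CrewClassifier, Scored, train and nBest are hypothetical and are not LingPipe's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch (not LingPipe's API): online naive Bayes with CREW locking. */
public class CrewClassifier {
    /** One n-best entry: a category and its probability conditional on the text. */
    public static final class Scored {
        public final String category;
        public final double probability;
        Scored(String c, double p) { category = c; probability = p; }
    }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokenTotals = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Online training: absorbs one example while holding the exclusive write lock. */
    public void train(String category, String text) {
        List<String> tokens = tokenize(text);
        lock.writeLock().lock();
        try {
            docCounts.merge(category, 1, Integer::sum);
            totalDocs++;
            Map<String, Integer> counts = tokenCounts.computeIfAbsent(category, k -> new HashMap<>());
            for (String tok : tokens) {
                counts.merge(tok, 1, Integer::sum);
                tokenTotals.merge(category, 1, Integer::sum);
                vocabulary.add(tok);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns up to n categories, best first; any number of readers may run at once. */
    public List<Scored> nBest(String text, int n) {
        List<String> tokens = tokenize(text);
        lock.readLock().lock();
        try {
            List<String> cats = new ArrayList<>(docCounts.keySet());
            double[] logJoint = new double[cats.size()];
            int v = vocabulary.size() + 1; // +1 leaves probability mass for unseen tokens
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < cats.size(); i++) {
                String c = cats.get(i);
                Map<String, Integer> counts = tokenCounts.get(c);
                int total = tokenTotals.getOrDefault(c, 0);
                double lp = Math.log((double) docCounts.get(c) / totalDocs);
                for (String tok : tokens) { // add-one (Laplace) smoothed token probability
                    lp += Math.log((counts.getOrDefault(tok, 0) + 1.0) / (total + v));
                }
                logJoint[i] = lp;
                max = Math.max(max, lp);
            }
            // Log-sum-exp normalization turns joint log probabilities into p(category | text).
            double sum = 0.0;
            for (double lp : logJoint) sum += Math.exp(lp - max);
            List<Scored> out = new ArrayList<>();
            for (int i = 0; i < cats.size(); i++) {
                out.add(new Scored(cats.get(i), Math.exp(logJoint[i] - max) / sum));
            }
            out.sort((a, b) -> Double.compare(b.probability, a.probability));
            return new ArrayList<>(out.subList(0, Math.min(n, out.size())));
        } finally {
            lock.readLock().unlock();
        }
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) throws InterruptedException {
        CrewClassifier clf = new CrewClassifier();
        clf.train("pos", "great wonderful excellent film");
        clf.train("neg", "terrible awful boring film");
        // A reader thread classifies concurrently while the main thread keeps training.
        Thread reader = new Thread(() -> clf.nBest("wonderful film", 2));
        reader.start();
        clf.train("pos", "wonderful acting");
        reader.join();
        for (Scored s : clf.nBest("wonderful film", 2)) {
            System.out.printf("%s %.3f%n", s.category, s.probability);
        }
    }
}
```

Subtracting the maximum before exponentiating (log-sum-exp) keeps the conditional probabilities numerically stable. This matters because the joint log probabilities of long texts can be very negative.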
