UIMA type system


A new version of the type system (2.6.8) is available; see Downloads (2010-02-17).


UIMA is an architecture and framework for developing unstructured information management (UIM) applications, such as natural language processing (NLP) systems. The Java-based UIMA SDK supports building individual NLP tools, called Analysis Engines (AEs), and combining them into pipelines (aggregate AEs). AEs attach meta-information (annotations) to the object of analysis (e.g. a text) in the form of typed objects whose properties carry values. The Common Analysis System (CAS) in UIMA manages the organization, access and storage of all typed objects associated with the subject of analysis. The types of these objects are defined in a hierarchically organized annotation type system.
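The interplay of type system, CAS and Analysis Engines can be illustrated with a small toy model. The sketch below is plain Python and does not use the UIMA SDK; all class, function and type names (`TypeSystem`, `Cas`, `whitespace_tokenizer`, `capitalized_entity_tagger`, `Token`, `Entity`) are assumptions made up for illustration and only mirror the concepts described above.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TypeSystem:
    """Toy hierarchical type system: maps each type name to its parent."""
    parents: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_type(self, name: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.parents:
            raise KeyError(f"unknown parent type: {parent}")
        self.parents[name] = parent

    def subsumes(self, ancestor: str, name: str) -> bool:
        """True if `name` is `ancestor` or inherits from it."""
        current: Optional[str] = name
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents[current]
        return False


@dataclass
class Annotation:
    """A typed object with a text span and valued properties (features)."""
    type_name: str
    begin: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Cas:
    """Toy stand-in for the CAS: holds the text and all typed objects."""
    text: str
    type_system: TypeSystem
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        if annotation.type_name not in self.type_system.parents:
            raise KeyError(f"undeclared type: {annotation.type_name}")
        self.annotations.append(annotation)

    def select(self, type_name: str) -> List[Annotation]:
        """Return annotations of the given type or any of its subtypes."""
        return [a for a in self.annotations
                if self.type_system.subsumes(type_name, a.type_name)]

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


# In this sketch an Analysis Engine is any function that updates a Cas in place.
AnalysisEngine = Callable[[Cas], None]


def whitespace_tokenizer(cas: Cas) -> None:
    """Hypothetical AE: one Token annotation per whitespace-separated word."""
    for match in re.finditer(r"\S+", cas.text):
        cas.add(Annotation("Token", match.start(), match.end()))


def capitalized_entity_tagger(cas: Cas) -> None:
    """Hypothetical AE: marks capitalized tokens as Entity (a crude heuristic)."""
    for token in cas.select("Token"):
        if cas.covered_text(token)[0].isupper():
            cas.add(Annotation("Entity", token.begin, token.end,
                               {"source": "capitalization heuristic"}))


def run_pipeline(cas: Cas, engines: List[AnalysisEngine]) -> Cas:
    """Toy aggregate AE: runs the engines in order on the same Cas."""
    for engine in engines:
        engine(cas)
    return cas


def build_type_system() -> TypeSystem:
    """Declare a tiny hierarchy: Token and Entity both inherit from Annotation."""
    type_system = TypeSystem()
    type_system.add_type("Annotation")
    type_system.add_type("Token", parent="Annotation")
    type_system.add_type("Entity", parent="Annotation")
    return type_system
```

Calling `run_pipeline(Cas("UIMA runs in Jena", build_type_system()), [whitespace_tokenizer, capitalized_entity_tagger])` would fill the toy CAS with token and entity annotations. Because the type system is hierarchical, `select("Annotation")` then returns both kinds, while `select("Token")` returns only the tokens.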

Here we provide an annotation type system (the JULIE Lab type system) that covers several levels of text analysis (e.g. document, linguistic and semantic analysis). The JULIE Lab type system consists of five layers: Document Meta, Document Structure & Style, Morpho-Syntax, Syntax and Semantics. The Document Meta layer describes bibliographical and content information about a complete document. The Document Structure & Style layer contains information about the organization and layout of the analyzed documents. The Morpho-Syntax layer represents the results of morpho-syntactic analysis such as tokenization, stemming and part-of-speech tagging. The Syntax layer represents annotations from shallow and full parsing; its types can represent both dependency- and constituency-based parsing results. The Semantics layer currently covers the representation of (named) entities, particularly for the biomedical and newswire (MUC) domains, and will soon be extended with the representation of relationships between entities and events.
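As a rough illustration of the layered organization, the following Python sketch groups annotation types by layer. The five layer names come from the description above; the type names in `TYPE_LAYERS` are hypothetical placeholders chosen for this example and are not the identifiers used in the JULIE Lab type system.

```python
from enum import Enum
from typing import Dict, List


class Layer(Enum):
    """The five layers named in the description above."""
    DOCUMENT_META = "Document Meta"
    DOCUMENT_STRUCTURE_AND_STYLE = "Document Structure & Style"
    MORPHO_SYNTAX = "Morpho-Syntax"
    SYNTAX = "Syntax"
    SEMANTICS = "Semantics"


# Hypothetical type names, used only to show how types map to layers.
TYPE_LAYERS: Dict[str, Layer] = {
    "Header": Layer.DOCUMENT_META,
    "Section": Layer.DOCUMENT_STRUCTURE_AND_STYLE,
    "Token": Layer.MORPHO_SYNTAX,
    "POSTag": Layer.MORPHO_SYNTAX,
    "Constituent": Layer.SYNTAX,
    "DependencyRelation": Layer.SYNTAX,
    "Entity": Layer.SEMANTICS,
}


def types_by_layer(type_layers: Dict[str, Layer]) -> Dict[Layer, List[str]]:
    """Group type names by layer, keeping their declaration order."""
    grouped: Dict[Layer, List[str]] = {layer: [] for layer in Layer}
    for type_name, layer in type_layers.items():
        grouped[layer].append(type_name)
    return grouped
```

A pipeline component could use such a mapping to request only the layers it needs, for example reading `types_by_layer(TYPE_LAYERS)[Layer.SYNTAX]` to find the parsing-related types.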

Documentation and Download


  • Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou. An UIMA Annotation Type System for a Generic Text Mining Architecture. UIMA-Workshop, GLDV Conference, April 2007.
  • Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka, Sophia Ananiadou. An Annotation Type System for a Data-Driven NLP Pipeline. The Linguistic Annotation Workshop (LAW) at ACL 2007, Prague, Czech Republic, June 28-29, 2007.