A new version of the type system (2.6.8) is available; see Downloads (2010-02-17).


UIMA (Unstructured Information Management Architecture) is an architecture and framework for developing unstructured information management (UIM) applications, such as Natural Language Processing (NLP) systems. The Java-based UIMA SDK supports building individual NLP tools, called Analysis Engines (AEs), and combining them into pipelines (aggregate AEs). The AEs attach meta-information (annotations) to the object of analysis (e.g. a text) in the form of typed objects with valued properties. The Common Analysis System (CAS) in UIMA manages the organization, access and storage of all typed objects associated with the subject of analysis. The types available for these objects are defined in a hierarchically organized annotation type system.
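To make the idea of typed objects in a hierarchical type system more concrete, here is a minimal sketch in plain Java. It does not use the UIMA API: `Type`, `SpanAnnotation` and `AnalysisStore` are hypothetical stand-ins for a type system entry, an annotation and the CAS, and the type names (`Token`, `EntityMention`, `Protein`) and feature names are illustrative assumptions, not names taken from the JULIE Lab type system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypeSystemSketch {

    /** A node in a single-inheritance type hierarchy (toy stand-in, not a UIMA class). */
    static final class Type {
        final String name;
        final Type parent; // null for the root type

        Type(String name, Type parent) {
            this.name = name;
            this.parent = parent;
        }

        /** True if this type is `other` or inherits from it. */
        boolean isSubtypeOf(Type other) {
            for (Type t = this; t != null; t = t.parent) {
                if (t == other) {
                    return true;
                }
            }
            return false;
        }
    }

    // Hypothetical type names, chosen only for illustration.
    static final Type ANNOTATION = new Type("Annotation", null);
    static final Type TOKEN = new Type("Token", ANNOTATION);
    static final Type ENTITY_MENTION = new Type("EntityMention", ANNOTATION);
    static final Type PROTEIN = new Type("Protein", ENTITY_MENTION);

    /** A typed object attached to a character span, with valued properties (features). */
    static final class SpanAnnotation {
        final Type type;
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset
        final Map<String, String> features = new LinkedHashMap<>();

        SpanAnnotation(Type type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Sets a feature value and returns this annotation for chaining. */
        SpanAnnotation with(String feature, String value) {
            features.put(feature, value);
            return this;
        }
    }

    /** Toy stand-in for the CAS: holds the text and every annotation made on it. */
    static final class AnalysisStore {
        final String text;
        private final List<SpanAnnotation> annotations = new ArrayList<>();

        AnalysisStore(String text) {
            this.text = text;
        }

        SpanAnnotation add(Type type, int begin, int end) {
            SpanAnnotation a = new SpanAnnotation(type, begin, end);
            annotations.add(a);
            return a;
        }

        /** Returns all annotations whose type is `type` or one of its subtypes. */
        List<SpanAnnotation> select(Type type) {
            List<SpanAnnotation> result = new ArrayList<>();
            for (SpanAnnotation a : annotations) {
                if (a.type.isSubtypeOf(type)) {
                    result.add(a);
                }
            }
            return result;
        }

        String coveredText(SpanAnnotation a) {
            return text.substring(a.begin, a.end);
        }
    }

    /** Builds a small example: two tokens and one entity mention on the same text. */
    static AnalysisStore buildExample() {
        AnalysisStore store = new AnalysisStore("IL-2 activates T cells.");
        store.add(TOKEN, 0, 4).with("pos", "NN");
        store.add(TOKEN, 5, 14).with("pos", "VBZ");
        store.add(PROTEIN, 0, 4).with("source", "toy-tagger");
        return store;
    }

    public static void main(String[] args) {
        AnalysisStore store = buildExample();
        // Querying the supertype also returns annotations of its subtypes.
        for (SpanAnnotation a : store.select(ENTITY_MENTION)) {
            System.out.println(a.type.name + " \"" + store.coveredText(a) + "\" " + a.features);
        }
    }
}
```

The point of the hierarchy shows up in `select`: a component that asks for the general `EntityMention` type also receives the more specific `Protein` annotation, so later pipeline steps can work at whatever level of the type hierarchy suits them.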

We provide an annotation type system, the JULIE Lab type system, which covers various levels of text analysis (e.g. document, linguistic and semantic analysis). The JULIE Lab type system consists of five layers: Document Meta, Document Structure & Style, Morpho-Syntax, Syntax and Semantics. The Document Meta layer describes the bibliographical and content information about a complete document. The Document Structure & Style layer contains information about the organization and layout of the analyzed documents. The Morpho-Syntax layer represents the results of morpho-syntactic analysis, such as tokenisation, stemming and part-of-speech tagging. The Syntax layer represents the annotations produced by shallow and full parsing; its types can represent both dependency-based and constituency-based parsing results. The Semantics layer currently comprises the representation of (named) entities, particularly for the biomedical and newswire (MUC) domains, and will soon be extended with the representation of relationships between entities and events.

Documentation and Download


