Tutorials
Dietrich Rebholz-Schuhmann (EBI) & Jin-Dong Kim (U Tokyo):
"Text Mining"
(Tutorial I)
- Introduction: A large number of terms are available from biomedical resources such as UniProt/Swiss-Prot and the Gene Ontology (GO). The introduction will detail which resources are suitable for text mining.
- Overview of the principles of text analysis: How are documents structured, what types of words do they contain, how is a sentence structured, and what is part-of-speech information?
- Annotated corpora, using Genia as the example: Corpora have been developed in the biomedical domain to show the usage of terminology and language in the scientific literature. Genia is the most important corpus and is an excellent example of a continuous effort to build resources that improve text mining and information extraction methods. The tutorial will explain the structure of Genia, the ongoing resource-building effort, and the benefits of using the corpus in the text mining field, e.g. for training text mining methods.
- Principles of information retrieval (IR): indexing of documents, different types of indexes, profile vectors of documents (fingerprints), similarity search in the vector space, and biomedical examples of these approaches (Collexis; Ralf Zimmer, LMU).
- Methods in information extraction (IE): What are the methods for information extraction, and what is their benefit? Named entity recognition (NER) identifies protein and gene names, among others. What techniques have to be applied to deal with existing terminologies and nomenclatures? What is the benefit of more advanced information extraction methods such as chunk parsing (shallow parsing) and full parsing?
- Key principles for the evaluation of IR and IE methods: precision, recall, accuracy, F-measure.
- Selected IR or IE approaches in the biomedical field: curation tools, extraction of mutations, gene-disease associations, EBI's tools and services (EBIMed, Whatizit and others), and scripted access to available services.
- Assessment of text mining against curation efforts and database content: What types of facts can we expect from the literature?
- Conclusion, outlook, comments and questions.
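The vector-space idea in the information retrieval item above (profile vectors and similarity search) can be illustrated with a minimal Python sketch. It assumes plain whitespace tokenisation and raw term counts as the profile vector; the function names and the toy documents are illustrative assumptions and are not taken from the tutorial material or from any of the tools named above.

```python
import math
from collections import Counter


def profile_vector(text):
    """Build a term-frequency profile (a simple 'fingerprint') of a text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, name) pairs for all documents, best match first."""
    q = profile_vector(query)
    scored = [(cosine_similarity(q, profile_vector(text)), name)
              for name, text in docs.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy documents, invented for illustration only.
    docs = {
        "doc_a": "p53 mutation in tumour cells",
        "doc_b": "yeast metabolic network",
    }
    for score, name in rank("p53 mutation", docs):
        print(f"{name}: {score:.3f}")
```

Real systems would normally add proper tokenisation, term weighting and an inverted index instead of scanning every document; the sketch only shows the similarity computation itself.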
Alexander Morgan (MITRE) & Martin Krallinger (CNB/CNIO):
"Evaluation of Text Mining Systems"
(Tutorial II)
This tutorial will focus on methods of designing, preparing, and performing the evaluation of automated systems that carry out information extraction on biomedical text. We will discuss issues with task design, the involvement of domain experts (often biological database curators) in the process, the distribution of data, system scoring, and what can be learned from a community shared task. We will pay particular attention to the process of evaluating the mapping of text excerpts to ordered representations of biological knowledge: unique identifier lists, controlled vocabularies and ontologies. Examples will be drawn from BioCreAtIvE (Critical Assessment of Information Extraction in Biology) 2004 and the new BioCreAtIvE, which is in the planning stages.
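As a rough illustration of system scoring for a task that maps text to a list of unique identifiers, the following Python sketch computes precision, recall and the balanced F-measure of a predicted identifier set against a curated gold set. It is a minimal sketch under stated assumptions: the function name `score` and the identifiers are hypothetical, and it does not reproduce the scoring scheme of BioCreAtIvE or any other shared task.

```python
def score(predicted, gold):
    """Return (precision, recall, f_measure) for two collections of identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical identifiers, invented for illustration only.
    gold = {"id-1", "id-2", "id-3"}
    predicted = {"id-1", "id-2", "id-8", "id-9"}
    p, r, f = score(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Treating the output as a set ignores ranking and partial matches; tasks that map text to ontology terms may also want to give credit for near misses in the hierarchy, which this sketch does not attempt.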
Robert Stevens (U Manchester):
"Ontologies in the Life Sciences"
(Tutorial III)
Ontologies are playing an increasingly wide role in the life sciences. Since the advent of the Gene Ontology, more and more controlled vocabularies have been developed. The main use of these has been to mark up biological data to cover functionality, experimental design, phenotype, and much more. Ontologies, however, have other uses apart from a common vocabulary that assists retrieval of data across multiple resources. These have included their use in text-mining, analysis of experimental results, the storage of data, and general interpretation of those data.
In this tutorial, attendees will learn about the motivation for the use of ontologies, a broad review of existing ontologies, and their use in describing and using data in the life sciences. The tutorial will finish with a review of the current issues and future prospects for bio-ontologies.
Steffen Staab (U Koblenz):
"Ontologies and the Semantic Web"
(Tutorial IV)
Knowledge-rich domains benefit if knowledge structures are made explicit and formal so that they can be used by people as well as by machines. In recent years there has been intensive research into representing and using two kinds of knowledge structures in particular. First, ontologies have been investigated as a means to formalize a conceptualization of a domain of interest, i.e. an ontology captures the terminology of a domain as it remains constant over all the different situations one may encounter in that domain. Second, the semantic web has been conceived as an idea to provide a worldwide standard to represent data as well as ontologies, and to link such data and ontologies. While the standardization allows for easy exchange and reuse of encoded knowledge structures, the linkage of data and ontologies allows even large network effects to arise within the community that exploits them.
In the tutorial we will approach the foundations of both ontologies and the semantic web, and we will see some ways of exploiting them in knowledge-rich domains.
The structure of the tutorial is as follows:
What is an ontology? (20 min)
What is the Semantic Web? (20 min)
Representation language for ontologies and the semantic web (50 min)
Semantic Integration (30 min)
Ontology learning from text (30 min)
Some uses of ontologies in text representation tasks (30 min)
Stefan Schuster (JCB):
"Bioinformatics I: Computer Simulation of Metabolic Networks - Lecture"
(Tutorial V)
-cancelled-
Stefan Schuster (JCB):
"Bioinformatics II: Computer Simulation of Metabolic Networks - Practical Course"
(Tutorial VI)
-cancelled-