Text Mining in the Life Sciences is an emerging technology. On the one hand, the automated extraction of information looks promising. On the other hand, real-world applications still need to be improved and deployed to a larger number of users. In our experience, Text Mining comes into play in two main areas: first, large-scale mining that underlies a general-purpose platform; second, mining with a very narrow focus that provides custom-tailored solutions to small business units. In the latter context, information extraction is only a means of delivering a service (e.g., an alerting system).
The talk will provide an overview of current applications of Text Mining in our company. It will address the positive impact on scientific work as well as the shortcomings, and outline potential improvements.
Challenge evaluations permit both developers and users to track the progress of a maturing technology. The area of biomedicine presents a rich environment because the users (biologists, bioinformaticians, and medical researchers) already rely heavily on information technology for literature access (e.g., MEDLINE) and for indexing and navigation through standardized nomenclatures and ontologies (e.g., MeSH, GO). To design appropriate challenge evaluations, it is important to understand the larger workflow in which semantic text mining applications are embedded. For example, a semantic mining application could extract information from free text for downstream automated processing, such as assigning GO features for the analysis of high-throughput experimental data; or it could provide an interactive tool suite to support curation of a biological database such as SWISS-Prot or FlyBase. Recently, several groups have been exploring applications that produce human-readable output, such as a summary of the literature relevant to a specific user query.
This talk will review existing challenge evaluations to date, including the highly successful CASP challenge for protein structure prediction on the bioinformatics side, and the TREC Genomics track and BioCreAtIvE for text mining. These evaluations have spanned a range from weak semantic mining (e.g., biological named entity tagging) to strong semantic mining, as in automated assignment of Gene Ontology concepts. We will then discuss directions for future evaluations, including the need to encourage research on interactive tools that can be quickly tailored to users' specific requirements, as well as on human-oriented output such as summaries.
Bio-ontologies provide a means to represent and integrate biological data such that the information is accessible to humans and computable for machines. Recent advances in the development of bio-ontologies for molecular and phenotypic systems support standard data descriptions for genetic, genomic, and phenotypic data, facilitate data integration and exchange among bioinformatics resource providers, and enhance the ability of scientists to analyze large data sets and to use comparative genomic and phenotypic information in their research. My work focuses on the development and implementation of the Gene Ontology and the use of GO and other bio-ontologies in the Mouse Genome Informatics system. I will describe the annotation, integration, and visualization of biological information for the laboratory mouse. I will focus particularly on the development and use of the Gene Ontology and on the extension of that system and others to facilitate the representation and retrieval of information about mouse models of human diseases.