Efficiently mining chemical documents for relevant information poses a number of challenges. The range of possible entities in chemistry is extremely large and heterogeneous, spanning various types of systematic names, trivial and trade names, sum formulas, and more. Moreover, handling this domain properly requires linking a substance’s structure to its name(s).
We present a software framework that addresses these requirements. It is part of a wider architecture for the efficient analysis of textual information in the Life Sciences, which covers both the recognition of various types of entities and the detection of relations between them. Examples of applying this software in industry projects include patent analysis and scientific document mining.
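The name-to-structure link can be illustrated with a minimal sketch, assuming a simple dictionary-based approach: a hypothetical lexicon maps several surface names (systematic, trivial, trade) of the same substance to one structure key, here a SMILES string, and relations are approximated by sentence-level co-occurrence. This is not the framework's actual method; `LEXICON`, `find_mentions`, and `cooccurrence_relations` are illustrative names only.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy lexicon (an assumption, not the framework's data):
# different surface names map to one structure, written as SMILES.
LEXICON = {
    "acetylsalicylic acid": "CC(=O)OC1=CC=CC=C1C(=O)O",   # trivial name
    "2-acetoxybenzoic acid": "CC(=O)OC1=CC=CC=C1C(=O)O",  # systematic name
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",                # trade/trivial name
    "ethanol": "CCO",
}


@dataclass(frozen=True)
class Mention:
    text: str       # surface form as it appears in the document
    start: int      # character offsets into the document
    end: int
    structure: str  # normalized structure key (SMILES)


def find_mentions(text: str) -> list[Mention]:
    """Case-insensitive dictionary lookup, longest names first, no overlaps."""
    taken = [False] * len(text)
    mentions = []
    for name in sorted(LEXICON, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # span already claimed by a longer name
            taken[m.start():m.end()] = [True] * (m.end() - m.start())
            mentions.append(Mention(m.group(0), m.start(), m.end(), LEXICON[name]))
    return sorted(mentions, key=lambda mention: mention.start)


def cooccurrence_relations(text: str) -> set[tuple[str, str]]:
    """Pair distinct structures mentioned within the same sentence."""
    relations = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        structures = sorted({m.structure for m in find_mentions(sentence)})
        relations.update(combinations(structures, 2))
    return relations
```

Because every surface form is normalized to a single structure key, "Aspirin" and "2-acetoxybenzoic acid" in the same document resolve to the same entity, which is the kind of name/structure link the abstract calls for.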
This presentation gives an overview of the BioBox Initiative and the Bio-ClusterGrid. The BioBox initiative was started by the Sun Asia-Pacific Science and Technology Center to design and develop solutions to problems that bioinformatics researchers and students face today.
The Bio-ClusterGrid is the first of many deployment architectures that realize the benefits of the BioBox initiative. More than 20 of the most popular bioinformatics applications are made available on the Bio-ClusterGrid through a portal, which greatly improves their usability. Biologists access the portal from a browser-enabled device. The Grid Engine software provides the resource management mechanism that schedules the bioinformatics applications to run on the cluster of execution servers.
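The scheduling role described above can be made concrete with a toy sketch, assuming a simple least-loaded policy: jobs submitted through the portal wait in a FIFO queue and are dispatched to whichever execution server has the most free slots. Grid Engine's real scheduler is considerably more elaborate (queues, load thresholds, scheduling policies), so `ExecHost`, `ToyScheduler`, and the slot counts are hypothetical illustrations only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecHost:
    """An execution server with a fixed number of job slots (toy model)."""
    name: str
    slots: int
    running: list[str] = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.slots - len(self.running)


class ToyScheduler:
    """FIFO queue; each job goes to the host with the most free slots.

    A deliberately simplified stand-in, not Grid Engine's actual policy.
    """

    def __init__(self, hosts: list[ExecHost]) -> None:
        self.hosts = hosts
        self.pending = deque()  # jobs waiting for a free slot

    def submit(self, job: str) -> None:
        self.pending.append(job)  # e.g. a request arriving from the portal

    def dispatch(self) -> dict[str, str]:
        """Place as many pending jobs as free slots allow; return job -> host."""
        placed = {}
        while self.pending:
            host = max(self.hosts, key=lambda h: h.free)  # least-loaded host
            if host.free == 0:
                break  # cluster full; remaining jobs stay queued
            job = self.pending.popleft()
            host.running.append(job)
            placed[job] = host.name
        return placed

    def finish(self, job: str) -> None:
        """Release the slot held by a completed job."""
        for host in self.hosts:
            if job in host.running:
                host.running.remove(job)
                return
```

With two hosts offering two slots and one slot, submitting four jobs places three immediately and leaves the fourth queued until a running job finishes and frees a slot.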