IBM Delivers Advanced Search and Business Insight Framework to Open Source Community

UIMA Provides Foundation for New Search-Related Applications That Extract Hidden Meaning From Unstructured Information

ARMONK, NY - 23 Jan 2006: IBM today announced it has made new open source technology available that will enhance knowledge discovery capabilities across multiple industries and applications and provide developers with tools to support a new breed of software for the analysis of information. The company has completed the first step of making the Unstructured Information Management Architecture (UIMA) available to the open source community by publishing the UIMA source code to SourceForge.net, the world's largest open source development site.

UIMA is an open software framework already in use by industry and academia to collaborate on the creation, development and deployment of technologies for discovering the vital knowledge present in the fastest-growing sources of information today -- unstructured content in the enterprise and across the Web, including documents, images, comment and note fields, e-mail and even rich media like video and audio. New technologies built using UIMA will help unlock the value in organizations' content assets. Later this year, IBM intends to move this project to a full open source community development model.

"Companies want to get value from all of their information, but no single vendor can address all of the search, text analytics and business insight needs across all types of information and for all industries," said Nelson Mattos, vice president, Information and Interaction, IBM Research. "We are making UIMA available to the open source community to encourage innovation and allow analytics software tools from multiple sources to work together and build upon each other."

Since IBM unveiled UIMA in December 2004, an active ecosystem of partners, customers and open source developers has accelerated innovation and solution delivery around UIMA.

The International Federation of Pharmaceutical Manufacturers & Associations, the worldwide industry body that represents pharmaceutical companies, is launching a portal of clinical trial information in February. The portal uses the UIMA framework as part of IBM WebSphere Information Integrator OmniFind Edition to enable searching by disease area, medicine name or trial location, and it recognizes medical and geographical synonyms across multiple languages without manual indexing. It will bring together content from a number of existing clinical trial registries and databases, allowing doctors and patients to review summarized results and find trials they can join.

Mayo Clinic also adopted the UIMA framework early in its development cycle as part of its broader collaboration with IBM in the area of unstructured text processing. Mayo Clinic used UIMA as the basis for implementing a system to extract knowledge from its approximately 20 million clinical notes. This provided the flexibility to combine a series of annotators from Mayo Clinic, IBM and the open source community in a plug-and-play fashion to rapidly create a powerful analytic solution with advanced capabilities.

Memorial Sloan-Kettering Cancer Center is working with IBM to develop a Web-accessible data warehouse that will conform to HIPAA requirements. The data warehouse will enable the center's clinicians and researchers to make efficient use of data, facilitating research on a new cancer taxonomy. An important aspect of the data warehouse is the inclusion of searchable concepts from the center's text-based pathology reports. These concepts are automatically extracted by an IBM text analytics solution built on the UIMA framework.

Adding to the growing UIMA ecosystem, the General Architecture for Text Engineering (GATE -- gate.ac.uk) team at the University of Sheffield recently announced the delivery of an interoperability layer with UIMA. This new layer provides GATE users access to UIMA's flexible deployment options and UIMA users access to the many useful plug-ins already available in GATE for text mining, information extraction and natural language processing for research and commercial use.

UIMA has also received support from the Defense Advanced Research Projects Agency (DARPA) and is currently in use as part of DARPA's new human language technology research and development program called GALE (Global Autonomous Language Exploitation). The GALE Program is a five-year program involving industry and universities with the goal of developing and applying software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages and provide distilled relevant and actionable information in English. UIMA has been adopted as the underlying integrating architecture for building large-scale multimodal unstructured information management applications.

In addition, several of the software vendors that previously announced plans to support UIMA have already made their first UIMA-compliant solutions available, including ClearForest, Cognos, Factiva and nStein Technologies.

Availability
The source code for the IBM reference implementation of UIMA is currently available and can be downloaded from http://uima-framework.sourceforge.net/ . In addition, the IBM UIMA SDK, with additional facilities and components, can be downloaded for free from http://www.alphaworks.ibm.com/tech/uima .

UIMA is integral to the IBM Content Discovery portfolio, which combines content integration, enterprise search, text analytics and contextual delivery. UIMA is embedded in numerous IBM products, including IBM WebSphere Information Integrator OmniFind Edition, which provides a complete platform for processing unstructured information as part of enterprise search and business intelligence solutions.

Contact(s) information

Steven Tomasco
IBM Media Relations
(914) 945-1655
stomasc@us.ibm.com
