Universität Ulm, Fakultät für Informatik, Abtl. Künstliche Intelligenz, Diplomandenseminar KI

Diploma Thesis Final Presentation

SemWeaver - A Tool to Extract Relationships
from Natural Language Exploiting Domain Ontologies

Tobias Wunner, 28.07.2009



 
Abstract

Recent years have seen substantial development of Natural Language Processing (NLP) tools (e.g. part-of-speech taggers, chunkers, parsers, named entity recognizers) that process natural language automatically and enrich the data with syntactic structure. This has motivated the development of tools that extract relations ([1],[2]) from natural language. Relationship extraction is the task of identifying and extracting relationships expressed in natural-language text, for example:


"When asked about Shomrat, Steffen Schulze, a spokesman for German President Horst Köhler, told the Post..."


isSpokesManOf( SteffenSchulze, HorstKöhler )


isGermanPresidentOf( HorstKöhler, GermanGovernmentAdministration )
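As a minimal sketch, the two relations above can be held as predicate/argument triples. The `Relation` type and its field names are assumptions made for illustration and are not taken from SemWeaver itself.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A binary relation between two entities (hypothetical representation)."""
    predicate: str
    subject: str
    obj: str

    def __str__(self) -> str:
        # Render in the same notation as the example above.
        return f"{self.predicate}( {self.subject}, {self.obj} )"


# The two relations read off the example sentence.
extracted = [
    Relation("isSpokesManOf", "SteffenSchulze", "HorstKöhler"),
    Relation("isGermanPresidentOf", "HorstKöhler", "GermanGovernmentAdministration"),
]

for relation in extracted:
    print(relation)
```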

This thesis proposes an ontology-driven approach that integrates ontology reasoning and linguistic methods to improve relationship recognition with respect to a domain model. Furthermore, a system is developed that (i) assists the user in the extraction process through visualization and generated suggestions, and (ii) exploits the hierarchical nature of the relationships, as typically encountered in real-world data, and thereby allows a higher degree of freedom in the matching process. The quality of the algorithm is evaluated on a manually annotated German newspaper corpus with respect to several ontology error measures developed in the thesis.
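To illustrate how a relation hierarchy can loosen the matching, the following sketch gives partial credit when an extracted relation is a more general super-relation of the annotated one. It is not the thesis's error measure: the hierarchy entries other than `isSpokesManOf` and `isGermanPresidentOf`, the `PARENT` table, and the scoring formula are all assumptions chosen for illustration.

```python
# Hypothetical relation hierarchy, child -> parent. Only isSpokesManOf and
# isGermanPresidentOf appear in the abstract; the parents are invented.
PARENT = {
    "isSpokesManOf": "worksFor",
    "worksFor": "isAffiliatedWith",
    "isGermanPresidentOf": "holdsOffice",
}


def ancestors(relation: str) -> list[str]:
    """Return the relation followed by its super-relations, most specific first.

    Assumes the hierarchy contains no cycles.
    """
    chain = [relation]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def match_score(predicted: str, gold: str) -> float:
    """Score 1.0 for an exact match, less for each step of generalization,
    and 0.0 if the predicted relation is not a super-relation of the gold one."""
    chain = ancestors(gold)
    if predicted not in chain:
        return 0.0
    return 1.0 / (1 + chain.index(predicted))


print(match_score("isSpokesManOf", "isSpokesManOf"))
print(match_score("worksFor", "isSpokesManOf"))
print(match_score("holdsOffice", "isSpokesManOf"))
```

A flat comparison would count `worksFor` as a plain miss; with the hierarchy it is treated as a correct but less specific answer.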

The presentation (i) gives a brief overview of the design and implementation of the developed tool, (ii) evaluates the results on the news data with respect to the developed similarity measures, and (iii) discusses possible improvements to the system's components.
TL, 22.7.2009