Christoph Kiefer, Non-Deductive Reasoning for the Semantic Web and Software Analysis, January 2009. (doctoral thesis)
The Semantic Web uses a number of knowledge representation (KR) languages to represent
the terminological knowledge of a domain in a structured and formally sound
way. These languages are typically based on description logics (DLs), a family of
formal knowledge representation languages. One of the underpinnings of the Semantic Web
and, therefore, a strength of any such semantic architecture, is the ability to
reason from data, that is, to derive new knowledge from basic facts. In other
words, the information that is already known and stored in the knowledgebase is
extended with the information that can be logically deduced from the ground
truth.
The world, however, does not generally fit into a fixed, predetermined logical
system of zeroes and ones. To account for this, especially in order to deal
with the uncertainty inherent in the physical world, different models of human
reasoning are required. Two prominent ways to model human reasoning are
similarity reasoning (also known as analogical reasoning) and inductive reasoning. It has
been shown in recent years that the notion of similarity plays an important
role in a number of Semantic Web tasks, such as Semantic Web service matchmaking,
similarity-based service discovery, and ontology alignment. As for inductive
reasoning, two prominent tasks that can benefit from statistical
induction techniques are Semantic Web service classification and (semi-)automatic
semantic data annotation.
This dissertation brings these two reasoning models to the Semantic Web. To this end, it extends the
well-known RDF query language SPARQL in two novel, non-deductive directions:
similarity reasoning and inductive reasoning. To implement both reasoning
variants within SPARQL, we introduce the concept of virtual triple patterns.
Virtual triples are not asserted but inferred: they do not exist in the
knowledgebase but arise only as the result of the similarity or inductive
reasoning process.
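To make the notion of virtual triples concrete, the following Python sketch answers a triple pattern either by lookup among the asserted triples or, for one designated predicate, by computing a similarity score at query time. It is not iSPARQL's syntax or implementation: the predicate name `ex:similarTo`, the token-based Jaccard measure, the 0.3 threshold, and the sample data are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Asserted triples: the ground truth actually stored in the knowledgebase
# (hypothetical sample data).
ASSERTED: List[Triple] = [
    ("ex:ServiceA", "ex:label", "book flight ticket"),
    ("ex:ServiceB", "ex:label", "reserve flight ticket"),
    ("ex:ServiceC", "ex:label", "convert currency"),
]

# Hypothetical virtual predicate: no triple with it is ever stored.
VIRTUAL_SIMILAR = "ex:similarTo"


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, one simple stand-in similarity strategy."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def labels() -> Dict[str, str]:
    """Map each resource to its asserted ex:label value."""
    return {s: o for s, p, o in ASSERTED if p == "ex:label"}


def match(subject: str, predicate: str, threshold: float = 0.3) -> List[Triple]:
    """Answer the triple pattern (subject, predicate, ?o).

    Ordinary predicates are answered by lookup in ASSERTED; the virtual
    predicate is answered by computing similarities on the fly, so the
    resulting triples exist only as the output of this reasoning step.
    """
    if predicate != VIRTUAL_SIMILAR:
        return [t for t in ASSERTED if t[0] == subject and t[1] == predicate]
    lab = labels()
    if subject not in lab:
        return []
    return [
        (subject, VIRTUAL_SIMILAR, other)
        for other, other_label in lab.items()
        if other != subject and jaccard(lab[subject], other_label) >= threshold
    ]
```

Here `match("ex:ServiceA", "ex:similarTo")` returns `[("ex:ServiceA", "ex:similarTo", "ex:ServiceB")]`, since the two labels share two of four distinct tokens (a score of 0.5), even though no such triple appears in `ASSERTED`. Replacing `jaccard` with another measure corresponds, loosely, to plugging in a different similarity strategy.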
To address similarity reasoning, we present the iSPARQL (imprecise SPARQL)
framework, an extension of traditional SPARQL that supports customized
similarity strategies via virtual triple patterns in order to search an RDF
dataset for similar resources. For inductive reasoning, we introduce
SPARQL-ML (SPARQL Machine Learning), an approach for creating and using
statistical induction and data mining models from within traditional SPARQL.
We validate the iSPARQL and SPARQL-ML frameworks in five case studies on
well-researched Semantic Web and Software Analysis tasks.
For the Semantic Web, these tasks are semantic service matchmaking, service discovery,
and service classification. For Software Analysis, we conduct experiments
in software evolution and bug prediction. By applying our approaches to this
broad range of tasks, we aim to demonstrate their generality, ease of use,
extensibility, and flexibility in being customized to the task at hand.