Motivation

The success of statistics-based techniques in almost every area of artificial intelligence and in practical applications on the Web challenges the traditional logic-based approach of the Semantic Web. We believe that statistical inference techniques should be treated as a complement to the existing Semantic Web infrastructure. Consequently, the challenge for Semantic Web research is not whether, but how, to extend existing Semantic Web techniques with statistical learning and inferencing capabilities.

We therefore argue that the large and continuously growing amount of interlinked Semantic Web data is a perfect match for statistical relational learning (SRL) methods: in addition to the features/attributes of entities used by traditional, propositional data mining techniques, SRL methods also exploit the relations between entities.

The fact that companies such as Microsoft and Oracle have recently added data mining extensions to their relational database management systems underscores the importance of such extensions and calls for a similar solution for RDF stores and SPARQL.

To support the integration of traditional Semantic Web techniques and machine learning-based statistical inferencing, we developed SPARQL-ML, an approach for creating and working with data mining models in SPARQL. Our framework enables users to predict or classify unseen data (or features) and relations in a new dataset using a previously learned mining model. In particular, our approach supports statistical relational learning methods, which take the relations between entities (or resources) into account. This allows us to induce statistical models without prior propositionalization (i.e., translation to a single table), a cumbersome and error-prone task.
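To make the propositionalization step concrete, the following minimal Java sketch (written for illustration, not part of SPARQL-ML) flattens a small, hypothetical relational dataset into a single table. Each project links to a varying number of members, so those members must be squeezed into fixed columns through hand-picked aggregates, here a member count and the most frequent role. The Project, Member, and Row records and the chosen aggregates are assumptions; anything the aggregates do not capture (for example, the members' other projects) is lost, which is the relational information SRL methods can still use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Propositionalize {
    // Hypothetical relational data: a project links to any number of members.
    record Member(String name, String role) {}
    record Project(String id, boolean successful, List<Member> members) {}

    // One fixed-width row per project: the single table a propositional learner expects.
    record Row(String projectId, int memberCount, String mostCommonRole, boolean successful) {}

    static Row flatten(Project p) {
        // Hand-picked aggregate 1: number of linked members.
        int count = p.members().size();
        // Hand-picked aggregate 2: most frequent role (ties go to the alphabetically first role).
        Map<String, Integer> roleCounts = new HashMap<>();
        for (Member m : p.members()) {
            roleCounts.merge(m.role(), 1, Integer::sum);
        }
        String topRole = roleCounts.entrySet().stream()
                .max(Map.Entry.<String, Integer>comparingByValue()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey(Comparator.reverseOrder())))
                .map(Map.Entry::getKey)
                .orElse("none");
        return new Row(p.id(), count, topRole, p.successful());
    }

    static List<Row> propositionalize(List<Project> projects) {
        List<Row> table = new ArrayList<>();
        for (Project p : projects) {
            table.add(flatten(p));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Project> projects = List.of(
                new Project("p1", true, List.of(
                        new Member("alice", "developer"),
                        new Member("bob", "developer"),
                        new Member("carol", "manager"))),
                new Project("p2", false, List.of(new Member("dave", "manager"))));
        // Member names and the members' own links do not survive the flattening.
        propositionalize(projects).forEach(System.out::println);
    }
}
```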

Installation

SPARQL-ML requires a working installation of ARQ 2.0. Don't forget to set the ARQROOT environment variable as described in the ARQ installation manual (README.txt in the installation folder).

Download SPARQL-ML from the link below.
Unzip the file into your ARQROOT folder. This adds a new mining folder, command line scripts, and the necessary jar files.
Make all the scripts in the ARQROOT/bin folder executable:

[peter@neverland]$ chmod u+x $ARQROOT/bin/*
Download and install the appropriate MonetDB Server Mars version for your operating system, following the MonetDB installation notes for your platform.

Note: If you need to install MonetDB from source because your target platform (e.g., AMD64) is not supported, the following command may work (run as root from within the MonetDB source folder):

[peter@neverland]$ ./monetdb-install.sh --prefix=/usr/local --enable-sql --enable-xquery --enable-optimise
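After unzipping SPARQL-ML, a quick sanity check can save a confusing first run. The following Java sketch (a hypothetical helper, not shipped with SPARQL-ML) checks the points made above: that ARQROOT is set, that the mining folder exists, and that the files in ARQROOT/bin are executable. It only inspects the file system; it does not verify the ARQ version or the MonetDB installation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class InstallCheck {
    // Returns human-readable problems; an empty list means the layout looks fine.
    static List<String> check(String arqRoot) throws IOException {
        List<String> problems = new ArrayList<>();
        if (arqRoot == null || arqRoot.isBlank()) {
            problems.add("ARQROOT is not set");
            return problems;
        }
        Path root = Path.of(arqRoot);
        // Unzipping SPARQL-ML into ARQROOT should have created this folder.
        if (!Files.isDirectory(root.resolve("mining"))) {
            problems.add("missing folder: " + root.resolve("mining"));
        }
        Path bin = root.resolve("bin");
        if (!Files.isDirectory(bin)) {
            problems.add("missing folder: " + bin);
            return problems;
        }
        // Mirrors the chmod u+x step: every regular file in bin should be executable.
        try (Stream<Path> scripts = Files.list(bin)) {
            scripts.filter(Files::isRegularFile)
                   .filter(p -> !Files.isExecutable(p))
                   .forEach(p -> problems.add("not executable: " + p));
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        List<String> problems = check(System.getenv("ARQROOT"));
        if (problems.isEmpty()) {
            System.out.println("ARQROOT layout looks OK");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```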

First Steps

1. Run MonetDB with one of the following commands to initialize it with Proximity's startup script:

Linux/Mac OS X: Mserver --dbfarm=/some/dir/you/own --set sql_logdir=/some/other/dir/you/own --dbname sparqlml $ARQROOT/mining/init-mserver.mil --set port=30000
Windows: Mserver.bat --dbfarm=C:\some\dir\you\own --set sql_logdir=C:\some\other\dir\you\own --dbname sparqlml %ARQROOT%\mining\init-mserver.mil --set port=30000
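Before moving on, you may want to confirm that Mserver is actually accepting connections on the port given with --set port=30000. The Java sketch below only attempts a plain TCP connection with java.net.Socket; it assumes the server runs on the local machine, and a successful connection does not prove that the Proximity startup script was loaded. MonetDbPing and isListening are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MonetDbPing {
    // Returns true if something accepts TCP connections on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unknown.
            return false;
        }
    }

    public static void main(String[] args) {
        // 30000 is the port used in the Mserver commands above.
        int port = 30000;
        System.out.println(isListening("localhost", port, 2000)
                ? "Something is listening on port " + port
                : "Nothing is listening on port " + port + "; is Mserver running?");
    }
}
```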

2. Learn a new mining model with the CREATE MINING MODEL query:

Run the example query $ARQROOT/mining/examples/learn.rq with the data in $ARQROOT/mining/examples/training-set.owl.
You can either use the sparql-ml command line script in the $ARQROOT/bin folder (sparql-ml.bat on Windows):

[peter@neverland]$ $ARQROOT/bin/sparql-ml --data=$ARQROOT/mining/examples/training-set.owl --query=$ARQROOT/mining/examples/learn.rq

or access the SPARQL-ML functions from your code:


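// queryString: the text of a CREATE MINING MODEL query (e.g. learn.rq)
// dataModel:   a Jena Model holding the training data (e.g. training-set.owl)
QueryML query = QueryFactoryML.create(queryString);
QueryExecutionML qe = QueryExecutionFactoryML.create(query, dataModel);

// Learn the mining model; execCreate() returns a Jena Model, written here as RDF/XML
Model trainModel = qe.execCreate();
trainModel.write(System.out, "RDF/XML-ABBREV");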
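The code above expects the query text as a string. A minimal way to obtain it is to read the example file from disk, as in the following Java sketch. It uses only java.nio.file and java.util.regex; the QueryLoader class, its methods, and the keyword check (a heuristic assuming that a learning query contains CREATE MINING MODEL) are illustrative assumptions, and loading the training data into a Jena model is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class QueryLoader {
    // Heuristic: keywords separated by any whitespace, matched case-insensitively.
    private static final Pattern CREATE_MINING_MODEL =
            Pattern.compile("\\bCREATE\\s+MINING\\s+MODEL\\b", Pattern.CASE_INSENSITIVE);

    // Reads a .rq file (e.g. mining/examples/learn.rq) relative to the ARQ root folder.
    static String loadQuery(Path arqRoot, String relativePath) throws IOException {
        return Files.readString(arqRoot.resolve(relativePath), StandardCharsets.UTF_8);
    }

    static boolean isCreateMiningModelQuery(String query) {
        return CREATE_MINING_MODEL.matcher(query).find();
    }

    public static void main(String[] args) throws IOException {
        String root = System.getenv("ARQROOT");
        if (root == null) {
            System.err.println("Please set ARQROOT first.");
            return;
        }
        // This string plays the role of the query string passed to QueryFactoryML.create.
        String queryString = loadQuery(Path.of(root), "mining/examples/learn.rq");
        if (!isCreateMiningModelQuery(queryString)) {
            System.err.println("learn.rq does not look like a CREATE MINING MODEL query");
            return;
        }
        System.out.println(queryString);
    }
}
```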

3. Apply a mining model to new data with the PREDICT query:

Run the example query $ARQROOT/mining/examples/apply.rq with the data in $ARQROOT/mining/examples/test-set.owl.
You can either use the same sparql-ml command line script in the $ARQROOT/bin folder (sparql-ml.bat on Windows):

[peter@neverland]$ $ARQROOT/bin/sparql-ml --data=$ARQROOT/mining/examples/test-set.owl --query=$ARQROOT/mining/examples/apply.rq

or access the SPARQL-ML functions from your code:


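// queryString: the text of a PREDICT query (e.g. apply.rq)
// dataModel:   a Jena Model holding the new data (e.g. test-set.owl)
QueryML query = QueryFactoryML.create(queryString);
QueryExecutionML qe = QueryExecutionFactoryML.create(query, dataModel);

// Apply the mining model; predictions come back as a SPARQL result set
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results, query);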
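Both steps can also be scripted from Java by calling the same command line script. The following sketch builds the sparql-ml invocations from steps 2 and 3 with ProcessBuilder, passing the same --data and --query options. The SparqlMlRunner class is hypothetical, the script name sparql-ml.bat on Windows is an assumption based on the text above, and on Windows the script is started through cmd.exe /c.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SparqlMlRunner {
    // Builds the sparql-ml command line; on Windows the .bat script runs via cmd.exe /c.
    static List<String> command(Path arqRoot, Path data, Path query, boolean windows) {
        List<String> cmd = new ArrayList<>();
        if (windows) {
            cmd.add("cmd.exe");
            cmd.add("/c");
        }
        cmd.add(arqRoot.resolve("bin").resolve(windows ? "sparql-ml.bat" : "sparql-ml").toString());
        cmd.add("--data=" + data);
        cmd.add("--query=" + query);
        return cmd;
    }

    // Runs the script with its output forwarded to this console; returns the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String root = System.getenv("ARQROOT");
        if (root == null) {
            System.err.println("Please set ARQROOT first.");
            return;
        }
        Path arqRoot = Path.of(root);
        Path examples = arqRoot.resolve("mining").resolve("examples");
        boolean windows = System.getProperty("os.name").toLowerCase(Locale.ROOT).startsWith("windows");

        // Step 2: learn a model from the training set.
        int learn = run(command(arqRoot, examples.resolve("training-set.owl"),
                examples.resolve("learn.rq"), windows));
        // Step 3: apply it to the test set, but only if learning succeeded.
        if (learn == 0) {
            run(command(arqRoot, examples.resolve("test-set.owl"),
                    examples.resolve("apply.rq"), windows));
        }
    }
}
```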

4. Check the evaluation results of your mining model in the $ARQROOT/mining/results folder. These include the accuracy of the prediction, the confusion matrix, and the ROC curves (which you can plot, for instance, with Gnuplot).
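For readers unfamiliar with these measures, the sketch below shows how accuracy, a 2x2 confusion matrix, and ROC points are computed for a binary classifier, and prints the ROC points as two whitespace-separated columns (false positive rate, true positive rate) that Gnuplot can plot directly. It is a generic illustration, not SPARQL-ML's evaluation code and not the format of the files in the results folder; the labels, scores, and 0.5 threshold in main are illustrative toy values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class Evaluation {
    // Fraction of examples whose predicted label equals the actual label.
    static double accuracy(boolean[] actual, boolean[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    // Rows are actual (positive, negative), columns are predicted (positive, negative).
    static int[][] confusionMatrix(boolean[] actual, boolean[] predicted) {
        int[][] m = new int[2][2];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i] ? 0 : 1][predicted[i] ? 0 : 1]++;
        }
        return m;
    }

    // ROC points (FPR, TPR) from sweeping the threshold down from the highest score.
    // Assumes both classes occur in `actual`.
    static List<double[]> rocPoints(boolean[] actual, double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        int positives = 0;
        for (boolean a : actual) {
            if (a) {
                positives++;
            }
        }
        int negatives = actual.length - positives;

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int k = 0; k < order.length; k++) {
            if (actual[order[k]]) {
                tp++;
            } else {
                fp++;
            }
            // Equal scores cross the threshold together: emit one point per distinct score.
            boolean lastOfTie = k == order.length - 1 || scores[order[k + 1]] != scores[order[k]];
            if (lastOfTie) {
                points.add(new double[] {(double) fp / negatives, (double) tp / positives});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        boolean[] actual = {true, true, false, true, false, false};
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.4, 0.2};
        boolean[] predicted = new boolean[scores.length];
        for (int i = 0; i < scores.length; i++) {
            predicted[i] = scores[i] >= 0.5;
        }
        System.out.println("accuracy = " + accuracy(actual, predicted));
        System.out.println("confusion = " + Arrays.deepToString(confusionMatrix(actual, predicted)));
        // Two columns (FPR TPR); redirect to a file and plot it with lines in Gnuplot.
        for (double[] p : rocPoints(actual, scores)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```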

License

SPARQL-ML is licensed under the LGPL.

Download

Source distribution including jar-files: sparqlml-0.40-all-src.zip (~6MB)

Questions and Bug Reports

If you find bugs or have any questions about SPARQL-ML, please contact me via email.

Documentation

Javadoc API

Datasets

We have tested SPARQL-ML on the following datasets:

Business Projects (already included in the download)
OWLS-TC
SWRC

Relevant Publications

C. Kiefer, A. Bernstein, A. Locher. Adding Data Mining Support to SPARQL via Statistical Relational Learning Methods. Proceedings of the 5th European Semantic Web Conference (ESWC), Tenerife, Spain, June 1-5, 2008 (to appear).
A. Locher. SPARQL-ML: Knowledge Discovery for the Semantic Web. Diploma thesis, University of Zurich.
J. Neville, D. Jensen, L. Friedland, M. Hay. Learning relational probability trees. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
J. Neville, D. Jensen, B. Gallagher. Simple estimators for relational Bayesian classifiers. Proceedings of the 3rd IEEE International Conference on Data Mining, 2003.

Last modified March 15th, 2008 by Christoph Kiefer <kiefer at ifi.uzh.ch>
