The success of statistics-based techniques in almost every area of artificial intelligence and in practical applications on the Web challenges the traditional logic-based approach of the Semantic Web. We believe that we should treat statistical inference techniques as a complement to the existing Semantic Web infrastructure. Consequently, a big challenge for Semantic Web research is not if, but how to extend the existing Semantic Web techniques with statistical learning and inferencing capabilities.
We therefore argue that the large and continuously growing amount of interlinked Semantic Web data is a perfect match for statistical relational learning (SRL) methods. Traditional, propositional data mining techniques consider only the features/attributes of entities, whereas SRL methods also focus on the relations between entities.
The fact that companies such as Microsoft and Oracle have recently added data mining extensions to their relational database management systems underscores the importance of such extensions and calls for a similar solution for RDF stores and SPARQL.
To support the integration of traditional Semantic Web techniques and machine learning-based statistical inferencing, we developed this approach to create and work with data mining models in SPARQL. Our framework makes it possible to predict/classify unseen data (or features) and relations in a new dataset based on the results of a mining model. In particular, our approach allows the use of statistical relational learning methods, which take the relations between entities (or resources) into account. This allows us to induce statistical models without prior propositionalization (i.e., translation to a single table), which is a cumbersome and error-prone task.
SPARQL-ML requires a working installation of ARQ 2.0. Don't forget to set the ARQROOT environment variable as indicated in the installation manual of ARQ (README.txt in the installation folder).
- Download SPARQL-ML from the link below.
- Unzip the file into your ARQROOT folder. This adds a new mining folder, command line scripts, and the necessary jar files.
- Make all the scripts in the $ARQROOT/bin folder executable:
[peter@neverland]$ chmod u+x $ARQROOT/bin/*
- Download and install the appropriate MonetDB Server Mars version for your operating system. Installation notes for this step can be found here.
Note: If your target platform is not supported (e.g., AMD64) and you need to install MonetDB from sources, the following command may work (run it as root from within the MonetDB source folder):
[peter@neverland]$ ./monetdb-install.sh --prefix=/usr/local --enable-sql --enable-xquery --enable-optimise
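Before moving on, it can help to confirm that the files added in the steps above are where the later commands expect them. The following is a minimal sketch for Linux/Mac OS X, not something shipped with SPARQL-ML: the function name check_sparqlml_install is hypothetical, and it only checks the mining folder and the sparql-ml script mentioned on this page. It does not check the jar files or the MonetDB installation.

```shell
#!/bin/sh
# Illustrative sanity check for a SPARQL-ML installation.
# NOTE: check_sparqlml_install is a hypothetical helper, not part of SPARQL-ML or ARQ.
# Argument: the ARQ root folder (normally "$ARQROOT").
check_sparqlml_install() {
    root="$1"
    status=0

    # ARQROOT must be set before any of the scripts can be located.
    if [ -z "$root" ]; then
        echo "ARQROOT is not set" >&2
        return 1
    fi

    # Unzipping SPARQL-ML into ARQROOT adds a mining folder.
    if [ ! -d "$root/mining" ]; then
        echo "missing folder: $root/mining" >&2
        status=1
    fi

    # The command line script must exist and be executable (see the chmod step).
    if [ ! -x "$root/bin/sparql-ml" ]; then
        echo "missing or not executable: $root/bin/sparql-ml" >&2
        status=1
    fi

    return $status
}
```

Running check_sparqlml_install "$ARQROOT" prints one line per problem to standard error and returns a non-zero status if anything is missing. It returns zero if the layout looks complete.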
1. Run MonetDB with one of the following commands in order to initialize it with the startup script of Proximity.
- Linux/Mac OS X: Mserver --dbfarm=/some/dir/you/own --set sql_logdir=/some/other/dir/you/own --dbname sparqlml $ARQROOT/mining/init-mserver.mil --set port=30000
- Windows: Mserver.bat --dbfarm=C:\some\dir\you\own --set sql_logdir=C:\some\other\dir\you\own --dbname sparqlml %ARQROOT%\mining\init-mserver.mil --set port=30000
2. Learn a new mining model with the CREATE MINING MODEL query:
Run the example query $ARQROOT/mining/examples/learn.rq with the data in $ARQROOT/mining/examples/training-set.owl.
You can either use the sparql-ml command line script in the $ARQROOT/bin folder (the bat folder on Windows), i.e.,
[peter@neverland]$ $ARQROOT/bin/sparql-ml --data=$ARQROOT/mining/examples/training-set.owl --query=$ARQROOT/mining/examples/learn.rq
or access the SPARQL-ML functions from your code: