Motivation
The question of similarity is a heavily researched subject in the computer science, artificial intelligence, psychology, and linguistics literature. Typically, those studies focus on the similarity between vectors [Baeza-Yates & Ribeiro-Neto '99, Salton & McGill '83], strings [Lord et al. '03], trees or graphs [Shasha & Zhang '97], or simple objects [Gentner & Medina '98, Resnik '99, Lin '98].
In our case, we are particularly interested in the similarity between concepts (complex objects) in ontologies. All measures are implemented in our Java-based generic similarity framework called SimPack.
Goal
SimPack is intended primarily for research on the similarity between concepts in ontologies, or between ontologies as a whole. Other possible application areas of SimPack include:
- the investigation of similarity between software source code, for instance to detect changes between classes of different software releases;
- the research of similarity between hierarchically structured data, such as XML, to compare, search, or integrate data from different data sources.
SimPack is used, for example, in iSPARQL, an extension of traditional SPARQL (SPARQL Protocol And RDF Query Language) that allows querying for similar concepts in ontologies.
Implemented Similarity Measures
The similarity between entities (concepts in ontologies, classes in source code, XML documents, data streams, etc.) can be measured by a myriad of similarity measures. Currently we have implemented similarity measures from the following categories:
- Feature vectors
Alignment, Cosine, Dice, Euclidean, Jaccard, Manhattan, Overlap, Pearson
- Strings or sequences of strings (text)
Averaged String Matching, Jaro, TFIDF
- Sets
Jaccard, Loss of Information, Resemblance
- Sequences
Levenshtein Edit Distance
- Trees
Bottom-up/Top-down Maximum Common Subtree, Tree Edit Distance
- Graphs
Conceptual Similarity, Graph Isomorphism, Subgraph Isomorphism, Maximum Common Subgraph Isomorphism, Graph Isomorphism Covering, Shortest Path
- Information theory
Jiang & Conrath, Lin, Resnik
In addition, the measures from the SecondString, the SimMetrics, the ontology Alignment API, and the OWLS-MX projects are wrapped in SimPack.
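To give a feel for what measures in these categories compute, here is a minimal, self-contained Java sketch of three of them: Cosine (feature vectors), Jaccard (sets), and Levenshtein edit distance (sequences). It uses textbook definitions only. The class name `SimilaritySketch` and its method signatures are assumptions made for illustration and are not SimPack's actual API.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical names, not part of SimPack.
final class SimilaritySketch {

    private SimilaritySketch() {}

    /** Cosine similarity of two equal-length vectors; 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Jaccard similarity: intersection size over union size; two empty sets count as identical. */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Levenshtein edit distance (unit costs), computed with two rolling rows. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

Note that Cosine and Jaccard return similarities in [0, 1], whereas the edit distance is a dissimilarity and would need to be normalized (for example by the longer string's length) before it can be compared with the other two.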
Use cases
Diploma Theses
SimPack is furthermore used/extended in the following diploma theses written at the University of Zurich:
- Mining Software Repositories -- A Semantic Web Approach
- Using Genetic Programming and SimPack to Learn Global Similarity Measures
- The Fundamentals of iSPARQL
- Implementation and Evaluation of Graph Isomorphism Algorithms for RDF-Graphs
- Coogle - A Code Google Plug-in for Detecting Similar Java Classes
- XQuery Similarity Joins
Features of SimPack
SimPack offers the following features among others:
- it offers a variety of similarity measures for use in ontologies and other research areas
- it is generic, i.e., it can be applied to different data structures, provided that appropriate data accessors exist
- it is implemented in Java and is thus portable
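The idea of a generic measure that reaches its data through an accessor can be sketched as follows. This is only an illustration of the pattern under stated assumptions: the interface `FeatureAccessor`, the class `AccessorSketch`, and the `TOKENS` accessor are hypothetical names and do not reflect SimPack's real classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: hypothetical names, not part of SimPack.
final class AccessorSketch {

    private AccessorSketch() {}

    /** Hypothetical accessor: maps a source object to the features a measure compares. */
    interface FeatureAccessor<T> {
        Set<String> features(T source);
    }

    /** Dice coefficient 2|A and B| / (|A| + |B|), computed on whatever the accessor extracts. */
    static <T> double dice(T x, T y, FeatureAccessor<T> accessor) {
        Set<String> a = accessor.features(x);
        Set<String> b = accessor.features(y);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty feature sets are treated as identical
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }

    /** Example accessor for plain text: lower-cased, whitespace-separated tokens. */
    static final FeatureAccessor<String> TOKENS =
            text -> new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
}
```

With this shape, the measure itself never changes: comparing a new kind of data structure only requires writing another accessor, for example `AccessorSketch.dice("red green blue", "green blue yellow", AccessorSketch.TOKENS)` for text.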
Links
SimPack uses the following APIs:
Cobertura, Colt, Eclipse, Famix, Jena, JGraphT, JUnit, Apache Lucene, OWL-S API, SecondString, SimMetrics, Taverna
Download
Current version is 0.91 (17 April 2008), previous was 0.90
Source distribution: simpack-0.91-src.zip
Source distribution including jar-files: simpack-0.91-all-src.zip (~46MB)
Only jar file: simpack-0.91-bin.jar
License
This work is licensed under LGPL.
Documentation
Publications
- Christoph Kiefer, Abraham Bernstein. The Creation and Evaluation of iSPARQL Strategies for Matchmaking. Proceedings of the 5th European Semantic Web Conference (ESWC). Tenerife, Spain, June 1-5, 2008. to appear [pdf] [bibtex]
- Christoph Kiefer, Abraham Bernstein, and Markus Stocker. The Fundamentals of iSPARQL - A Virtual Triple Approach For Similarity-Based Semantic Web Tasks. Proceedings of the 6th International Semantic Web Conference (ISWC). Busan, Korea, November 11-15, 2007. [pdf] [bibtex]
- Christoph Kiefer, Abraham Bernstein, Jonas Tappolet. Analyzing Software with iSPARQL. Proceedings of the 3rd ESWC International Workshop on Semantic Web Enabled Software Engineering (SWESE). Innsbruck, Austria, June 6, 2007. [pdf] [bibtex]
- Christoph Kiefer, Abraham Bernstein, Hong Joo Lee, Mark Klein, Markus Stocker. Semantic Process Retrieval with iSPARQL. Proceedings of the 4th European Semantic Web Conference (ESWC). Innsbruck, Austria, June 3-7, 2007. [pdf] [bibtex]
- Christoph Kiefer, Abraham Bernstein, Jonas Tappolet. Mining Software Repositories with iSPARQL and a Software Evolution Ontology. Proceedings of the ICSE International Workshop on Mining Software Repositories (MSR). Minneapolis, MN, May 19-20, 2007. [pdf] [bibtex]
- Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus R. Dittrich, and Abraham Bernstein. Generic Similarity Detection in Ontologies with the SOQA-SimPack Toolkit (Demo Paper). In 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD 2006). Chicago, USA, June 26-29, 2006. [pdf] [BibTeX]
- Tobias Sager, Abraham Bernstein, Martin Pinzger, Christoph Kiefer. Detecting Similar Java Classes Using Tree Algorithms. In MSR '06: Proceedings of the 2006 International Workshop on Mining Software Repositories, Shanghai, China, May 22-23, 2006. [pdf] [BibTeX]
- Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus Dittrich, and Abraham Bernstein. Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit. 10th International Conference on Extending Database Technology (EDBT 2006), Munich, Germany, March 26-31, 2006. [pdf] [BibTeX]
- Abraham Bernstein and Christoph Kiefer. Imprecise RDQL: Towards Generic Retrieval in Ontologies Using Similarity Joins. 21st Annual ACM Symposium on Applied Computing (SAC/SIGAPP). Dijon, France, April 23-24, 2006. [pdf] [BibTeX]
- Abraham Bernstein and Christoph Kiefer. iRDQL Prototype Description (Demo Paper). Proceedings of 15th Workshop on Information Technology and Systems (WITS). Las Vegas, Nevada, United States. 2005. [pdf] [BibTeX]
- Abraham Bernstein and Christoph Kiefer. iRDQL - Imprecise Queries Using Similarity Joins for Retrieval in Ontologies (Poster Paper). 4th International Semantic Web Conference (ISWC). Galway, Ireland, November 6-10, 2005. [pdf] [BibTeX] [poster]
- Abraham Bernstein and Christoph Kiefer. iRDQL - Imprecise RDQL Queries Using Similarity Joins. Third International Conference on Knowledge Capture (K-CAP). Banff, Alberta, Canada, October 2-5, 2005. [pdf] [BibTeX]
- Abraham Bernstein, Esther Kaufmann, Christoph Kiefer, and Christoph Bürki. SimPack: A Generic Java Library for Similarity Measures in Ontologies (Working Paper). Department of Informatics, University of Zurich, 2005. [pdf] [BibTeX]
Credits
Daniel Baggenstos, Beat Fluri, Antoon Goderis, Silvan Hollenstein, Manuel Kägi, Tobias Sager, Markus Stocker, Michael Würsch
Contacts
Please do not hesitate to contact us if you have any questions or comments about the SimPack project. Write to simpack [at] ifi.unizh.ch or contact one of the authors.
Last modified April 17, 2008 by Christoph Kiefer <kiefer at ifi.uzh.ch>