Publications

2010

Jonas Tappolet, Christoph Kiefer, Abraham Bernstein, Semantic web enabled software analysis, Journal of Web Semantics: Science, Services and Agents on the World Wide Web 8, July 2010. (article)

One of the most important decisions researchers face when analyzing software systems is the choice of a proper data analysis/exchange format. In this paper, we present EvoOnt, a set of software ontologies and data exchange formats based on OWL. EvoOnt models software design, release history information, and bug-tracking meta-data. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) can be processed with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The contribution of this paper is that it introduces a novel software evolution ontology that vastly simplifies typical software evolution analysis tasks. In detail, we show the usefulness of EvoOnt by repeating selected software evolution and analysis experiments from the 2004-2007 Mining Software Repositories Workshops (MSR). We demonstrate that if the data used for analysis were available in EvoOnt then the analyses in 75% of the papers at MSR could be reduced to one or at most two simple queries within off-the-shelf SPARQL tools. In addition, we show how the inherent capabilities of the Semantic Web have the potential of enabling new tasks that have not yet been addressed by software evolution researchers, e.g., due to the complexities of the data integration.

2009

Jonas Tappolet, Abraham Bernstein, Applied Temporal RDF: Efficient Temporal Querying of RDF Data with SPARQL, Proceedings of the 6th European Semantic Web Conference (ESWC), June 2009, Springer. (inproceedings)

Many applications operate on time-sensitive data. Some of these data are only valid for certain intervals (e.g., job assignments, versions of software code), others describe temporal events that happened at certain points in time (e.g., a person's birthday). Until recently, the only way to incorporate time into Semantic Web models was as a data type property. Temporal RDF, however, considers time as an additional dimension in data, preserving the semantics of time. In this paper we present a syntax and storage format based on named graphs to express temporal RDF. Given the restriction to preexisting RDF syntax, our approach can perform any temporal query using standard SPARQL syntax only. For convenience, we introduce a shorthand format called τ-SPARQL for temporal queries and show how τ-SPARQL queries can be translated to standard SPARQL. Additionally, we show that, depending on the nature of the underlying data, the temporal RDF approach vastly reduces the number of triples by eliminating redundancies, resulting in increased performance for processing and querying. Last but not least, we introduce a new indexing approach that can significantly reduce the time needed to execute time point queries (e.g., what happened on January 1st).

Jayalath Ekanayake, Jonas Tappolet, Harald C. Gall, Abraham Bernstein, Tracking Concept Drift of Software Projects Using Defect Prediction Quality, Proceedings of the 6th IEEE Working Conference on Mining Software Repositories, May 2009, IEEE Computer Society. (inproceedings)

Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate the reasons why the prediction quality fluctuates so much due to the changing nature of the bug (or defect) fixing process. To this end, we adopt the notion of a concept drift, which denotes that the defect prediction model has become unsuitable as the set of influencing features has changed, usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, NetBeans and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use this data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and, as a consequence, phases of stability and instability expressed in the level of defect prediction quality. Further, we identify those project features that influence the defect prediction quality using both a tree induction algorithm and a linear regression model. Our experiments uncover that software systems are subject to considerable concept drifts in their evolution history. Specifically, we observe that the change in the number of authors editing a file and the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers using defect prediction models for decision making should be aware of the actual phase of stability or instability due to a potential concept drift.

2008

Jonas Tappolet, Semantics-aware Software Project Repositories, ESWC 2008 Ph.D. Symposium, June 2008. (inproceedings)

This proposal explores a general framework to solve software analysis tasks using ontologies. Our aim is to build semantically annotated, flexible, and extensible software repositories to overcome data representation, intra- and inter-project integration difficulties as well as to make the tedious and error-prone extraction and preparation of meta-data obsolete. We also outline a number of practical evaluation approaches for our propositions.

2007

Christoph Kiefer, Abraham Bernstein, Jonas Tappolet, Analyzing Software with iSPARQL, Proceedings of the 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2007), June 2007, Springer. (inproceedings)

Jonas Tappolet, Mining Software Repositories - A Semantic Web Approach, March 2007. (diplomathesis)

Modern software development has become a complex task. Software systems grow larger and are densely interconnected with other systems, making extensive use of large communication frameworks. To cope with this complexity, software developers and project managers need the assistance of tools that extract information about flaws in code as well as general information about the state of a project. In this thesis, we first introduce a data exchange format based on OWL/RDF, the Semantic Web's format of choice today, able to store data and meta-data from the source code, versioning system (i.e. CVS) and bug tracking system (i.e. Bugzilla). In a next step, we present a tool to retrieve the data from the online software repositories and to store it in OWL/RDF. This tool is implemented as a plug-in for the Eclipse IDE and is able to harvest data from projects managed by Eclipse. Finally, we evaluate our data format and tools by applying a set of software metric calculations, pattern detections and similarity measures using iSPARQL and SimPack. The results of the conducted experiments are promising and give a first proof of concept for our approach.

Christoph Kiefer, Abraham Bernstein, Jonas Tappolet, Mining Software Repositories with iSPARQL and a Software Evolution Ontology, Proceedings of the 2007 International Workshop on Mining Software Repositories (MSR '07), March 2007, IEEE Computer Society. (inproceedings)

One of the most important decisions researchers face when analyzing the evolution of software systems is the choice of a proper data analysis/exchange format. Most existing formats have to be processed with special programs written specifically for that purpose and are not easily extendible. Most scientists, therefore, use their own database(s), requiring each of them to repeat the work of writing the import/export programs for their format. We present EvoOnt, a software repository data exchange format based on the Web Ontology Language (OWL). EvoOnt includes software, release, and bug-related information. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) comes with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The paper also presents iSPARQL, our SPARQL-based Semantic Web query engine containing similarity joins. Together with EvoOnt, iSPARQL can accomplish a sizable number of tasks sought in software repository mining projects, such as an assessment of the amount of change between versions or the detection of bad code smells. To illustrate the usefulness of EvoOnt (and iSPARQL), we perform a series of experiments with a real-world Java project. These show that a number of software analyses can be reduced to simple iSPARQL queries on an EvoOnt dataset.


Student Advisor

2011

Clemens Wilding, IfiPipes - the RDF UI widget framework, February 2011. (mastersthesis)

This master's thesis is about the creation of a semi-automatic data retrieval and visualization framework. The framework describes how semantic data can be retrieved from web services or SPARQL endpoints. By using Semantic Web technologies like RDF and OWL, the queried data is analyzed and, depending on the type of data, a fitting visualization is displayed. The framework is implemented as a web application in Java using Google Web Toolkit. The application allows a user with knowledge of Semantic Web basics to query data sources from web services and to upload new data sets. These can be visualized with little or no programming skills, which should make it easier for non-engineers to use semantic data. The application makes use of the cutting-edge tGraph framework to analyze and display temporal data. Special visualization features provided in the web application are the display of information on a map, the use of temporal sliders to select time intervals, and a tabular representation for larger data.

2008

Sonja Näf, Mining Software Repositories with Relational Data Mining Methods, February 2008. (diplomathesis)

In complex software projects a lot of information about defect, release and source code history is gathered. Researchers have found that mining these software repositories can provide valuable information about software development. So far, software repositories have been mined with traditional data mining methods, which are suitable for propositional data. Propositional data is flat and homogeneous, held in a single-table database. This thesis compares the traditional approach with relational data mining methods, which are able to handle heterogeneous data. First, an introduction to relational data mining is given and then a few relational data mining tools are introduced. In a next step we present the data for our experiments and the necessary data preparations. Finally, we conduct several experiments which show the advantages as well as the weaknesses of the relational approach.
