We include the papers on this page to ensure timely dissemination on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by the copyrights. These works may not be reposted without the explicit permission of the copyright holder.
2009
-
Jayalath Ekanayake, Jonas Tappolet, Harald C. Gall, Abraham Bernstein, Tracking Concept Drift of Software Projects Using Defect Prediction Quality, Proceedings of the 6th IEEE Working Conference on Mining Software Repositories, May 2009, IEEE Computer Society. (inproceedings)
Defect prediction is an important task in the mining of
software repositories, but the quality of predictions varies
strongly within and across software projects. In this paper
we investigate the reasons why the prediction quality is so
fluctuating due to the altering nature of the bug (or defect)
fixing process. Therefore, we adopt the notion of a concept
drift, which denotes that the defect prediction model has
become unsuitable as the set of influencing features has changed,
usually due to a change in the underlying bug generation
process (i.e., the concept). We explore four open source
projects (Eclipse, OpenOffice, Netbeans and Mozilla) and
construct file-level and project-level features for each of
them from their respective CVS and Bugzilla repositories.
We then use this data to build defect prediction models and
visualize the prediction quality along the time axis. These
visualizations allow us to identify concept drifts and, as a
consequence, phases of stability and instability expressed
in the level of defect prediction quality. Further, we identify
those project features that influence the defect
prediction quality using both a tree-induction algorithm and
a linear regression model. Our experiments uncover that
software systems are subject to considerable concept drifts
in their evolution history. Specifically, we observe that the
change in the number of authors editing a file and the number
of defects fixed by them contribute to a project's concept
drift and therefore influence the defect prediction quality.
Our findings suggest that project managers using defect
prediction models for decision making should be aware of
the actual phase of stability or instability due to a potential
concept drift.
2007
-
Abraham Bernstein, Jayalath Ekanayake, Martin Pinzger, Improving Defect Prediction Using Temporal Features and Non Linear Models, Proceedings of the International Workshop on Principles of Software Evolution, September 2007, IEEE Computer Society. (inproceedings)
Predicting the defects in the next release of a large
software system is a very valuable asset for the project
manager to plan her resources. In this paper we argue that
temporal features (or aspects) of the data are central to
prediction performance. We also argue that the use of
non-linear models, as opposed to traditional regression, is
necessary to uncover some of the hidden interrelationships
between the features and the defects and maintain the
accuracy of the prediction in some cases.
Using data obtained from the CVS and Bugzilla repositories
of the Eclipse project, we extract a number of temporal
features, such as the number of revisions and number of
reported issues within the last three months. We then use
these data to predict both the location of defects (i.e., the
classes in which defects will occur) as well as the number of
reported bugs in the next month of the project. To that end
we use standard tree-based induction algorithms in
comparison with the traditional regression.
Our non-linear models uncover the hidden relationships
between features and defects, and present them in an
easy-to-understand form. Results also show that, using the
temporal features, our prediction model can predict whether
a source file will have a defect with an accuracy of 99% (area
under ROC curve 0.9251) and the number of defects with a mean
absolute error of 0.019 (Spearman's correlation of 0.96).