Abstract

Current software development managers are greatly limited in their ability to control their projects because they have only limited kinds of information about them. They typically compute a variety of project-management measures such as cost per module, slippage towards milestones, and critical path. More sophisticated software project managers include additional measurements such as source code metrics (e.g., lines of code, program complexity (McCabe 1976), and mental effort (Halstead 1977)) and evolution metrics (e.g., number of modifications and number of reported bugs). These quantitative measures are typically listed in predefined project reports to assess the status quo of a project. Although these measures have turned out to be valuable input, they often obscure information. The basic reason lies in the abstractness of the computed measures and in the complexity of the underlying data, which make the measures difficult to grasp and interpret.

To improve the control of software projects, we propose to develop a smart Soft-Ware-House that gathers all available information about multiple software development projects: the actual code in all its iterations; project member interactions such as emails and bulletin board postings; problem and modification report data; and all other available project documents, such as design documents. The Soft-Ware-House is meant to allow both OLAP-style interactive data exploration and analytic processing with inferential algorithms (i.e., data mining).
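To make the idea of OLAP-style exploration concrete, the following is a minimal sketch of a roll-up over project events. It is an illustration only, not the Soft-Ware-House design: the `Event` record, its fields, the `roll_up` helper, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Sequence


class Event(NamedTuple):
    """One hypothetical warehouse fact: something happened to a module."""
    project: str
    module: str
    month: str  # e.g. "2007-01"
    kind: str   # "modification" or "bug"


def roll_up(events: Iterable[Event], dimensions: Sequence[str]) -> dict:
    """Count events grouped by the chosen dimensions (an OLAP-style roll-up)."""
    counts = Counter()
    for event in events:
        key = tuple(getattr(event, dim) for dim in dimensions)
        counts[key] += 1
    return dict(counts)


# Invented sample data, for illustration only.
EVENTS = [
    Event("alpha", "parser", "2007-01", "bug"),
    Event("alpha", "parser", "2007-01", "modification"),
    Event("alpha", "parser", "2007-02", "bug"),
    Event("alpha", "ui", "2007-02", "modification"),
    Event("beta", "core", "2007-01", "bug"),
]

# Coarse view: all events per project.
by_project = roll_up(EVENTS, ["project"])

# Drill-down: reported bugs per project and module.
bugs_by_module = roll_up(
    [e for e in EVENTS if e.kind == "bug"], ["project", "module"]
)
```

Choosing a different list of dimensions gives a different perspective on the same facts, which is the kind of interactive slicing the proposal refers to.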

Applying these techniques to the Soft-Ware-House data would allow project managers to view the mess of project information from multiple perspectives, with the aim of answering the following questions:

Which modules are key, and which are "problematic"? Where will problems most likely occur, and which ones (i.e., predict future problems and discover trends)? Where are the shortcomings (including bottlenecks) in the software architecture and design that influence software evolution?
What is the actual, rather than the official, team structure based on communication patterns? Where are the organizational problems, and how can they be addressed?
How does a particular software project perform compared to other industrial and open source projects? Which benchmarks are relevant for such a comparison?
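As one illustration of the second question, the sketch below derives an informal communication structure from message logs and lists frequent contacts that cross official team boundaries. It is a simplified assumption about how such an analysis could look: the function names, the `min_messages` threshold, and the sample data are hypothetical and not taken from the project.

```python
from collections import Counter


def communication_edges(messages):
    """Count messages exchanged between each unordered pair of people."""
    edges = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # ignore notes to self
            edges[frozenset((sender, recipient))] += 1
    return edges


def cross_team_links(edges, official_team, min_messages=2):
    """Return frequent contacts whose members sit in different official teams."""
    links = []
    for pair, count in edges.items():
        a, b = sorted(pair)
        if count >= min_messages and official_team[a] != official_team[b]:
            links.append((a, b, count))
    return sorted(links)


# Invented sample data: (sender, recipient) pairs and an official org chart.
MESSAGES = [
    ("ann", "bob"), ("bob", "ann"),
    ("ann", "cara"), ("cara", "ann"), ("cara", "ann"),
    ("bob", "dan"),
]
OFFICIAL_TEAM = {
    "ann": "frontend", "bob": "frontend",
    "cara": "backend", "dan": "backend",
}

links = cross_team_links(communication_edges(MESSAGES), OFFICIAL_TEAM)
```

Pairs that communicate often despite belonging to different official teams hint at where the actual team structure departs from the official one.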

People involved

Publications

C. Kiefer, A. Bernstein, J. Tappolet, "Analyzing Software with iSPARQL", Proceedings of the 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2007), June 2007, Springer.
C. Kiefer, A. Bernstein, J. Tappolet, "Mining Software Repositories with iSPARQL and a Software Evolution Ontology", Proceedings of the 2007 International Workshop on Mining Software Repositories (MSR '07), 2007, IEEE Computer Society.
A. Bernstein, J. Ekanayake, M. Pinzger, "Improving Defect Prediction Using Temporal Features and Non Linear Models", The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, September 2007, ACM Press, New York, NY, USA.
T. Sager, A. Bernstein, M. Pinzger, C. Kiefer, "Detecting Similar Java Classes Using Tree Algorithms", MSR '06: Proceedings of the 2006 International Workshop on Mining Software Repositories, May 2006, ACM Press, New York, NY, USA.