The Smart SoftWareHouse


Software development managers today are greatly limited in their ability to control their projects by the limited kinds of information they have about them. They typically compute a variety of project management measures such as cost per module, slippage towards milestones, and critical path. Sophisticated software project managers add further measurements such as source code metrics (e.g., lines of code, program complexity (McCabe 1976), mental effort (Halstead 1977)) and evolution metrics (e.g., number of modifications and number of reported bugs). All these quantitative measures are typically listed in predefined project reports to assess the status quo of a project. Although these measures have turned out to be valuable input, they often obscure information. The basic reason lies in the abstractness of the computed measures and in the complexity of the underlying data, which make them difficult to grasp and interpret.

To improve the control of software projects, we propose to develop a smart Soft-Ware-House that gathers all available information about multiple software development projects: the actual code in all its iterations, project member interactions such as emails and bulletin board postings, problem and modification report data, and all other available project documents such as design documents. This collection would allow both OLAP-style interactive data exploration and analytic processing with inferential algorithms (i.e., data mining).

Applying these techniques to the Soft-Ware-House data would allow project managers to view the mess of project information from multiple perspectives, with the aim of answering the following questions:

  1. Which modules are key, and which are "problematic"? Where are problems most likely to occur, and of what kind (i.e., predict future problems and discover trends)? Where are the shortcomings (including bottlenecks) in the software architecture and design that influence software evolution?
  2. What is the actual rather than the official team structure based on communication patterns? Where are organizational problems and how can they be addressed?
  3. How does the performance of a particular software project compare with that of other industrial and open source projects? Which benchmarks are relevant for such a comparison?
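
As a rough illustration of the OLAP-style exploration proposed above, the following minimal sketch rolls up modification report data along different dimensions to hint at "problematic" modules. It is only a sketch under stated assumptions: the record layout, the field names (project, module, month, is_bug), the sample values, and the helper functions roll_up and slice_records are all hypothetical and are not part of the proposed system.

```python
from collections import defaultdict

# Hypothetical fact records: one row per modification report.
# Field names and values are invented for illustration only.
REPORTS = [
    {"project": "A", "module": "parser", "month": "2004-01", "is_bug": True},
    {"project": "A", "module": "parser", "month": "2004-02", "is_bug": True},
    {"project": "A", "module": "ui", "month": "2004-01", "is_bug": False},
    {"project": "B", "module": "net", "month": "2004-02", "is_bug": True},
    {"project": "A", "module": "parser", "month": "2004-02", "is_bug": False},
]


def roll_up(records, dimensions):
    """Aggregate report and bug counts along the given dimensions."""
    cube = defaultdict(lambda: {"reports": 0, "bugs": 0})
    for rec in records:
        # The cell key is the tuple of dimension values for this record.
        key = tuple(rec[d] for d in dimensions)
        cube[key]["reports"] += 1
        cube[key]["bugs"] += int(rec["is_bug"])
    return dict(cube)


def slice_records(records, **fixed):
    """Keep only the records whose fields match all fixed values."""
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]


# Perspective 1: reports per (project, module) across all projects.
by_module = roll_up(REPORTS, ["project", "module"])

# Perspective 2: fix project "A" (a slice), then view its reports per month.
by_month_a = roll_up(slice_records(REPORTS, project="A"), ["month"])

for key, cell in sorted(by_module.items()):
    print(key, cell)
```

The same records are viewed from two perspectives without recomputing any predefined report, which is the kind of interactive exploration the questions above call for; a real system would presumably rely on a data warehouse rather than in-memory lists.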