Contact

University of Zurich
Department of Informatics
Binzmühlestrasse 14
CH - 8050 Zürich

E-mail: weiss |at| ifi.uzh.ch

About

I am an external doctoral student at the Dynamic and Distributed Information Systems group headed by Prof. Abraham Bernstein at the University of Zurich.
I received both my Bachelor's and my Master's degree from Saarland University, Saarbrücken (Germany), in 2007. My theses covered topics such as effort prediction for software bugs and vulnerability mining.
My PhD research focuses mainly on database systems for graph-structured data, such as Semantic Web data.

Courses & Projects

Master Project (Summer Break 2008)
Practical Artificial Intelligence (Spring 2008)

Student Advisor

Michael Imhof (Master Thesis), finished
Simon Berther (Diploma Thesis), finished
Alexander Bucher (Diploma Thesis), finished

Publications

2009

Cathrin Weiss, Abraham Bernstein, On-disk storage techniques for Semantic Web data - Are B-Trees always the optimal solution?, Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems, October 2009. (inproceedings)

Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple stores the use of B-trees is actually highly detrimental to query performance. Specifically, we compare the on-disk query performance of our triple-based Hexastore when using two different B-tree implementations and when using our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load time, by approximately a factor of three (and up to a factor of eight for some queries) in query time, and by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need to reorganize parts of the data when new triples are inserted: a rare occurrence in many Semantic Web environments. As such, this paper tries to reopen the discussion about the trade-offs of using different types of indices in the light of non-relational data, and to contribute to the endeavor of building scalable and fast typed-graph databases.
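
The vector storage described above can be illustrated with a small in-memory sketch. It assumes a simplified layout that is not necessarily the paper's on-disk format: keys (here a (subject, predicate) pair, flattened into one key for brevity) live in one sorted array, all objects live in one contiguous array, and an offsets array records where each key's run of objects begins. A lookup is a binary search plus a slice, while an insert into the middle must shift every later value and rewrite every subsequent offset, which mirrors the reorganization cost mentioned in the abstract. The class `OffsetVectorIndex` and its methods are hypothetical names.

```python
from bisect import bisect_left


class OffsetVectorIndex:
    """Hypothetical sketch of an offset-based vector layout (in memory).

    keys[i] owns the run values[offsets[i]:offsets[i + 1]]; the final
    entry of offsets is a sentinel equal to len(values).
    """

    def __init__(self, pairs):
        # Bulk load: sort the (key, value) pairs once, write them contiguously.
        self.keys, self.offsets, self.values = [], [], []
        for key, value in sorted(set(pairs)):
            if not self.keys or self.keys[-1] != key:
                self.keys.append(key)
                self.offsets.append(len(self.values))
            self.values.append(value)
        self.offsets.append(len(self.values))  # sentinel end offset

    def lookup(self, key):
        # Binary search for the key, then slice its run using the offsets.
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return []
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def insert(self, key, value):
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            # New key: its (initially empty) run starts where key i's run did.
            self.keys.insert(i, key)
            self.offsets.insert(i, self.offsets[i])
        start, end = self.offsets[i], self.offsets[i + 1]
        pos = start + bisect_left(self.values[start:end], value)
        if pos < end and self.values[pos] == value:
            return  # duplicate triple, nothing to do
        self.values.insert(pos, value)  # shifts every later value by one slot
        # Reorganization cost: every later offset must be rewritten.
        for j in range(i + 1, len(self.offsets)):
            self.offsets[j] += 1


idx = OffsetVectorIndex([
    (("alice", "knows"), "bob"),
    (("alice", "knows"), "carol"),
    (("bob", "knows"), "carol"),
])
print(idx.lookup(("alice", "knows")))  # ['bob', 'carol']
idx.insert(("alice", "age"), "30")
print(idx.offsets)  # [0, 1, 3, 4]
```

The sketch favors read-mostly workloads: bulk loading and lookups touch contiguous memory, whereas each insert costs time proportional to the data stored after the insertion point.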

2008

Cathrin Weiss, Panagiotis Karras, Abraham Bernstein, Hexastore: Sextuple Indexing for Semantic Web Data Management, Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), February 2008. (inproceedings)
Cathrin Weiss, Abraham Bernstein, Sandro Boccuzzo, i-MoCo: Mobile Conference Guide - Storing and querying huge amounts of Semantic Web data on the iPhone/iPod Touch, October 2008. (misc)

Querying and storing huge amounts of Semantic Web data has usually required a lot of computational power. This is no longer true if one makes use of recent research outcomes such as modern RDF indexing strategies. We present a mobile conference guide application that combines several RDF data sets to present interlinked information about publications, conferences, authors, locations, and more to the user. With our application we show that it is possible to store a large amount of indexed data on an iPhone/iPod Touch device. We demonstrate that querying is also efficient by generating the application's actual content from real-time queries on the data.
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, What Makes a Good Bug Report?, Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), February 2008. (inproceedings)

2007

Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, How Long will it Take to Fix This Bug?, Proceedings of the Fourth International Workshop on Mining Software Repositories, Editor(s): Harald C. Gall, Michele Lanza, May 2007, IEEE Computer Society. (inproceedings)

Predicting the time and effort for a software problem has long been a difficult task. We present an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue. Our technique leverages existing issue tracking systems: given a new issue report, we use the Lucene framework to search for similar, earlier reports and use their average time as a prediction. Our approach thus allows for early effort estimation, helping in assigning issues and scheduling stable releases. We evaluated our approach using effort data from the JBoss project. Given a sufficient number of issue reports, our automatic predictions are close to the actual effort; for issues that are bugs, we are off by only one hour, beating naive predictions by a factor of four.
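
The prediction scheme described above can be sketched as a nearest-neighbor average. This is a minimal sketch under stated assumptions: Lucene's text scoring is replaced by a plain bag-of-words cosine similarity, and `predict_effort`, `k`, and `min_similarity` are hypothetical names and parameters, not the settings used in the paper. The function abstains (returns `None`) when no earlier report is similar enough.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a stand-in for a Lucene analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity of two term-frequency vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_effort(new_report, history, k=3, min_similarity=0.1):
    """Average the person-hours of the k most similar resolved reports.

    history: list of (report_text, person_hours) pairs (hypothetical format).
    """
    query = Counter(tokenize(new_report))
    scored = []
    for text, hours in history:
        sim = cosine(query, Counter(tokenize(text)))
        if sim >= min_similarity:
            scored.append((sim, hours))
    if not scored:
        return None  # nothing similar enough: abstain instead of guessing
    scored.sort(key=lambda pair: pair[0], reverse=True)
    nearest = scored[:k]
    return sum(hours for _, hours in nearest) / len(nearest)


history = [
    ("NullPointerException in cache loader on startup", 4.0),
    ("Cache loader throws NullPointerException when config missing", 6.0),
    ("Update documentation for clustering setup", 2.0),
]
print(predict_effort("NullPointerException in cache loader", history, k=2))  # 5.0
```

The example data is made up for illustration; the third report shares no terms with the query, so it falls below the similarity threshold and does not influence the estimate.
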
Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, Predicting Effort to fix Software Bugs, Proceedings of the 9th Workshop Software Reengineering, May 2007. (inproceedings)
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Quality of Bug Reports in Eclipse, Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology eXchange, October 2007, ACM. (inproceedings)

The information in bug reports influences the speed at which bugs are fixed. However, bug reports differ in the quality of their information. We conducted a survey among ECLIPSE developers to determine which information in reports they use most widely and which problems they encounter most frequently. Our results show that steps to reproduce and stack traces are most sought after by developers, while inaccurate steps to reproduce and incomplete information pose the largest hurdles. Surprisingly, developers are indifferent to bug duplicates. Such insight is useful for designing new bug tracking tools that guide reporters in providing more helpful information. We also present a prototype of a quality-meter tool that measures the quality of bug reports by scanning their content.
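
A quality meter that scans report content could, in its simplest form, check for the two items developers asked for most: steps to reproduce and stack traces. The sketch below is hypothetical and is not the prototype from the paper; the regular expressions only recognize Java-style stack frames and numbered steps, and the equal weighting of the features is an assumption.

```python
import re

# Hypothetical detection patterns; a real tool would need far more robust ones.
STACK_FRAME = re.compile(r"^\s*at\s+[\w$.]+\([\w$]+\.java:\d+\)", re.MULTILINE)
STEPS_HEADER = re.compile(r"steps\s+to\s+reproduce", re.IGNORECASE)
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def quality_features(report):
    # Steps count as present if announced explicitly or given as a numbered list.
    has_steps = bool(STEPS_HEADER.search(report)) or len(NUMBERED_STEP.findall(report)) >= 2
    return {
        "steps_to_reproduce": has_steps,
        "stack_trace": bool(STACK_FRAME.search(report)),
    }


def quality_score(report):
    # Fraction of features present, each weighted equally (an assumption).
    features = quality_features(report)
    return sum(features.values()) / len(features)


report = """Steps to reproduce:
1. Open a Java file
2. Press Ctrl+Shift+F
java.lang.NullPointerException
\tat org.eclipse.jdt.Formatter.format(Formatter.java:42)
"""
print(quality_score(report))  # 1.0
print(quality_score("It crashes sometimes."))  # 0.0
```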
