Contact:

Minh Khoa Nguyen

Department of Informatics
University of Zurich
Binzmühlestr. 14
CH-8050 Zurich

Office: Uni Oerlikon 2.D.07

Email: nguyen [at] ifi.uzh.ch

Short Bio

I...

My Publications

2010

Minh Khoa Nguyen, Cosmin Basca, Abraham Bernstein, B+Hash Tree: Optimizing query execution times for on-Disk Semantic Web data structures, Proceedings Of The 6th International Workshop On Scalable Semantic Web Knowledge Base Systems (SSWS2010), Editor(s): Achille Fokoue, Thorsten Liebig, Yuanbo Guo, November ; 2010. (inproceedings/full paper)

The increasing growth of the Semantic Web has substantially enlarged the amount of data available in RDF format. One proposed so- lution is to map RDF data to relational databases (RDBs). The lack of a common schema, however, makes this mapping inefficient. Some RDF-native solutions use B+Trees, which are potentially becoming a bottleneck, as the single key-space approach of the Semantic Web may even make their O(log(n)) worst case performance too costly. Alterna- tives, such as hash-based approaches, suffer from insufficient update and scan performance. In this paper we propose a novel type of index struc- ture called a B+Hash Tree, which combines the strengths of traditional B-Trees with the speedy constant-time lookup of a hash-based structure. Our main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The result is a scalable, updatable, and lookup-optimized, on-disk index-structure that is especially suitable for the large key-spaces of RDF datasets. We evaluate the approach against existing RDF index- ing schemes using two commonly used datasets and show that a B+Hash Tree is at least twice as fast as its competitors ? an advantage that we show should grow as dataset sizes increase.
Minh Khoa Nguyen, Optimized disk oriented tree structures for RDF indexing: the B+Hash Tree, 08 2010. (bachelorsthesis)

The increasing growth of the Semantic Web has substantially enlarged the amount of data available in RDF (Resource Description Framework) format. One proposed solution is to map RDF data to relational databases. The lack of a common schema, however, makes this mapping inefficient. RDF-native solutions often use B+Trees, which are potentially becoming a bottleneck, as the single key-space approach of the Semantic Web may even make their O(log(n)) worst case performance too costly. Alternatives, such as hash-based approaches, suffer from insufficient update and scan performance. In this thesis a novel type of index structure called B+HASH TREE is being proposed, which combines the strengths of traditional B-Trees with the speedy constant-time lookup of a hash-based structure. The main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The result is a scalable, updatable, and lookup-optimized, on-disk index-structure that is especially suitable for the large key-spaces of RDF datasets. The approach is evaluated against existing RDF indexing schemes using two commonly used datasets and show that a B+HASH TREE is at least twice as fast as its competitors ? an advantage that this thesis shows should grow as dataset sizes increase.

RDF for all publications BibTeX for all publications