Minh Khoa Nguyen, Cosmin Basca, Abraham Bernstein, B+Hash Tree: Optimizing query execution times for on-Disk Semantic Web data structures, Proceedings Of The 6th International Workshop On Scalable Semantic Web Knowledge Base Systems (SSWS2010), Editor(s): Achille Fokoue, Thorsten Liebig, Yuanbo Guo, November ; 2010. (inproceedings/full paper)
The increasing growth of the Semantic Web has substantially enlarged the amount of data available in RDF format. One proposed so- lution is to map RDF data to relational databases (RDBs). The lack of a common schema, however, makes this mapping inefficient. Some RDF-native solutions use B+Trees, which are potentially becoming a bottleneck, as the single key-space approach of the Semantic Web may even make their O(log(n)) worst case performance too costly. Alterna- tives, such as hash-based approaches, suffer from insufficient update and scan performance. In this paper we propose a novel type of index struc- ture called a B+Hash Tree, which combines the strengths of traditional B-Trees with the speedy constant-time lookup of a hash-based structure. Our main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The result is a scalable, updatable, and lookup-optimized, on-disk index-structure that is especially suitable for the large key-spaces of RDF datasets. We evaluate the approach against existing RDF index- ing schemes using two commonly used datasets and show that a B+Hash Tree is at least twice as fast as its competitors ? an advantage that we show should grow as dataset sizes increase.
Minh Khoa Nguyen, Optimized disk oriented tree structures for RDF indexing: the B+Hash Tree, 08 2010. (bachelorsthesis)
The increasing growth of the Semantic Web has substantially enlarged the amount of data available
in RDF (Resource Description Framework) format. One proposed solution is to map RDF
data to relational databases. The lack of a common schema, however, makes this mapping inefficient.
RDF-native solutions often use B+Trees, which are potentially becoming a bottleneck, as the
single key-space approach of the Semantic Web may even make their O(log(n)) worst case performance
too costly. Alternatives, such as hash-based approaches, suffer from insufficient update
and scan performance. In this thesis a novel type of index structure called B+HASH TREE is being
proposed, which combines the strengths of traditional B-Trees with the speedy constant-time
lookup of a hash-based structure. The main research idea is to enhance the B+Tree with a Hash
Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The
result is a scalable, updatable, and lookup-optimized, on-disk index-structure that is especially
suitable for the large key-spaces of RDF datasets. The approach is evaluated against existing RDF
indexing schemes using two commonly used datasets and show that a B+HASH TREE is at least
twice as fast as its competitors ? an advantage that this thesis shows should grow as dataset sizes
increase.