Complex-Event Processing – Department of Informatics – DDIS
https://www.uzh.ch/blog/ifi-ddis
Dynamic and Distributed Information Systems Group

Cassandra-Esper version 0.3 released to Maven Central
https://www.uzh.ch/blog/ifi-ddis/2013/08/29/cassandra-esper-version-0-3-released-to-maven-central/
Thu, 29 Aug 2013 15:07:22 +0000

A new version of Cassandra-Esper has been released to Maven Central. Version 0.3 now supports insert, update and delete operations as well as proper initialization. Cassandra-Esper implements Esper virtual data windows backed by a distributed key-value store that holds background information.

Three DDIS papers got accepted for the SSWS Workshop at ISWC 2013 in Sydney
https://www.uzh.ch/blog/ifi-ddis/2013/08/12/two-ddis-papers-got-accepted-for-the-ssws-worskhop-at-the-iswc-2013-in-sydney/
Mon, 12 Aug 2013 06:47:10 +0000

Three DDIS papers were recently accepted for the 9th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS) at ISWC 2013 in Sydney. This post collects the abstracts of the papers by Minh Khoa Nguyen, Lorenz Fischer, Dr. Thomas Scharrenbach, Philip Stutz, Mihaela Verman, and Prof. Abraham Bernstein.

Abstract: Network-Aware Workload Scheduling: Scalable Linked Data Stream Processing

Lorenz Fischer, Thomas Scharrenbach, Abraham Bernstein

To cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy (a uniform distribution of the computation load among the available machines), as typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, which results in an immense amount of inter-machine communication.

In this paper, we propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm real-time computation framework and evaluated its communication behavior using two real-world datasets. We show that applying graph-partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even when using only very limited data statistics. We also find that processing RDF data one triple at a time, rather than as graph fragments containing multiple triples, may decrease throughput, which indicates the usefulness of semantics.
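To make the trade-off concrete, here is a minimal sketch under simplifying assumptions: the workload is a graph of tasks whose edge weights stand for the traffic between them, inter-machine communication is approximated by the total weight of the edges an assignment cuts, and a naive greedy, capacity-bounded heuristic stands in for a real graph-partitioning algorithm. The function names (`cut_weight`, `round_robin`, `greedy_partition`) and the example weights are hypothetical and not taken from the paper or from Storm.

```python
from collections import defaultdict


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints run on different machines,
    used here as a proxy for inter-machine communication."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def round_robin(tasks, machines):
    """Uniform scheduling: spread tasks evenly, ignoring the network."""
    return {t: i % machines for i, t in enumerate(tasks)}


def greedy_partition(tasks, edges, machines):
    """Toy network-aware heuristic (not the paper's algorithm): put each task
    on the machine it already exchanges the most traffic with, but never
    exceed ceil(len(tasks) / machines) tasks per machine."""
    capacity = -(-len(tasks) // machines)  # ceiling division
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = neighbours[u].get(v, 0) + w
        neighbours[v][u] = neighbours[v].get(u, 0) + w
    assignment, load = {}, [0] * machines
    for t in tasks:
        # Traffic that would stay machine-local if t ran on machine m.
        gain = [0] * machines
        for n, w in neighbours[t].items():
            if n in assignment:
                gain[assignment[n]] += w
        open_machines = [m for m in range(machines) if load[m] < capacity]
        # Prefer the most local traffic; break ties by the lighter machine.
        best = max(open_machines, key=lambda m: (gain[m], -load[m]))
        assignment[t] = best
        load[best] += 1
    return assignment


# Hypothetical workload: two groups of three tightly coupled operators joined
# by one light edge (weights could be tuples exchanged per second).
TASKS = ["a", "b", "c", "d", "e", "f"]
EDGES = {("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 10,
         ("d", "e"): 10, ("e", "f"): 10, ("d", "f"): 10,
         ("c", "d"): 1}

if __name__ == "__main__":
    print(cut_weight(EDGES, round_robin(TASKS, 2)))              # 41
    print(cut_weight(EDGES, greedy_partition(TASKS, EDGES, 2)))  # 1
```

Both assignments place three tasks on each machine, but the round-robin one cuts the heavy edges inside each group while the greedy one cuts only the light bridge between them. A production scheduler would more likely rely on an established graph partitioner and on traffic statistics; the greedy pass only shows why the objective matters.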


Abstract: Eviction Strategies for Semantic Flow Processing

Minh Khoa Nguyen, Thomas Scharrenbach, and Abraham Bernstein

To cope with the ever-increasing data volume, Semantic Flow Processing systems, which continuously process incoming data, have been proposed. These systems answer queries on streams of RDF triples. To achieve this goal, they match (triple) patterns against the incoming stream and generate or update variable bindings. Yet, given the continuous nature of the stream, the number of bindings can explode and exceed the available memory, in particular when computing aggregates. To keep the information processing practical, Semantic Flow Processing systems therefore typically limit the considered data to a (moving) window. Whilst this technique is simple, it may fail to find patterns that are spread over a longer span than the window covers, and it may still cause memory overruns when the data is highly bursty.

In this paper, we propose to decide which bindings to keep in memory not by recency (i.e., a window) but by their likelihood of contributing to a complete match, rather than by creation time (FIFO) or at random. Furthermore, we propose to drop variable bindings instead of data, as load-shedding approaches do. Specifically, we systematically investigate deterministic eviction strategies and a probabilistic, matching-likelihood-based eviction strategy for dropping variable bindings, and compare them in terms of recall. We find that matching-likelihood-based eviction outperforms FIFO and random eviction on both synthetic and real-world data.
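The difference between the three eviction policies can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that a partial match is stored as a dictionary, that a bounded store holds at most `capacity` bindings, and that some estimator supplies each binding's matching likelihood. The class `BindingStore`, its parameters, and the likelihood values in the example are hypothetical.

```python
import random
from collections import OrderedDict


class BindingStore:
    """Bounded store of partial variable bindings (hypothetical sketch).

    When the store is full, one binding is evicted according to `strategy`:
    'fifo' drops the oldest binding, 'random' drops a uniformly random one,
    and 'likelihood' drops the one least likely to complete a match.
    """

    def __init__(self, capacity, strategy, likelihood=None, seed=0):
        if strategy == "likelihood" and likelihood is None:
            raise ValueError("the 'likelihood' strategy needs an estimator")
        self.capacity = capacity
        self.strategy = strategy
        self.likelihood = likelihood   # callable: binding -> estimated P(complete match)
        self.rng = random.Random(seed)
        self.bindings = OrderedDict()  # insertion order = creation time

    def add(self, key, binding):
        # Updating an existing key keeps its original (creation) position.
        if key not in self.bindings and len(self.bindings) >= self.capacity:
            self._evict()
        self.bindings[key] = binding

    def _evict(self):
        if self.strategy == "fifo":
            self.bindings.popitem(last=False)
        elif self.strategy == "random":
            del self.bindings[self.rng.choice(list(self.bindings))]
        elif self.strategy == "likelihood":
            victim = min(self.bindings, key=lambda k: self.likelihood(self.bindings[k]))
            del self.bindings[victim]
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")


if __name__ == "__main__":
    # Hypothetical partial matches, each carrying a made-up estimate "p"
    # of how likely it is to complete the query.
    arrivals = [("b1", {"?x": "s1", "p": 0.9}),
                ("b2", {"?x": "s2", "p": 0.1}),
                ("b3", {"?x": "s3", "p": 0.8}),
                ("b4", {"?x": "s4", "p": 0.7})]
    fifo = BindingStore(3, "fifo")
    ranked = BindingStore(3, "likelihood", likelihood=lambda b: b["p"])
    for key, binding in arrivals:
        fifo.add(key, binding)
        ranked.add(key, binding)
    print(list(fifo.bindings))    # ['b2', 'b3', 'b4']
    print(list(ranked.bindings))  # ['b1', 'b3', 'b4']
```

FIFO discards `b1` simply because it is the oldest, even though it is the most promising partial match; the likelihood-based store keeps it and drops `b2` instead. How the likelihood is estimated is the hard part and is deliberately left to the caller here.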


Abstract: TripleRush: A Fast and Scalable Triple Store 

Philip Stutz, Mihaela Verman, Lorenz Fischer, and Abraham Bernstein

TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that quickly answer queries over large-scale graph data. To that end, it leverages a novel graph-based architecture.

Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph where each index vertex corresponds to a triple pattern. Partially matched copies of a query are routed in parallel along different paths of this index structure.
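The idea of an index whose vertices are triple patterns can be illustrated with a small, single-threaded sketch. It assumes, for illustration only, that every pattern obtained by replacing any subset of a triple's positions with the wildcard `*` becomes an index vertex, and that an edge leads from a pattern to each pattern that binds one more position; answering a pattern then means following edges down to fully bound triples. It does not model Signal/Collect, the parallel routing of partially matched query copies, or TripleRush's actual index layout, and `TripleIndex`, `generalisations` and `parents` are hypothetical names.

```python
from collections import defaultdict
from itertools import product

WILDCARD = "*"


def generalisations(triple):
    """Every pattern obtained by replacing any subset of the positions of
    `triple` with the wildcard, from the triple itself up to ('*', '*', '*')."""
    return set(product(*[(term, WILDCARD) for term in triple]))


def parents(pattern):
    """Patterns with exactly one more wildcard than `pattern`."""
    return {pattern[:i] + (WILDCARD,) + pattern[i + 1:]
            for i, term in enumerate(pattern) if term != WILDCARD}


class TripleIndex:
    """Index graph whose vertices are triple patterns; an edge leads from a
    pattern to each more specific pattern that binds one more position."""

    def __init__(self, triples):
        self.triples = set(triples)
        self.children = defaultdict(set)
        for t in self.triples:
            for pattern in generalisations(t):
                for parent in parents(pattern):
                    self.children[parent].add(pattern)

    def match(self, pattern):
        """Follow edges from `pattern` down to the fully bound triples."""
        if WILDCARD not in pattern:
            return {pattern} & self.triples
        found = set()
        for child in self.children.get(pattern, ()):
            found |= self.match(child)
        return found


if __name__ == "__main__":
    index = TripleIndex([("alice", "knows", "bob"),
                         ("alice", "knows", "carol"),
                         ("bob", "knows", "carol"),
                         ("alice", "likes", "tea")])
    print(sorted(index.match(("alice", "knows", "*"))))
    # [('alice', 'knows', 'bob'), ('alice', 'knows', 'carol')]
    print(sorted(index.match(("*", "knows", "carol"))))
    # [('alice', 'knows', 'carol'), ('bob', 'knows', 'carol')]
```

Because several paths in the index lead to the same triple, the sketch collects results in a set; an engine built for speed would avoid these redundant traversals.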

We show experimentally that TripleRush answers queries in less than a third of the time taken by the fastest of three state-of-the-art triple stores, when time is measured as the geometric mean over all queries of two benchmarks.

On individual queries, TripleRush is up to three orders of magnitude faster than other triple stores.
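Because the comparison above is stated in terms of the geometric mean, a short illustration of that metric may help. The timings below are invented and only show why a single slow query dominates the arithmetic mean far more than the geometric mean; this is generic arithmetic, not the benchmark harness used in the paper. A related property is that the ratio of two geometric means equals the geometric mean of the per-query ratios.

```python
import math


def geometric_mean(times):
    """n-th root of the product of n positive values, computed as
    exp(mean(log t)) to avoid overflow on long lists."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(t) for t in times) / len(times))


if __name__ == "__main__":
    # Invented per-query times in milliseconds, one of them very slow.
    times = [10.0, 10.0, 10.0, 1000.0]
    print(sum(times) / len(times))          # 257.5
    print(round(geometric_mean(times), 1))  # 31.6
```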


Cassandra integration for Esper released
https://www.uzh.ch/blog/ifi-ddis/2013/07/08/cassandra-integration-for-esper-released/
Mon, 08 Jul 2013 12:58:56 +0000

The DDIS group at UZH has released the first version of the cassandra-esper library on Maven Central. The cassandra-esper library integrates Apache Cassandra into the Esper complex event processing engine via virtual data windows.

The ViSTA-TV engine can now query the data warehouse’s key-value store directly from its complex event processing component. Queries can be expressed directly as part of statements in the Esper Event Processing Language (EPL).

Further, the DDIS group released the testng4esper library on Maven Central. The testng4esper library facilitates writing TestNG tests for programs that use the Esper complex event processing engine.

Library streams-esper released on Maven Central
https://www.uzh.ch/blog/ifi-ddis/2013/06/14/library-streams-esper-released-on-maven-central/
Fri, 14 Jun 2013 06:29:41 +0000

The DDIS group has released the streams-esper library on Maven Central. The streams-esper library makes it possible to use the Esper complex event processing engine within the Streams platform for stream processing. It enables the ViSTA-TV engine to query and match complex events on streams of IPTV features, so the ViSTA-TV project can now compute real-time viewership statistics and content-based recommendations.
