UCSC-SOE-08-14: Sketch-based Summarization of Ordered XML Streams

Veronica Mayorga, Neoklis Polyzotis
07/07/2008 09:00 AM
Computer Science
XML streams, such as RSS feeds or complex event streams, are becoming increasingly pervasive as they provide the foundation for a wide range of emerging applications. An important problem in this context is the realization of continuous queries that can support on-line monitoring and analysis of the streaming XML data. The evaluation of exact results, however, can be prohibitively expensive for the resource-restricted environment of a streaming application. This leads naturally to the use of approximation techniques that can provide an on-demand estimate for the result of a continuous XML query.

In this paper, we introduce a new technique for approximately answering a complex aggregate query over an XML stream using limited memory. The main novelty of the proposed technique is that it supports XML queries with any combination of the common XPath axes, namely, ancestor, descendant, parent, child, following, preceding, following-sibling, and preceding-sibling. At the heart of our method lies an efficient transform that reduces a continuous XML query to an equi-join query over relational streams. We detail the transform and discuss its integration with randomized sketches as a basic mechanism to estimate the result of the XML query. We further enhance this mechanism with structural sieving, a technique that takes advantage of the XML data and query characteristics in order to improve the accuracy of the sketch-based approximation. We present an extensive experimental study on real-life and synthetic data sets that validates the effectiveness of our approach and demonstrates its advantages over existing techniques.

UCSC-SOE-08-14