UCSC-SOE-09-01: Robust and Efficient Evaluation of Rank Joins

Jonathan Finger, Neoklis Polyzotis
01/02/2009 09:00 AM
Computer Science
In the rank join problem we are given a relational join $R_1 \Join R_2$ and a function that assigns numeric scores to the join tuples, and the goal is to return the tuples with the highest score. This problem lies at the core of processing top-k SQL queries, and recent studies have introduced specialized operators that solve the rank join problem by accessing only a subset of the input tuples. A desirable property for such operators is robustness, i.e., their performance should remain stable across different input characteristics. However, a recent theoretical study has shown that existing rank join operators are not robust even though they have been shown to perform well in practice. The same study proposed the $\pbrjrrfr$ operator that was proved to be robust, but its performance was not tested empirically and in fact it was hinted that its complexity can be high. Thus, the following important question is raised: Is it possible to design a rank join operator that is both robust and efficient?

In this paper we provide an answer to this challenging question. We perform an empirical study of $\pbrjrrfr$ and show that its performance is not good in practice. Using the insights gained by the study, we develop the novel $\pbrjstarofr$ operator that addresses the efficiency bottlenecks of $\pbrjrrfr$. We prove that $\pbrjstarofr$ is robust in general and more specifically that it never performs more I/O than $\pbrjrrfr$. $\pbrjstarofr$ is the first operator that possesses these properties and is thus of interest in the theoretical study of rank join operators. We further identify cases where the overhead of $\pbrjstarofr$ becomes significant, and propose the $\pbrjstarafr$ operator that automatically adapts its overhead to the characteristics of the input. An extensive experimental study validates the effectiveness of the new operators and demonstrates that they offer significant performance improvements (up to an order of magnitude) over the state-of-the-art.

UCSC-SOE-09-01