Exploring Data Dependence

The error of a data-dependent algorithm varies from one dataset to another, whereas a data-independent algorithm has the same expected error on every dataset.


Above, we compare the error of one selected data-dependent algorithm with that of two data-independent algorithms. The first, Identity, is a baseline that takes the most naive approach to answering the workload queries. The second, HB, is one of the best-performing data-independent algorithms. Datasets on the x-axis are ordered by the error of the selected data-dependent algorithm.

By selecting different data-dependent algorithms, we can see how the error of each varies across datasets and compare it with the error of the baseline (Identity) and of one of the best-performing (HB) data-independent algorithms.
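
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the plot. It assumes Identity adds Laplace noise to every cell of the data vector. It also uses a simple hypothetical stand-in, `uniform`, as the data-dependent algorithm: it spreads a noisy total evenly over all cells. The three toy datasets, the prefix-sum workload, the L2 error measure, and all function names are assumptions made for the example. HB is omitted to keep the sketch short.

```python
import numpy as np

def identity(x, epsilon, rng):
    # Data-independent baseline (assumed form): Laplace noise on every cell.
    return x + rng.laplace(scale=1.0 / epsilon, size=x.shape)

def uniform(x, epsilon, rng):
    # Hypothetical data-dependent stand-in: noisy total spread evenly over cells.
    noisy_total = x.sum() + rng.laplace(scale=1.0 / epsilon)
    return np.full(x.shape, noisy_total / x.size)

def prefix_workload(n):
    # Workload matrix whose i-th row sums cells 0..i.
    return np.tril(np.ones((n, n)))

def mean_error(algorithm, x, W, epsilon, trials, rng):
    # Average L2 error of the workload answers over repeated runs.
    true_answers = W @ x
    errors = [
        np.linalg.norm(W @ algorithm(x, epsilon, rng) - true_answers)
        for _ in range(trials)
    ]
    return float(np.mean(errors))

def compare(datasets, epsilon=1.0, trials=200, seed=0):
    # Returns (name, data-dependent error, Identity error) per dataset,
    # ordered by the data-dependent error, as on the x-axis described above.
    rng = np.random.default_rng(seed)
    rows = []
    for name, x in datasets.items():
        W = prefix_workload(x.size)
        rows.append((
            name,
            mean_error(uniform, x, W, epsilon, trials, rng),
            mean_error(identity, x, W, epsilon, trials, rng),
        ))
    return sorted(rows, key=lambda row: row[1])

# Toy datasets (made up for illustration): 64-cell histograms of different shapes.
datasets = {
    "spike": np.concatenate(([6400.0], np.zeros(63))),
    "flat": np.full(64, 100.0),
    "ramp": np.arange(64.0),
}

rows = compare(datasets)
for name, dependent_error, identity_error in rows:
    print(f"{name:>6}  data-dependent: {dependent_error:10.1f}  Identity: {identity_error:6.1f}")
```

In this sketch the stand-in algorithm does well on the flat dataset and poorly on the skewed ones, while the Identity column stays roughly constant across datasets. That contrast is what the plot is meant to show.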