Reproducible Papers

ACM SIGMOD 2019 Reproducible Papers

Cache-oblivious High-performance Similarity Join

by Martin Perdacher, Claudia Plant, Christian Böhm

Abstract:

A similarity join combines vectors based on a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this paper, we propose to refine the pairs in an order defined by a space-filling curve, which dramatically improves data locality. Modern multi-core processors are supported by a deep memory hierarchy including RAM, various levels of cache, and registers. The space-filling curve makes our proposed algorithm cache-oblivious, allowing it to fully exploit this memory hierarchy. Our novel Fast General Form (FGF) Hilbert-curve overcomes several limitations of well-known space-filling curves: it is non-recursive, not restricted to traversing squares, and has constant time and space complexity per loop iteration. Because we demonstrate an easy transformation from conventional into cache-oblivious loops, we believe that many complex database operators can be transformed systematically into cache-oblivious parallel algorithms.
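
To make the idea in the abstract concrete, here is a minimal sketch of a refinement loop that visits candidate pairs (i, j) in Hilbert-curve order instead of row by row. It is not the paper's FGF Hilbert-curve: it uses the textbook Hilbert index-to-coordinate conversion, pads the pair space to a power-of-two square (one of the restrictions the abstract says FGF removes), and skips the filter step entirely. The names hilbert_d2xy and similarity_join, and the parameter eps, are hypothetical and chosen for illustration only.

```python
import math


def hilbert_d2xy(order, d):
    """Map position d along a standard Hilbert curve to a cell (x, y)
    of a 2**order by 2**order grid (textbook iterative conversion)."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def similarity_join(points, eps):
    """Self-join sketch: return index pairs (i, j), i < j, whose Euclidean
    distance is at most eps, refined in Hilbert order over the (i, j) space.
    Assumption: no filter step; every pair is a candidate."""
    n = len(points)
    # Smallest power-of-two square covering the n by n pair space.
    order = max(1, (n - 1).bit_length())
    side = 1 << order
    result = []
    for d in range(side * side):
        i, j = hilbert_d2xy(order, d)
        # Skip padding cells and the redundant lower triangle.
        if i < j < n and math.dist(points[i], points[j]) <= eps:
            result.append((i, j))
    return result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5)]
    print(sorted(similarity_join(pts, 1.0)))
```

Because consecutive curve positions differ by one step in either i or j, successive refinements reuse vectors that were touched recently, which is the locality effect the abstract describes. The result set is the same as for a plain nested loop; only the visiting order changes.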

Verified by: Subarna Chatterjee, Harvard University

ACM SIGMOD 2019 Reproducible Papers

Cache-oblivious High-performance Similarity Join

Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers

Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances

Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search?

Raha: A Configuration-Free Error Detection System

A Scalable Index for Top-k Subtree Similarity Queries

Experimental Analysis of Streaming Algorithms for Graph Partitioning

Efficiently Answering Regular Simple Path Queries on Large Labeled Networks

GPU-based Graph Traversal on Compressed Graphs

ACM SIGMOD 2018 Reproducible Papers

Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

Adaptive Optimization of Very Large Join Queries

Data Citation: Giving Credit Where Credit is Due

A General and Efficient Querying Method for Learning to Hash

EKTELO: A Framework for Defining Differentially-Private Computations

Robust Entity Resolution using Random Graphs

AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query Processing

ZigZag: Supporting Similarity Queries on Vector Space Models

ACM SIGMOD 2017 Reproducible Papers

Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler

Cicada: Dependably Fast Multi-Core In-Memory Transactions

Foofah: Transforming Data By Example

QIRANA: A Framework for Scalable Query Pricing

Data Canopy: Accelerating Exploratory Statistical Analysis

Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study

A General-Purpose Counting Filter: Making Every Bit Count

Transaction Repair for Multi-Version Concurrency Control

ACM SIGMOD 2016 Reproducible Papers

Generating Preview Tables for Entity Graphs

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation

SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures

ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks

FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

Micro-architectural Analysis of In-memory OLTP

How to Architect a Query Compiler

Principled Evaluation of Differentially Private Algorithms using DPBench

Adaptive Indexing over Encrypted Numeric Data

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?

SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment

Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets

Diversified Top-k Subgraph Querying in a Large Graph

UpBit: Scalable In-Memory Updatable Bitmap Indexing

ACM SIGMOD 2015 Reproducible Papers

k-Shape: Efficient and Accurate Clustering of Time Series

FOEDUS: OLTP Engine for a Thousand Cores and NVRAM

Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems

Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time

Holistic Indexing in Main-memory Column-stores

Minimum Spanning Trees in Temporal Graphs

Cache-Efficient Aggregation: Hashing Is Sorting

Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation

Efficient Enumeration of Maximal k-Plexes

Collaborative Access Control in WebdamLog