Science Cast

Blend: A Unified Data Discovery System

Mahdi Esmailoghli, October 5, 2023 8:53am


This paper is a preprint and has not been certified by peer review.


arXiv, October 4, 2023 12:00am

Authors

Mahdi Esmailoghli, Christoph Schnell, Renée J. Miller, Ziawasch Abedjan

Abstract

Data discovery is an iterative and incremental process that necessitates the execution of multiple data discovery queries to identify the desired tables from large and diverse data lakes. Current methodologies concentrate on single discovery tasks such as join, correlation, or union discovery. However, in practice, a series of these approaches and their corresponding index structures are necessary to enable the user to discover the desired tables. This paper presents BLEND, a comprehensive data discovery system that empowers users to develop ad-hoc discovery tasks without the need to develop new algorithms or build a new index structure. To achieve this goal, we introduce a general index structure capable of addressing multiple discovery queries. We develop a set of lower-level operators that serve as the fundamental building blocks for more complex and sophisticated user tasks. These operators are highly efficient and enable end-to-end efficiency. To enhance the execution of the discovery pipeline, we rewrite the search queries into optimized SQL statements to push the data operators down to the database. We demonstrate that our holistic system is able to achieve comparable effectiveness and runtime efficiency to the individual state-of-the-art approaches specifically designed for a single task.
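The idea of answering several discovery queries from one index, with the work pushed down to the database as SQL, can be illustrated with a small sketch. This is not BLEND's actual schema or operator set, which the abstract does not specify. The `cells` table layout, the function names `build_index`, `joinable_columns`, and `tables_containing`, and the use of SQLite are all assumptions made for illustration only.

```python
import sqlite3


def build_index(tables):
    """Load every cell of every table into one shared index relation.

    `tables` maps a table name to a list of rows (each row a list of cells).
    The (value, table_id, column_id, row_id) layout is a hypothetical schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells (value TEXT, table_id TEXT, "
        "column_id INTEGER, row_id INTEGER)"
    )
    conn.execute("CREATE INDEX idx_cells_value ON cells (value)")
    for name, rows in tables.items():
        for r, row in enumerate(rows):
            conn.executemany(
                "INSERT INTO cells VALUES (?, ?, ?, ?)",
                [(str(v), name, c, r) for c, v in enumerate(row)],
            )
    return conn


def joinable_columns(conn, query_values, k=5):
    """Join discovery: rank lake columns by distinct-value overlap with a query column."""
    values = sorted({str(v) for v in query_values})
    if not values:
        return []
    placeholders = ",".join("?" * len(values))
    sql = f"""
        SELECT table_id, column_id, COUNT(DISTINCT value) AS overlap
        FROM cells
        WHERE value IN ({placeholders})
        GROUP BY table_id, column_id
        ORDER BY overlap DESC, table_id, column_id
        LIMIT ?
    """
    # The grouping and ranking run inside the database, not in Python.
    return conn.execute(sql, values + [k]).fetchall()


def tables_containing(conn, keyword):
    """Keyword search: a second query type answered from the same index."""
    sql = "SELECT DISTINCT table_id FROM cells WHERE value = ? ORDER BY table_id"
    return [row[0] for row in conn.execute(sql, (str(keyword),))]
```

Both functions read the same `cells` relation, so adding a new discovery task in this sketch means writing a new SQL query rather than building a new index.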
