Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.
Wren Engine is the backbone of the semantic layer - The semantic engine for LLMs, bringing business context to AI agents.
Data API Framework for AI Agents and Data Apps
lakeFS - Data version control for your data lake | Git for data
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Enterprise-grade, production-hardened, serverless data lake on AWS
notebook guide
A Git-like version control file system for data lineage & data collaboration.
🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!
Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information.
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Wrapper for multiple LinkML storage engines (alpha software)
ETL pipeline using Pulumi, AWS services, and Snowflake for automated data flow.
Data Lake on the Edge