    Repositories list

    • An extension for DuckDB that captures lineage events for executed queries
      Python
      MIT License
      25300 Updated Apr 15, 2026
    • An Open Standard for lineage metadata collection
      Java
      Apache License 2.0
      459000 Updated Apr 11, 2026
    • spark

      Public
      Apache Spark - A unified analytics engine for large-scale data processing
      Scala
      Apache License 2.0
      29k200 Updated Apr 11, 2026
    • hadoop

      Public
      Apache Hadoop
      Java
      Apache License 2.0
      9.2k000 Updated Apr 11, 2026
    • ClickHouse
      C++
      Apache License 2.0
      8.3k000 Updated Apr 11, 2026
    • hive

      Public
      Apache Hive
      Java
      Apache License 2.0
      4.8k000 Updated Apr 11, 2026
    • DuckDB extension that connects to Apache Hive Metastore and queries the data inside, like a native DuckDB catalog.
      C++
      MIT License
      0100 Updated Apr 9, 2026
    • marquez

      Public
      Collect, aggregate, and visualize a data ecosystem's metadata
      Java
      Apache License 2.0
      3941521 Updated Apr 1, 2026
    • ilum-tilt

      Public
      Tilt-based local development setup for the Ilum data lakehouse platform
      Starlark
      0120 Updated Mar 31, 2026
    • Bigtop Manager is a modern, AI-driven web application designed to simplify big data cluster management.
      Java
      Apache License 2.0
      47000 Updated Mar 30, 2026
    • Python
      166000 Updated Mar 18, 2026
    • ilum-cli

      Public
      0100 Updated Mar 12, 2026
    • doc

      Public
      Ilum - Apache Spark cluster on Kubernetes
      1710 Updated Feb 13, 2026
    • DuckDB JDBC Driver
      C++
      MIT License
      70000 Updated Feb 13, 2026
    • bigtop

      Public
      Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the l…
      Groovy
      Apache License 2.0
      526000 Updated Jan 30, 2026
    • A modern, multi-cloud object storage browser for JupyterLab 4. Unified interface to browse, manage, and edit files across AWS S3, MinIO, and S3-compatible stora…
      TypeScript
      Apache License 2.0
      0200 Updated Jan 22, 2026
    • airflow

      Public
      Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
      Python
      Apache License 2.0
      17k000 Updated Jan 2, 2026
    • gravitino

      Public
      World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
      Java
      Apache License 2.0
      807000 Updated Jan 1, 2026
    • Open, Multi-modal Catalog for Data & AI
      Java
      Apache License 2.0
      609000 Updated Dec 24, 2025
    • delta

      Public
      An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
      Scala
      Apache License 2.0
      2.1k000 Updated Dec 24, 2025
    • minio

      Public
      High Performance, Kubernetes Native Object Storage
      Go
      GNU Affero General Public License v3.0
      7.4k000 Updated Dec 24, 2025
    • druid

      Public
      Apache Druid: a high performance real-time analytics database.
      Java
      Apache License 2.0
      3.8k000 Updated Dec 24, 2025
    • Jupyter magics and kernels for working with remote Spark clusters
      Python
      Other
      449000 Updated Dec 24, 2025
    • Bitnami Helm Charts
      Smarty
      Other
      10k000 Updated Dec 24, 2025
    • Bitnami container images
      Shell
      Other
      6.8k000 Updated Dec 24, 2025
    • sdp-ui

      Public
      Spark Declarative Pipelines UI - Standalone app for running sdp
      Apache License 2.0
      0100 Updated Oct 15, 2025
    • Jupyter Notebook
      Apache License 2.0
      6300 Updated Jun 12, 2025
    • Apache Pinot (Incubating) - A realtime distributed OLAP datastore
      Java
      Apache License 2.0
      1.5k0025 Updated Mar 27, 2025
    • iceberg

      Public
      Apache Iceberg
      Java
      Apache License 2.0
      3.2k000 Updated Dec 24, 2024
    • REST job server for Apache Spark
      Scala
      Other
      974100 Updated Dec 24, 2024