Apache Pinot is a real-time distributed OLAP datastore for low-latency analytics over streaming and batch data. Core runtime roles: broker (query routing), server (segment storage/execution), controller (cluster metadata/management), minion (async tasks).
| Directory | Purpose |
|---|---|
pinot-broker |
Broker query planning and scatter-gather |
pinot-controller |
Controller APIs, table/segment metadata, Helix management |
pinot-server |
Server query execution, segment loading, indexing |
pinot-minion |
Background tasks (segment conversion, purge, etc.) |
pinot-common / pinot-spi |
Shared utils, config, and SPI interfaces |
pinot-segment-local / pinot-segment-spi |
Segment generation, indexes, storage |
pinot-query-planner / pinot-query-runtime |
Multi-stage query engine (MSQE) |
pinot-connectors |
External tooling to connect to Pinot |
pinot-plugins |
All Pinot plugins (input formats, filesystems, stream/batch ingestion, metrics, etc.) |
pinot-tools |
CLI and quickstart scripts |
pinot-integration-tests |
End-to-end validation suites |
pinot-distribution |
Packaging artifacts |
- pinot-input-format: input format plugin family.
pinot-arrow: Apache Arrow input format support.pinot-avro: Avro input format support.pinot-avro-base: shared Avro utilities and base classes.pinot-clp-log: CLP log input format support.pinot-confluent-avro: Confluent Schema Registry Avro input support.pinot-confluent-json: Confluent Schema Registry JSON input support.pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support.pinot-orc: ORC input format support.pinot-json: JSON input format support.pinot-parquet: Parquet input format support.pinot-csv: CSV input format support.pinot-thrift: Thrift input format support.pinot-protobuf: Protobuf input format support.
- pinot-file-system: filesystem plugin family.
pinot-adls: Azure Data Lake Storage (ADLS) filesystem support.pinot-hdfs: Hadoop HDFS filesystem support.pinot-gcs: Google Cloud Storage filesystem support.pinot-s3: Amazon S3 filesystem support.
- pinot-batch-ingestion: batch ingestion plugin family.
pinot-batch-ingestion-common: shared batch ingestion APIs and utilities.pinot-batch-ingestion-spark-base: shared Spark ingestion base classes.pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation.pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation.pinot-batch-ingestion-standalone: standalone batch ingestion implementation.
- pinot-stream-ingestion: stream ingestion plugin family.
pinot-kafka-base: shared Kafka ingestion base classes.pinot-kafka-3.0: Kafka 3.x ingestion implementation.pinot-kafka-4.0: Kafka 4.x ingestion implementation.pinot-kinesis: AWS Kinesis ingestion implementation.pinot-pulsar: Apache Pulsar ingestion implementation.
- pinot-minion-tasks: minion task plugin family.
pinot-minion-builtin-tasks: built-in minion task implementations.
- pinot-metrics: metrics reporter plugin family.
pinot-dropwizard: Dropwizard Metrics reporter implementation.pinot-yammer: Yammer Metrics reporter implementation.pinot-compound-metrics: compound metrics implementation.
- pinot-segment-writer: segment writer plugin family.
pinot-segment-writer-file-based: file-based segment writer implementation.
- pinot-segment-uploader: segment uploader plugin family.
pinot-segment-uploader-default: default segment uploader implementation.
- pinot-environment: environment provider plugin family.
pinot-azure: Azure environment provider implementation.
- pinot-timeseries-lang: time series language plugin family.
pinot-timeseries-m3ql: M3QL language plugin implementation.
- assembly-descriptor: Maven assembly descriptor for plugin packaging.
- JDK: Use JDK 11+ (CI runs 11/21); code targets Java 11.
- Default build:
./mvnw clean install - Fast dev build:
./mvnw verify -Ppinot-fastdev - Full binary/shaded build:
./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar - Build a module with deps:
./mvnw -pl pinot-server -am test - Single test:
./mvnw -pl pinot-segment-local -Dtest=RangeIndexTest test - Single integration test:
./mvnw -pl pinot-integration-tests -am -Dtest=OfflineClusterIntegrationTest -Dsurefire.failIfNoSpecifiedTests=false test - Quickstart (after build):
build/bin/quick-start-batch.sh
- Run
./mvnw spotless:applyto auto-format code. - Run
./mvnw checkstyle:checkto validate style. Checkstyle config is inconfig/checkstyle.xml. - Run
./mvnw license:formatto add license headers to new files. - Run
./mvnw license:checkto validate license headers. - Always use the Maven wrapper (
./mvnw) rather than a systemmvn.
- Add class-level Javadoc for new classes; describe behavior and thread-safety.
- Use Javadoc comments (
/** ... */or///syntax); code targets Java 11. - Keep Apache 2.0 license headers on all new source files.
- Preserve backward compatibility across mixed-version broker/server/controller.
- Prefer imports over fully qualified class names (e.g., use
import com.foo.Barand refer toBar, notcom.foo.Barinline). - Prefer targeted unit tests; use integration tests when behavior crosses roles.
Before pushing a commit, always run the following checks on the affected modules and fix any failures:
./mvnw spotless:apply -pl <module>— auto-format code../mvnw checkstyle:check -pl <module>— validate style conformance../mvnw license:format -pl <module>— add missing license headers to new files../mvnw license:check -pl <module>— verify all files have correct license headers.
Do not push until all four checks pass cleanly.
- Query changes often touch broker planning and server execution; verify both.
- Segment/index changes usually live under
pinot-segment-localandpinot-segment-spi. - Config or API changes should update relevant configs and docs where applicable.
After completing any coding task (bug fix, feature, refactor, etc.), you MUST run the code-reviewer agent before presenting the work as done. This is non-negotiable.
- Pass ONLY the review scope and a one-line change description. Do NOT pass your analysis, reasoning, or opinions — the reviewer must judge the code independently.
- Example invocation:
"Review unstaged changes in pinot-broker. Change: added timeout to scatter-gather calls." - If the reviewer finds CRITICAL issues, fix them before proceeding. MAJOR issues should be fixed unless you have strong justification. MINOR issues are at your discretion.
- Do not skip the review even if the change seems trivial.
- This is a large multi-module Maven project. Building the entire project takes a long time — prefer building only the modules you need with
-pl <module> -am. - When running tests, use
-Dtest=ClassNameto run a specific test class rather than the full suite. - Mixed-version compatibility matters — do not break wire protocols or serialization formats without careful consideration.