Feature Request: Proactive server query killing based on Query cost or Strategy #18043

@anuragrai16

Motivation

Apache Pinot currently has mechanisms to kill runaway queries based on CPU time and memory pressure via QueryResourceAggregator. However, these mechanisms have limitations:

Reactive, not proactive: The background thread (QueryResourceAggregator) samples resource usage periodically (default 100ms), so a query scanning billions of entries can do significant damage before it is detected.
Delayed detection: A full table scan can process 100M+ entries between two background-thread samples, causing CPU spikes and impacting other queries before it is terminated.
Gap in protection: The existing killMostExpensiveQuery() (memory-based) and killCPUTimeExceedQueries() (CPU-based) work well for memory-intensive and CPU-intensive queries respectively. However, a query doing a massive filter scan may not immediately trigger high memory usage, yet it still consumes significant CPU cycles processing entries that are ultimately filtered out.

Problem: In production environments with tables containing billions of rows, a single missing filter predicate or a low-selectivity filter can trigger full segment scans. By the time the background thread detects high CPU usage, the query has already consumed significant resources, degraded cluster performance, and increased tail latency for most other queries.

To solve this problem, I would like to propose a Proactive Query Killing Framework that terminates queries inline based on scan thresholds, complementing the existing background-based CPU/memory killing. The framework will plug into the existing query killing mechanism and only add a proactive way to track and terminate queries, instead of relying on overall JVM memory or CPU usage. The strategy will be pluggable so it can be extended with more cost measurements in the future.
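
To illustrate what "inline" means here, a minimal sketch follows. It is not Pinot code: InlineScanGuard, QueryTerminatedException, and InlineScanDemo are hypothetical names, and the block-based loop only stands in for a real filter operator. The point is that the threshold is checked by the scanning thread itself after each block, so the overshoot is bounded by one block rather than by a sampling interval.

```java
// Hypothetical exception used to abort the scan; not an existing Pinot class.
final class QueryTerminatedException extends RuntimeException {
    QueryTerminatedException(String message) {
        super(message);
    }
}

// Hypothetical guard that the scanning thread updates and checks itself.
final class InlineScanGuard {
    private final long maxEntriesScannedInFilter;
    private long entriesScanned;

    InlineScanGuard(long maxEntriesScannedInFilter) {
        this.maxEntriesScannedInFilter = maxEntriesScannedInFilter;
    }

    // Adds the entries of one block and aborts as soon as the threshold is crossed.
    void recordAndCheck(long entriesInBlock) {
        entriesScanned += entriesInBlock;
        if (entriesScanned > maxEntriesScannedInFilter) {
            throw new QueryTerminatedException("entriesScannedInFilter " + entriesScanned
                + " exceeded threshold " + maxEntriesScannedInFilter);
        }
    }

    long getEntriesScanned() {
        return entriesScanned;
    }
}

final class InlineScanDemo {
    // Stand-in for a filter scan: processes entries block by block,
    // consulting the guard before each block is counted as processed.
    static long scan(long totalEntries, int blockSize, InlineScanGuard guard) {
        long processed = 0;
        while (processed < totalEntries) {
            long block = Math.min(blockSize, totalEntries - processed);
            guard.recordAndCheck(block);
            processed += block;
        }
        return processed;
    }
}
```

Checking once per block rather than once per entry keeps the overhead on the hot path small; the block size is an assumption of this sketch, not part of the proposal.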

Core Components

QueryCostContext Interface: Exposes real-time query metrics:

getNumEntriesScannedInFilter()
getNumDocsScanned()
getElapsedTimeMs()

QueryKillingStrategy Interface: Pluggable strategy for termination decisions:

  interface QueryKillingStrategy {
      boolean shouldTerminate(QueryCostContext context);
      String getTerminationReason();
  }
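
As a rough illustration of how the two interfaces could fit together, here is a sketch of a threshold-based strategy. Only the two interface names and their methods come from this proposal; the `long` return types, ScanThresholdStrategy, and SimpleCostContext are assumptions made for the example. Because getTerminationReason() takes no arguments, the sketch stores the reason on the strategy instance and therefore assumes one strategy instance per query.

```java
interface QueryCostContext {
    long getNumEntriesScannedInFilter();
    long getNumDocsScanned();
    long getElapsedTimeMs();
}

interface QueryKillingStrategy {
    boolean shouldTerminate(QueryCostContext context);
    String getTerminationReason();
}

// Hypothetical in-memory context; a real one would read the query's execution statistics.
final class SimpleCostContext implements QueryCostContext {
    private final long startNanos = System.nanoTime();
    private long entriesScannedInFilter;
    private long docsScanned;

    void addEntriesScannedInFilter(long n) { entriesScannedInFilter += n; }
    void addDocsScanned(long n) { docsScanned += n; }

    @Override public long getNumEntriesScannedInFilter() { return entriesScannedInFilter; }
    @Override public long getNumDocsScanned() { return docsScanned; }
    @Override public long getElapsedTimeMs() { return (System.nanoTime() - startNanos) / 1_000_000L; }
}

// Hypothetical strategy: terminate when either scan counter exceeds its threshold.
final class ScanThresholdStrategy implements QueryKillingStrategy {
    private final long maxEntriesScannedInFilter;
    private final long maxDocsScanned;
    private String reason = ""; // set by the last shouldTerminate() call that returned true

    ScanThresholdStrategy(long maxEntriesScannedInFilter, long maxDocsScanned) {
        this.maxEntriesScannedInFilter = maxEntriesScannedInFilter;
        this.maxDocsScanned = maxDocsScanned;
    }

    @Override
    public boolean shouldTerminate(QueryCostContext context) {
        if (context.getNumEntriesScannedInFilter() > maxEntriesScannedInFilter) {
            reason = "numEntriesScannedInFilter " + context.getNumEntriesScannedInFilter()
                + " exceeded threshold " + maxEntriesScannedInFilter;
            return true;
        }
        if (context.getNumDocsScanned() > maxDocsScanned) {
            reason = "numDocsScanned " + context.getNumDocsScanned()
                + " exceeded threshold " + maxDocsScanned;
            return true;
        }
        return false;
    }

    @Override
    public String getTerminationReason() {
        return reason;
    }
}
```

Other cost measurements (for example, one based on getElapsedTimeMs()) could be added as further QueryKillingStrategy implementations without touching the call site.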

Configuration

pinot.server.query.killer.enabled=true
pinot.server.query.killer.threshold.entriesScannedInFilter=100000000
pinot.server.query.killer.threshold.docsScanned=10000000
# Thresholds can also be overridden per table
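
For illustration, the keys above could be read into a small config holder as sketched below. QueryKillerConfig is a hypothetical class, java.util.Properties is only a stand-in for the server's actual configuration object, and the defaults (disabled, no limit) are assumptions rather than part of the proposal.

```java
import java.util.Properties;

// Hypothetical holder for the proposed settings.
final class QueryKillerConfig {
    static final String ENABLED = "pinot.server.query.killer.enabled";
    static final String MAX_ENTRIES_SCANNED_IN_FILTER =
        "pinot.server.query.killer.threshold.entriesScannedInFilter";
    static final String MAX_DOCS_SCANNED = "pinot.server.query.killer.threshold.docsScanned";

    final boolean enabled;
    final long maxEntriesScannedInFilter;
    final long maxDocsScanned;

    private QueryKillerConfig(boolean enabled, long maxEntriesScannedInFilter, long maxDocsScanned) {
        this.enabled = enabled;
        this.maxEntriesScannedInFilter = maxEntriesScannedInFilter;
        this.maxDocsScanned = maxDocsScanned;
    }

    // Assumed defaults: feature off, thresholds effectively unlimited.
    static QueryKillerConfig fromProperties(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED, "false"));
        long maxEntries = Long.parseLong(
            props.getProperty(MAX_ENTRIES_SCANNED_IN_FILTER, String.valueOf(Long.MAX_VALUE)));
        long maxDocs = Long.parseLong(
            props.getProperty(MAX_DOCS_SCANNED, String.valueOf(Long.MAX_VALUE)));
        return new QueryKillerConfig(enabled, maxEntries, maxDocs);
    }
}
```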

Non-Goals for now (possible future work)

  • Broker-side query killing (future work)
  • Query fingerprinting / pattern-based blocking
  • Adaptive thresholds based on cluster load

Similar design in other systems

  • ClickHouse provides comprehensive query complexity settings that limit scan operations inline, e.g. max_rows_to_read, max_rows_to_read_leaf, and read_overflow_mode.

  • Trino provides query.max-scan-physical-bytes to limit data scanned:

"The maximum number of bytes that can be scanned by a query during its execution. When this limit is reached, query processing is terminated to prevent excessive resource usage."

Previous Work

Metadata

Labels: PEP-Request (Pinot Enhancement Proposal request to be reviewed), feature (New functionality)
