Skip to content

[RFC] Add the ability to give HTTP filters names, and a utility to get them.#44356

Open
nshipilov wants to merge 206 commits intoenvoyproxy:mainfrom
nshipilov:main
Open

[RFC] Add the ability to give HTTP filters names, and a utility to get them.#44356
nshipilov wants to merge 206 commits intoenvoyproxy:mainfrom
nshipilov:main

Conversation

@nshipilov
Copy link
Copy Markdown

@nshipilov nshipilov commented Apr 10, 2026

Commit Message: Add the ability to give HTTP filters names, and a utility to get them (or fall back to RTTI if not overwritten).

Additional Description:

At Uber, we have added logging in filter_manager.h/cc for filter operations (ctors, dtors, OnDestroy, encodeHeaders, decodeHeaders) to debug slow p99.9 forward latencies in our filter chains.

To do so, we need a way to identify filters, and this is my proposal for how to do so. It allows gradual onboarding with a reasonable fallback.

Additionally, we would like some guidance on whether or not you would be open to adding logging similar to what we have upstream.

Currently, we have a configurable "slowness" level in runtime configuration, and filter_manager emits logs if specific functions cross that threshold. On top of this, we could emit stats about these latencies in the filter_manager.

Let me know what you folks think, and if the direction is agreeable, I can put up a pull request. Thanks!

Risk Level: Low

Testing: Unit tests.

Docs Changes: None

Release Notes: None

Platform Specific Features: None

@repokitteh-read-only
Copy link
Copy Markdown

Hi @nshipilov, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #44356 was opened by nshipilov.

see: more, trace.

@nshipilov nshipilov had a problem deploying to external-contributors April 10, 2026 00:00 — with GitHub Actions Error
@phlax phlax requested a review from agrawroh April 13, 2026 09:34
@phlax
Copy link
Copy Markdown
Member

phlax commented Apr 13, 2026

could you fix DCO please @nshipilov

nshipilov and others added 25 commits April 13, 2026 16:47
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
<!--
!!!ATTENTION!!!

If you are fixing *any* crash or *any* potential security issue, *do
not*
open a pull request in this repo. Please report the issue via emailing
envoy-security@googlegroups.com where the issue will be triaged
appropriately.
Thank you in advance for helping to keep Envoy secure.

!!!ATTENTION!!!

For an explanation of how to fill out the fields, please see the
relevant section
in
[PULL_REQUESTS.md](https://github.com/envoyproxy/envoy/blob/main/PULL_REQUESTS.md)

!!!ATTENTION!!!

Please check the [use of generative AI
policy](https://github.com/envoyproxy/envoy/blob/main/CONTRIBUTING.md?plain=1#L41).

You may use generative AI only if you fully understand the code. You
need to disclose
this usage in the PR description to ensure transparency.
-->

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

---------

Signed-off-by: tyxia <tyxia@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…xy#43544)

Commit Message: Adds extension common util for parsing AWS eventstream
Additional Description: this PR adds common utility for parsing the AWS
eventstream encoding which is used in various AWS products for streaming
content. More details here
https://smithy.io/2.0/aws/amazon-eventstream.html.
Part of the work for envoyproxy#43492
Risk Level:
Testing: Unit + fuzzy
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes:]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

---------

Signed-off-by: Peter Leng <yleng@pinterest.com>
Signed-off-by: Peter Leng <peterleng1234@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Uses of sed and xargs assumed GNU extensions. Fix the commands so that
they'll work on either BSD or GNU versions of tools.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…igh watermark (envoyproxy#42176)

Commit Message: Add timeout guard for connections stuck above buffer
high watermark
* extend listener and cluster APIs with
`per_connection_buffer_high_watermark_timeout`, allowing operators to
cap how long downstream or upstream connections can remain at/above
their configured buffer high watermark
* close connections when their buffers stay at/above the watermark past
the configured timeout

Background: In our large (internal) multi-tenant use case where Envoy
runs as a host-agent shared by all the pods on the host, a buggy pods
can accept connections but fail to drain them, leaving Envoy buffers
full until overload-manager kicks in or OOMs. This change clamps such
stalled connections once they remain fully buffered for the configured
duration.

Additional Description: N/A
Risk Level: Low
Testing: Unit tests
Docs Changes: N/A
Release Notes: N/A
Platform-Specific Features: N/A

---------

Signed-off-by: Ronak Jain <ronakjainc@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Add the filter state key to the IS_ENVOY_BUG messages to help with
debugging

Risk Level: low
Testing: unit tests

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…rap DM (envoyproxy#43962)

## Description

This PR added listener lifecycle callbacks including
`on_listener_add_or_update` and `on_listener_removal` to the dynamic
module bootstrap extension enabling modules to respond to listeners
lifecycle events.

---

**Commit Message:** dynamic_modules: add listener lifecycle event
callbacks to the bootstrap DM
**Additional Description:** Added listener lifecycle callbacks including
`on_listener_add_or_update` and `on_listener_removal` for Bootstrap.
**Risk Level:** Low
**Testing:** Added Tests
**Docs Changes:** Added
**Release Notes:** N/A

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…#43703)

## Description

This PR adds support for writing new upstream extensions using Dynamic
Modules.

---

**Commit Message:** dynamic_modules: add dynamic modules support for
upstream extensions
**Additional Description:** Adds support for writing new upstream
extensions using Dynamic Modules.
**Risk Level:** Low
**Testing:** Added Unit + Integration Tests
**Docs Changes:** Added
**Release Notes:** Added

---------

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…ic modules ABI (envoyproxy#43964)

## Description

This PR added a process-wide shared data registry to the dynamic modules
ABI.

---

**Commit Message:** dynamic_modules: add a process-wide shared data
registry to the dynamic modules ABI
**Additional Description:** Added a process-wide shared data registry to
the dynamic modules ABI.
**Risk Level:** Low
**Testing:** Added Tests
**Docs Changes:** Added
**Release Notes**: Added

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…oxy#43965)

The clang toolchain is now hermetic and
automatically downloaded by Bazel, so bazel/setup_clang.sh no longer
exists. Updated `bazel/README.md` to reflect the current setup process,
fixed a stale comment in `.bazelrc`, and removed outdated shell_commands
in `.bazelci/presubmit.yml`.

Signed-off-by: stekole <stefan@sandnetworks.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…nvoyproxy#43957)

Commit Message: I believe that when you check out a repo and run tests,
they should work. Tests that do not make sense on the platform you are
running should be skipped.

In this case the underlying system throws if you don't have the right
processor, so I changed the test to just catch it, check the exception
string is what's expected, and stop.

Additional Description:
Risk Level: low (test-only)
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

---------

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…nvoyproxy#43952)

## Summary
- Document the evaluation order of rewrite rules within a matched
`local_reply_config` mapper: `body` → `headers_to_add` → `status_code` →
`body_format_override`
- Explain that `%RESPONSE_CODE%` resolves to different values in
`headers_to_add` (original code) vs `body_format_override` (overridden
code) due to this ordering
- Provide a workaround: capture the original code in a response header
and reference it via `%RESP(header-name)%`

Verified against `source/common/local_reply/local_reply.cc` lines
103-132 (matchAndRewrite) and lines 178-207 (rewrite).

Fixes envoyproxy#37577

Risk Level: Low (docs only)
Testing: N/A
Docs Changes: Added evaluation order documentation to local_reply.rst
Release Notes: N/A

Signed-off-by: Kit <kit@kovan.dev>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Kit <kit@kovan.dev>
Signed-off-by: kovan <xaum.io@gmail.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…y#43987)

On macOS, the primary group for regular users is GID 20 (staff).
run_envoy_docker.sh always unconditionally overwrites USER_GID with the
result of id -g, then passes that value into the Docker container
entrypoint. The entrypoint runs groupmod -g "$USER_GID" envoybuild to
remap the envoybuild group to match the host user's GID so that
volume-mounted file permissions work correctly. However, GID 20 is
already assigned to a different group inside the Linux-based build
container (e.g. dialout), so groupmod fails with:
```
groupmod: GID '20' already exists
```
Because USER_GID was unconditionally assigned, any attempt to override
it by prefixing the command (e.g. USER_GID=1000 ./ci/run_envoy_docker.sh
bash) had no effect — the script always clobbered the value.

Fix

Changed the USER_UID and USER_GID assignments in run_envoy_docker.sh to
use the existing environment variable value if already set, falling back
to id -u / id -g only when not provided:
```
USER_UID="${USER_UID:-$(id -u)}"
USER_GID="${USER_GID:-$(id -g)}"
```
This allows macOS users to override the GID with one that doesn't
conflict with groups already present inside the container:
```
USER_GID=1000 ./ci/run_envoy_docker.sh bash
```

Signed-off-by: Juan Manuel Ollé <jolle@salesforce.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
… implementation from Stats::AllocatorImpl to Stats::Allocator (envoyproxy#43968)

A long time ago, all Envoy stats were stored in a large shared memory
block, significantly limiting how many you could get, and the lengths of
their names. This facilitated keeping stat continuity across
parent/child hot-restart. The fixed shared-mem block size was very
limiting and a bit inefficient, so an alternative mechanism of keeping
stats on the heap was added for when hot-restart was not needed. To
switch between them, we used the Allocator interface, and kept two
implementations of that.

Then we removed the shared-memory version in favor of using message
passing to update counters from parent to child during hot restart.

Now it's time to remove that interface; it's making it a bit harder to
reason about guaranteeing that stats users don't call the allocator
directly; it's really an implementation detail of the stats system. So
this is a pure refactor (no functional change) which can help that, and
make it easier to reason about the safety of envoyproxy#43958 .

As removing headers and class names can be a breaking change, temporary
forwarding headers are left behind, temporarily; out-of-repo referencess
should be changed by June.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…yproxy#43974)

**Commit Message**: dynamic_modules: remote source: add caching for
fetched modules

**Motivation**
Follow up from envoyproxy#43818 and envoyproxy#43333
Next PR: `nack_on_cache_miss` support (NACK on cache miss + background
fetch)

**Additional Description**:

`newDynamicModuleFromBytes` already writes fetched modules to a
deterministic path based on SHA256. This PR adds a check before the
async fetch: if the file already exists on disk, load it directly and
skip the HTTP fetch entirely.

It is a simple approach that forms base for the next PR.

**Risk Level**: Low
**Testing**: Added unit tests
**Docs Changes**: NA
**Release Notes**: Added
Platform Specific Features: NA
[Optional Runtime guard:] Not needed
[Optional Fixes #Issue]: NA
[Optional Fixes commit #PR or SHA]: NA
[Optional Deprecated:]: NA
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]
NA

---------

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Commit Message: mcp: add tools/call response transcoding.
Additional Description:
Risk Level: Low
Testing: Unit test and Integration test
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

---------

Signed-off-by: Yilin Guo <guoyilin@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…anic safety (envoyproxy#43828)

Fixes envoyproxy#43827

Adds an opt-in `CatchUnwind<F>` wrapper type to the Rust SDK that module
authors can use to catch panics in filter callbacks. When a wrapped
filter panics:

1. The inner filter is immediately dropped (preventing access to
potentially corrupted state).
2. The panic payload is logged via `envoy_log_error!`.
3. The wrapper fails closed — sends a 500 response for HTTP filters,
closes the connection for network filters, and returns `StopIteration`
for all status-returning methods.
4. Subsequent status-returning callbacks on a poisoned filter abort the
process (this should never happen if the fail-closed mechanism works
correctly).
5. Void cleanup callbacks (e.g. `on_destroy`, `on_stream_complete`)
silently skip on a poisoned filter.

Module authors opt in by wrapping their filter at construction time:

```rust
fn new_http_filter(&self, envoy: &mut EHF) -> Box<dyn HttpFilter<EHF>> {
    Box::new(CatchUnwind::new(MyFilter::new()))
}
```

Without the wrapper, panics abort the process as before (the default
Rust behavior at `extern "C"` boundaries). Init and config callbacks are
intentionally not wrapped — panics during startup indicate fatal
misconfiguration and should crash.

**AI Disclosure:** This PR was developed with AI assistance (Claude via
Cursor).

**Risk Level:** Low
**Testing:** Unit tests for HTTP, network, and listener filter panic
paths.
**Docs Changes:** Release note in `changelogs/current.yaml` + dynamic
modules docs update.
**Release Notes:** Rust SDK adds `CatchUnwind` wrapper for opt-in panic
safety in filter callbacks.

---------

Signed-off-by: Conrad Ludgate <conradludgate@gmail.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Co-authored-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…oyproxy#44009)

Commit Message: deps: switch aspect_bazel_lib to download from official
releases

Additional Description:
The bazel-lib published release tarballs include a generated file
`tools/integrity.bzl` which is stubbed out in the regular source tree
(ex:
https://github.com/bazel-contrib/bazel-lib/blob/v2.21.2/tools/integrity.bzl).
If this file is missing, some bazel-lib rules which make use of extra
toolchains will break due to the missing checksums when downloading
binaries.
None of the affected rules are used in Envoy right now, but this can
affect downstream consumers.

Risk Level: low
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

Signed-off-by: Joe Kralicky <joekralicky@gmail.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…t populated (envoyproxy#43876)

Fixing a case where the initial_resource_versions field in the delta-xDS
discovery request isn't being populated correctly when xDS-Failover is
used, and the primary doesn't respond on the first attempt.

Risk Level: low - xDS-Failover is gated by an already existing
`envoy.restart_features.xds_failover_support` and is still experimental.
Testing: Added an integration test.
Docs Changes: N/A
Release Notes: Added.
Platform Specific Features: N/A

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…envoyproxy#43975)

Commit Message: userspace socket: rename some IoHandle functions to be
less confusing
Additional Description:
Userspace IoHandleImpl implements a Network::IoHandle read/write
interface to be exposed to the transport socket. It also implements a
UserSpace::IoHandle interface, exposed to the peer handle, to indicate
to the peer whether its receive buffer can accept more data. Rather
confusingly, this is described as being "writable".

This overloading of terms means that a handle can be blocked for
`write()`s while `isWritable()` returns true, since it is the handle's
_peer_ whose `write()` function can now be called because the peer is
what calls `isWritable()`. Similarly, `isWritable()` means that the
handle is writable by the peer, while `isReadable()` means that the
handle is readable _by the owner_.

I think this would be less confusing by avoiding most references to
"write" in the UserSpace::IoHandle API, except where it refers
specifically to the `write()` function on the same handle:
- `setWriteEnd()` is called by the peer to promise that it will send no
more data -> `setEof()`
- `isPeerShutDownWrite()` is called to determine if the above has
happened -> `hasReceivedEof()`
- `getWriteBuffer()` -> `getReceiveBuffer()`
- `isWritable()` is called to determine if the receive buffer can accept
data -> `canReceiveData()`
- `isPeerWritable()` _actually_ indicates whether write() calls will
succeed -> `isWritable()`

Risk Level: none; renames some symbols
Testing: existing CI
Docs Changes: n/a
Release Notes: n/a

---------

Signed-off-by: Eugene Chan <eugenechan@google.com>
Signed-off-by: pianiststickman <34144687+pianiststickman@users.noreply.github.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…#43934)

Fixed bug where providing an empty inner socket config would cause Envoy
to crash.

Risk Level: Low.
Testing: UT
Docs Changes: n/a
Release Notes: done

---------

Signed-off-by: Tony Allen <txallen@google.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
phlax and others added 4 commits April 13, 2026 16:51
envoyproxy#44348)

…ep (envoyproxy#43715)"

This reverts commit d74384b.

---------

Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…to resolve it

Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
@nshipilov nshipilov requested a deployment to external-contributors April 13, 2026 16:54 — with GitHub Actions Waiting
@nshipilov
Copy link
Copy Markdown
Author

could you fix DCO please @nshipilov

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.