---
### Android/JNI native string deobfuscation with angr + Ghidra

Some Android malware and RASP-protected apps hide JNI method names and signatures by decoding them at runtime before calling RegisterNatives. When anti-debug checks kill Frida/ptrace instrumentation, you can still recover the plaintext offline: execute the in-binary decoder with angr, then push the results back into Ghidra as comments.

Key idea: treat the decoder inside the .so as a callable function. Execute it on the obfuscated byte blobs in .rodata and concretize the output bytes up to the first \x00 (C-string terminator). Keep angr and Ghidra on the same image base to avoid address mismatches.

Workflow overview
- Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and the RegisterNatives setup.
- Run angr (CPython3) to execute the decoder on each target string and dump the results.
- Annotate in Ghidra: auto-comment the decoded strings at each call site for fast JNI reconstruction.

Ghidra triage (JNI_OnLoad pattern)
- Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures.
- Typical JNINativeMethod layout per the Oracle JNI docs:

```c
typedef struct {
    char *name;       // e.g., "nativeFoo"
    char *signature;  // e.g., "()V", "()[B"
    void *fnPtr;      // native implementation address
} JNINativeMethod;
```
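On 64-bit Android ABIs (arm64-v8a, x86_64), the struct above is three 8-byte pointers with no padding, so each entry takes 24 bytes. Sometimes only the strings are encoded and the array passed to RegisterNatives is static. In that case, a minimal sketch like the following can split a raw dump of the array (for example, bytes copied out of Ghidra) into entries. Each `fn_ptr` can then be paired with its decoded name. `parse_native_methods` is a hypothetical helper, and the sketch assumes a little-endian 64-bit layout.

```python
import struct
from typing import List, NamedTuple

# char *name, char *signature, void *fnPtr: three 8-byte little-endian pointers (64-bit ABI assumed)
_ENTRY = struct.Struct("<QQQ")


class JNINativeMethod(NamedTuple):
    name_ptr: int       # address of the method-name C string
    signature_ptr: int  # address of the JNI signature C string
    fn_ptr: int         # address of the native implementation


def parse_native_methods(blob: bytes, count: int) -> List[JNINativeMethod]:
    """Split a raw dump of a JNINativeMethod[count] array into its entries."""
    needed = _ENTRY.size * count
    if len(blob) < needed:
        raise ValueError(f"need {needed} bytes for {count} entries, got {len(blob)}")
    return [JNINativeMethod(*fields) for fields in _ENTRY.iter_unpack(blob[:needed])]
```

The `count` argument corresponds to the `nMethods` value passed to RegisterNatives at the call site.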
208
+
- Look for calls to RegisterNatives. If the library constructs the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that is an ideal target for offline execution.
209
+
210
+
angr setup (execute the decoder offline)
211
+
- Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small.
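The angr side can be sketched as follows. This is a minimal sketch under several assumptions. The library path `libtarget.so` is hypothetical. The decoder is assumed to sit at 0x00100e10 (the `FUN_00100e10` example from triage) and to take `(encoded_ptr, out_buf, length)`. Placing the scratch buffer just past `loader.max_addr` is an illustrative choice, not taken from the original script. `decode_string(enc_addr, length)` keeps the two-argument shape used in the examples in this section.

```python
LIB_PATH = "libtarget.so"   # hypothetical path to the extracted library
BASE_ADDR = 0x00100000      # must match the image base Ghidra used
DECODER_ADDR = 0x00100E10   # decoder found during triage (FUN_00100e10 in the example)

_project = None


def first_cstring(raw: bytes) -> str:
    """Keep bytes up to the first NUL (C-string terminator) and decode them."""
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


def get_project():
    """Load the .so once, at the Ghidra base, without pulling in external libs."""
    global _project
    if _project is None:
        import angr  # imported lazily so first_cstring() is usable without angr

        _project = angr.Project(
            LIB_PATH,
            load_options={"auto_load_libs": False, "main_opts": {"base_addr": BASE_ADDR}},
        )
    return _project


def decode_string(enc_addr: int, length: int) -> str:
    """Run the in-binary decoder on one encoded blob and return the plaintext."""
    proj = get_project()
    # Scratch output buffer, page-aligned and placed above every loaded object
    out_buf = (proj.loader.max_addr + 0x1000) & ~0xFFF
    state = proj.factory.blank_state()
    # Zero length+1 bytes so the result is NUL-terminated even if the decoder does not terminate it
    state.memory.store(out_buf, b"\x00" * (length + 1))
    call = proj.factory.callable(DECODER_ADDR, base_state=state)
    call(enc_addr, out_buf, length)  # assumed argument order: (encoded_ptr, out_buf, length)
    result = call.result_state
    raw = result.solver.eval(result.memory.load(out_buf, length + 1), cast_to=bytes)
    return first_cstring(raw)
```

A fresh `blank_state` per call keeps every decode independent. State from one blob therefore cannot leak into the next.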
```python
# Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B
print(decode_string(0x00100933, 5))
```

- At scale, build a static map of call sites to the decoder’s arguments (encoded_ptr, size). Wrappers may hide arguments, so if API recovery is noisy you may need to build this mapping manually from Ghidra xrefs.

```python
import json

# call_site -> (encoded_addr, size), collected from Ghidra xrefs to the decoder
call_site_args_map = {
    0x00100f8c: (0x00100b81, 0x41),
    0x00100fa8: (0x00100bca, 0x04),
    0x00100fcc: (0x001007a0, 0x41),
    0x00100fe8: (0x00100933, 0x05),
    0x0010100c: (0x00100c62, 0x41),
    0x00101028: (0x00100c15, 0x16),
    0x00101050: (0x00100a49, 0x101),
    0x00100cf4: (0x00100821, 0x11),
    0x00101170: (0x00100940, 0x101),
    0x001011cc: (0x0010084e, 0x13),
    0x00101334: (0x001007e9, 0x0f),
    0x00101478: (0x0010087d, 0x15),
    0x001014f8: (0x00100800, 0x19),
    0x001015e8: (0x001008e6, 0x27),
    0x0010160c: (0x00100c33, 0x13),
}

# decode_string(enc_addr, length) is the angr-based decoder wrapper; keys become hex strings
decoded_map = {hex(cs): decode_string(enc, sz)
               for cs, (enc, sz) in call_site_args_map.items()}

print(json.dumps(decoded_map, indent=2))
with open('decoded_strings.json', 'w') as f:
    json.dump(decoded_map, f, indent=2)
```
Annotate call sites in Ghidra

Option A: Jython-only comment writer (use a pre-computed JSON)
- Since angr requires CPython3, keep deobfuscation and annotation separate. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write PRE_COMMENTs at each call site, including the caller function name for context:

```python
#@category Android/Deobfuscation
# Jython in Ghidra 10/11
import json
from ghidra.program.model.listing import CodeUnit

# Ask for the JSON produced by the angr script
f = askFile('Select decoded_strings.json', 'Load')
mapping = json.load(open(f.absolutePath, 'r'))  # keys as hex strings

fm = currentProgram.getFunctionManager()
rm = currentProgram.getReferenceManager()
listing = currentProgram.getListing()

# Replace with your decoder address to locate call-xrefs (optional)
DECODER_ADDR = None  # e.g. toAddr(0x00100e10)
if DECODER_ADDR is not None:
    mapped = set(int(k, 16) for k in mapping)
    for ref in rm.getReferencesTo(DECODER_ADDR):
        if ref.getReferenceType().isCall() and ref.getFromAddress().getOffset() not in mapped:
            print('decoder call not in JSON: %s' % ref.getFromAddress())

# Write a PRE_COMMENT at every decoded call site, tagged with the caller's name
for cs_hex, text in mapping.items():
    addr = toAddr(cs_hex)
    fn = fm.getFunctionContaining(addr)
    caller = fn.getName() if fn is not None else '<no function>'
    cu = listing.getCodeUnitAt(addr)
    if cu is not None:
        cu.setComment(CodeUnit.PRE_COMMENT, 'decoded: %s [caller: %s]' % (text, caller))
```

Option B: Single CPython script via pyhidra/ghidra_bridge
- Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process that runs angr. You can then call decode_string() and set PRE_COMMENTs immediately, without an intermediate file. The logic mirrors the Jython script: build a call-site→function map via the ReferenceManager, decode with angr, and set comments.
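A minimal sketch of Option B with pyhidra's `open_program` context manager follows. It assumes pyhidra is installed with a local Ghidra install configured, and that `decoded_map` is the dict built from `decode_string()` in the angr step. `annotate_call_sites` and `format_comment` are hypothetical helper names. The explicit image-base check guards against the address mismatch described in this section.

```python
def format_comment(decoded: str, caller: str) -> str:
    """Comment text written at each call site (same shape as the Jython variant)."""
    return "decoded: %s [caller: %s]" % (decoded, caller)


def annotate_call_sites(binary_path: str, decoded_map: dict, base_addr: int = 0x00100000) -> None:
    """Write PRE_COMMENTs for {hex call site: decoded string} via headless Ghidra."""
    import pyhidra  # imported lazily: starts a JVM and needs a local Ghidra install

    with pyhidra.open_program(binary_path) as flat_api:
        program = flat_api.getCurrentProgram()
        image_base = int(program.getImageBase().getOffset())
        if image_base != base_addr:
            raise ValueError("Ghidra image base %#x != angr base_addr %#x" % (image_base, base_addr))
        # Program modifications outside a GhidraScript need an explicit transaction
        tx = program.startTransaction("Annotate decoded JNI strings")
        try:
            for cs_hex, text in decoded_map.items():
                addr = flat_api.toAddr(cs_hex)  # String overload parses "0x..." keys
                fn = flat_api.getFunctionContaining(addr)
                caller = fn.getName() if fn is not None else "<no function>"
                flat_api.setPreComment(addr, format_comment(text, caller))
        finally:
            program.endTransaction(tx, True)
```

Usage under the same assumptions: `annotate_call_sites("libtarget.so", decoded_map)`, where the path is hypothetical.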

Why this works and when to use it
- Offline execution sidesteps RASP/anti-debug: recovering the strings requires neither ptrace nor Frida hooks.
- Aligning Ghidra's image base with angr's base_addr (e.g., 0x00100000) ensures that function/data addresses match across tools.
- Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00.

Notes and pitfalls
- Respect the target ABI/calling convention. project.factory.callable picks a default convention for the architecture; if the arguments look shifted, pass cc explicitly.
- If the decoder expects a zeroed output buffer, initialize outbuf with zeros in the state before the call.
- For position-independent Android .so files, always supply base_addr so that addresses in angr match those seen in Ghidra.
- Use currentProgram.getReferenceManager() to enumerate call xrefs to the decoder, or to the thin stubs that wrap it.
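Shifted arguments or a wrong length usually show up as non-ASCII junk or malformed descriptors in the decoded output. The sanity check below assumes that values starting with `(` are JNI method signatures and that everything else (method names, class paths) should be printable ASCII. The descriptor grammar follows the JNI type-signature rules: primitives `ZBCSIJFD`, `Lfully/qualified/Class;`, `[` for arrays, and `V` only as a return type. `suspicious_call_sites` is a hypothetical helper.

```python
import re
from typing import Dict, List

# One JNI field descriptor: optional array dimensions, then a primitive or Lclass/name;
_FIELD = r"\[*(?:[ZBCSIJFD]|L[^;\[()]+;)"
# Method descriptor: (params)return, where the return type may also be V (void)
_METHOD = re.compile(r"\((?:%s)*\)(?:V|%s)" % (_FIELD, _FIELD))


def is_jni_method_signature(s: str) -> bool:
    return _METHOD.fullmatch(s) is not None


def suspicious_call_sites(decoded_map: Dict[str, str]) -> List[str]:
    """Return call sites whose decoded value does not look like a plausible JNI string."""
    bad = []
    for call_site, value in decoded_map.items():
        if value.startswith("("):
            ok = is_jni_method_signature(value)
        else:
            ok = bool(value) and value.isascii() and value.isprintable()
        if not ok:
            bad.append(call_site)
    return bad
```

Run it on `decoded_map` before writing comments. If many call sites fail at once, suspect the calling convention or the sizes in the call-site map rather than the individual strings.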

For angr basics, see: [angr basics](../../reversing/reversing-tools-basic-methods/angr/README.md)

Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a `jmp rax` or `call rax`. A small *dispatcher* (typically nine instructions) sets the final target depending on the CPU `ZF`/`CF` flags, completely breaking static CFG recovery.

- [Unit42 – Evolving Tactics of SLOW#TEMPEST: A Deep Dive Into Advanced Malware Techniques](https://unit42.paloaltonetworks.com/slow-tempest-malware-obfuscation/)
- [Strategies for Analyzing Native Code in Android Applications: Combining Ghidra and Symbolic Execution for Code Decryption and Deobfuscation](https://revflash.medium.com/strategies-for-analyzing-native-code-in-android-applications-combining-ghidra-and-symbolic-aaef4c9555df)
- [Native Enrich: Scripting Ghidra and Frida to discover hidden JNI functions](https://laripping.com/blog-posts/2021/12/20/nativeenrich.html)
- [Unit42 – AdaptixC2: A New Open-Source Framework Leveraged in Real-World Attacks](https://unit42.paloaltonetworks.com/adaptixc2-post-exploitation-framework/)