Commit 395ecdf

Merge pull request #1409 from HackTricks-wiki/update_Strategies_for_Analyzing_Native_Code_in_Android_Ap_20250916_124743
Strategies for Analyzing Native Code in Android Applications...
2 parents 438d959 + 98221d1 commit 395ecdf

1 file changed

Lines changed: 146 additions & 0 deletions

src/generic-methodologies-and-resources/basic-forensic-methodology/malware-analysis.md

@@ -183,6 +183,145 @@ See the Android native reversing page for setup details and log paths:
183183

184184
---
185185

186+
### Android/JNI native string deobfuscation with angr + Ghidra
187+
188+
Some Android malware and apps protected by RASP (runtime application self-protection) hide JNI method names and signatures by decoding them at runtime, just before calling RegisterNatives. When anti-debug checks kill Frida/ptrace instrumentation, you can still recover the plaintext offline by executing the in-binary decoder with angr and then pushing the results back into Ghidra as comments.
189+
190+
Key idea: treat the decoder inside the .so as a callable function, execute it on the obfuscated byte blobs in .rodata, and concretize the output bytes up to the first \x00 (C-string terminator). Keep angr and Ghidra using the same image base to avoid address mismatches.
191+
192+
Workflow overview
193+
- Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and RegisterNatives setup.
194+
- Run angr (CPython3) to execute the decoder for each target string and dump results.
195+
- Annotate in Ghidra: auto-comment decoded strings at each call site for fast JNI reconstruction.
196+
197+
Ghidra triage (JNI_OnLoad pattern)
198+
- Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures.
199+
- Typical JNINativeMethod per Oracle docs:
200+
201+
```c
typedef struct {
    char *name;       // e.g., "nativeFoo"
    char *signature;  // e.g., "()V", "()[B"
    void *fnPtr;      // native implementation address
} JNINativeMethod;
```
208+
- Look for calls to RegisterNatives. If the library constructs the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that is an ideal target for offline execution.
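When the JNINativeMethod array is stored statically (rather than built on the stack at runtime), it can help to dump it and split it into pointer triples before chasing each name and signature pointer back to its encoded blob. The following is a minimal sketch under stated assumptions: a little-endian target where each entry is three pointers of equal width, a hypothetical helper name `parse_jni_native_methods`, and made-up sample addresses.

```python
import struct
from collections import namedtuple

NativeMethod = namedtuple("NativeMethod", "name_ptr sig_ptr fn_ptr")

def parse_jni_native_methods(blob, count, ptr_size=8):
    """Split a raw JNINativeMethod[] dump into (name, signature, fnPtr) pointer triples."""
    # Assumption: little-endian, three pointers per entry (8 bytes on arm64/x86_64, 4 on 32-bit)
    fmt = "<3Q" if ptr_size == 8 else "<3I"
    entry_size = struct.calcsize(fmt)
    if len(blob) < count * entry_size:
        raise ValueError("blob too short for %d entries" % count)
    return [
        NativeMethod(*struct.unpack_from(fmt, blob, i * entry_size))
        for i in range(count)
    ]

# Hypothetical two-entry table; the addresses are illustrative only
sample = struct.pack("<6Q", 0x100b81, 0x100bca, 0x101200,
                            0x1007a0, 0x100933, 0x101340)
methods = parse_jni_native_methods(sample, 2)
for m in methods:
    print("name@%#x sig@%#x fn@%#x" % m)
```

If the library fills the table at runtime from decoder output, the name and signature pointers refer to scratch buffers instead, and the call-site approach described in this section is the more reliable route.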
209+
210+
angr setup (execute the decoder offline)
211+
- Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small.
212+
213+
```python
import angr, json

project = angr.Project(
    '/path/to/libtarget.so',
    load_options={'main_opts': {'base_addr': 0x00100000}},
    auto_load_libs=False,
)

ENCODING_FUNC_ADDR = 0x00100e10  # decoder function discovered in Ghidra

def decode_string(enc_addr, length):
    # fresh blank state per evaluation
    st = project.factory.blank_state()
    outbuf = st.heap.allocate(length)
    call = project.factory.callable(ENCODING_FUNC_ADDR, base_state=st)
    ret_ptr = call(enc_addr, outbuf, length)  # returns outbuf pointer
    rs = call.result_state
    raw = rs.solver.eval(rs.memory.load(ret_ptr, length), cast_to=bytes)
    return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore')

# Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B
print(decode_string(0x00100933, 5))
```
237+
238+
- At scale, build a static map from each call site to the decoder’s arguments (encoded_ptr, size). Wrappers may hide the arguments, so if automated argument recovery is noisy, build this mapping by hand from the Ghidra xrefs.
239+
240+
```python
# call_site -> (encoded_addr, size)
call_site_args_map = {
    0x00100f8c: (0x00100b81, 0x41),
    0x00100fa8: (0x00100bca, 0x04),
    0x00100fcc: (0x001007a0, 0x41),
    0x00100fe8: (0x00100933, 0x05),
    0x0010100c: (0x00100c62, 0x41),
    0x00101028: (0x00100c15, 0x16),
    0x00101050: (0x00100a49, 0x101),
    0x00100cf4: (0x00100821, 0x11),
    0x00101170: (0x00100940, 0x101),
    0x001011cc: (0x0010084e, 0x13),
    0x00101334: (0x001007e9, 0x0f),
    0x00101478: (0x0010087d, 0x15),
    0x001014f8: (0x00100800, 0x19),
    0x001015e8: (0x001008e6, 0x27),
    0x0010160c: (0x00100c33, 0x13),
}

decoded_map = { hex(cs): decode_string(enc, sz)
                for cs, (enc, sz) in call_site_args_map.items() }

print(json.dumps(decoded_map, indent=2))
with open('decoded_strings.json', 'w') as f:
    json.dump(decoded_map, f, indent=2)
```
267+
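Before annotating, it can be worth sanity-checking the decoded output, since a wrong length or a shifted argument tends to produce garbage rather than an error. The sketch below is one possible check, not part of the original workflow: it classifies each decoded string as a JNI signature, a plausible method name, or something else. The helper names and the sample map are hypothetical; the regular expression follows the JNI type-descriptor grammar (primitives `ZBCSIJFD`, `L<class>;`, `[` for arrays, `V` only as a return type).

```python
import re

# One JNI field type: optional array prefix, then a primitive or L<class>;
_TYPE = r"\[*(?:[ZBCSIJFD]|L[^;\[.]+;)"
_SIG_RE = re.compile(r"\((?:%s)*\)(?:%s|V)" % (_TYPE, _TYPE))

def looks_like_jni_signature(s):
    """True if s is a well-formed JNI method descriptor such as '()[B'."""
    return _SIG_RE.fullmatch(s) is not None

def classify(decoded_map):
    """Label each decoded string so obviously broken decodes stand out."""
    labels = {}
    for site, s in decoded_map.items():
        if looks_like_jni_signature(s):
            labels[site] = "signature"
        elif s.isidentifier():
            labels[site] = "method-name?"
        else:
            labels[site] = "other"
    return labels

# Hypothetical decoded output, in the same shape as decoded_strings.json
sample = {"0x100fe8": "()[B", "0x100f8c": "nativeFoo", "0x101028": "hello world"}
labels = classify(sample)
print(labels)
```

Entries labelled "other" are not necessarily wrong (the decoder may also protect paths or log messages), but they are the first ones to re-check against the Ghidra listing.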
268+
Annotate call sites in Ghidra
269+
Option A: Jython-only comment writer (use a pre-computed JSON)
270+
- angr requires CPython 3, whereas Ghidra's Jython scripting runs Python 2.7, so keep deobfuscation and annotation separate. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write a PRE_COMMENT at each call site (including the caller function name for context):
271+
272+
```python
#@category Android/Deobfuscation
# Jython in Ghidra 10/11
import json
from ghidra.program.model.listing import CodeUnit

# Ask for the JSON produced by the angr script
f = askFile('Select decoded_strings.json', 'Load')
with open(f.absolutePath, 'r') as fh:
    mapping = json.load(fh)  # keys as hex strings

fm = currentProgram.getFunctionManager()
rm = currentProgram.getReferenceManager()

# Replace with your decoder address to locate call-xrefs (optional)
ENCODING_FUNC_ADDR = 0x00100e10
enc_addr = toAddr(ENCODING_FUNC_ADDR)

callsite_to_fn = {}
for ref in rm.getReferencesTo(enc_addr):
    if ref.getReferenceType().isCall():
        from_addr = ref.getFromAddress()
        fn = fm.getFunctionContaining(from_addr)
        if fn:
            callsite_to_fn[from_addr.getOffset()] = fn.getName()

# Write comments from JSON
for k_hex, s in mapping.items():
    cs = int(k_hex, 16)
    site = toAddr(cs)
    caller = callsite_to_fn.get(cs, None)
    text = s if caller is None else '%s @ %s' % (s, caller)
    currentProgram.getListing().setComment(site, CodeUnit.PRE_COMMENT, text)
print('[+] Annotated %d call sites' % len(mapping))
```
306+
307+
Option B: Single CPython script via pyhidra/ghidra_bridge
308+
- Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process running angr. This allows calling decode_string() and immediately setting PRE_COMMENTs without an intermediate file. The logic mirrors the Jython script: build callsite→function map via ReferenceManager, decode with angr, and set comments.
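The single-process variant boils down to a loop that decodes and comments in one pass. The sketch below shows only that glue, with the decoder and the comment writer injected as callables so it runs without either tool. Everything here is an assumption for illustration: `annotate_call_sites` is a hypothetical name, and in a real setup `decode` would wrap the angr-based decode_string() while `set_comment` would wrap the Ghidra listing call obtained through pyhidra or ghidra_bridge.

```python
def annotate_call_sites(call_site_args, decode, set_comment, callers=None):
    """Decode each (encoded_addr, size) pair and pass the text to set_comment.

    call_site_args: {call_site: (encoded_addr, size)}
    decode:         callable(encoded_addr, size) -> str
    set_comment:    callable(call_site, text)
    callers:        optional {call_site: function_name} for context
    """
    callers = callers or {}
    done = 0
    for site, (enc_addr, size) in sorted(call_site_args.items()):
        try:
            text = decode(enc_addr, size)
        except Exception as exc:  # one bad site should not abort the batch
            print("[-] %#x: decode failed: %s" % (site, exc))
            continue
        caller = callers.get(site)
        set_comment(site, text if caller is None else "%s @ %s" % (text, caller))
        done += 1
    return done

# Stand-ins so the sketch runs on its own (no angr or Ghidra involved)
fake_strings = {0x100933: "()[B", 0x100bca: "foo"}
comments = {}
n = annotate_call_sites(
    {0x100fe8: (0x100933, 5), 0x100fa8: (0x100bca, 4), 0x101000: (0xdead, 1)},
    decode=lambda addr, size: fake_strings[addr],
    set_comment=comments.__setitem__,
    callers={0x100fe8: "JNI_OnLoad"},
)
print("[+] Annotated %d call sites" % n)
```

Injecting the two callables keeps the loop testable and lets the same code serve either bridge; the third entry in the example deliberately fails to show that a single bad call site is skipped rather than fatal.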
309+
310+
Why this works and when to use it
311+
- Offline execution sidesteps RASP/anti-debug: no ptrace, no Frida hooks required to recover strings.
312+
- Keeping Ghidra and angr base_addr aligned (e.g., 0x00100000) ensures that function/data addresses match across tools.
313+
- Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00.
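The recipe in the last bullet can be illustrated without angr by modelling memory as a byte array. This is a toy model only: `toy_decoder` is a hypothetical XOR-with-key-table routine standing in for whatever the real .so implements, and the key bytes and addresses are invented. What it shows is the shape of the call, (encoded_ptr, out_ptr, len) on a fresh zeroed state, followed by reading a C-string up to the first `\x00`.

```python
KEY_TABLE = b"\x5a\xa5\x3c"  # invented key; a real decoder would reference its own table

def toy_decoder(memory, enc_ptr, out_ptr, length):
    """Hypothetical stand-in for the in-binary decoder: XOR with a rolling key."""
    for i in range(length):
        memory[out_ptr + i] = memory[enc_ptr + i] ^ KEY_TABLE[i % len(KEY_TABLE)]
    return out_ptr  # mirrors the 'returns outbuf pointer' convention

def read_c_string(memory, ptr, max_len):
    """Read at most max_len bytes and cut at the first NUL terminator."""
    raw = bytes(memory[ptr:ptr + max_len])
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="ignore")

def toy_encode(plain):
    """Inverse of toy_decoder, used only to build test data."""
    return bytes(b ^ KEY_TABLE[i % len(KEY_TABLE)] for i, b in enumerate(plain))

# Fresh, zeroed 'state' per evaluation, with the encoded blob placed at 0x10
memory = bytearray(0x100)
blob = toy_encode(b"()[B\x00")
memory[0x10:0x10 + len(blob)] = blob

out = toy_decoder(memory, 0x10, 0x80, len(blob))
decoded = read_c_string(memory, out, len(blob))
print(decoded)  # ()[B
```

Because the decoder only reads its input and writes its output buffer, each evaluation is independent, which is what makes running it once per call site safe.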
314+
315+
Notes and pitfalls
316+
- Respect the target ABI/calling convention. project.factory.callable picks a default convention based on the architecture; if arguments look shifted, pass the calling convention explicitly via its cc argument.
317+
- If the decoder expects zeroed output buffers, initialize outbuf with zeros in the state before the call.
318+
- For position-independent Android .so, always supply base_addr so addresses in angr match those seen in Ghidra.
319+
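- Use currentProgram.getReferenceManager() to enumerate call-xrefs to the decoder. If the app wraps the decoder behind thin stubs, those xrefs land inside the stubs, so repeat the lookup on each stub to reach the real call sites.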
320+
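If the two tools end up with different image bases anyway (for example, a Ghidra project imported at one base and an angr project loaded without base_addr), addresses can be translated instead of re-importing. A minimal sketch follows; the helper name `rebase` is hypothetical, and the second base value is an assumption about what the other tool chose, so read the actual value from the loaded project before relying on it.

```python
def rebase(addr, from_base, to_base):
    """Translate an address from one image base to another."""
    if addr < from_base:
        raise ValueError("address %#x is below base %#x" % (addr, from_base))
    return addr - from_base + to_base

GHIDRA_BASE = 0x00100000  # base used in the examples of this section
OTHER_BASE = 0x00400000   # assumption: base picked by the other tool

# Translate a call-site map keyed by Ghidra addresses
ghidra_map = {0x00100fe8: (0x00100933, 0x05)}
other_map = {
    rebase(cs, GHIDRA_BASE, OTHER_BASE): (rebase(enc, GHIDRA_BASE, OTHER_BASE), size)
    for cs, (enc, size) in ghidra_map.items()
}
for cs, (enc, size) in other_map.items():
    print("%#x -> (%#x, %#x)" % (cs, enc, size))
```

Only addresses are shifted; sizes and other scalar arguments stay untouched, which is why the map keeps them in a separate tuple slot.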
321+
For angr basics, see: [angr basics](../../reversing/reversing-tools-basic-methods/angr/README.md)
322+
323+
---
324+
186325
## Deobfuscating Dynamic Control-Flow (JMP/CALL RAX Dispatchers)
187326

188327
Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a `jmp rax` or `call rax`. A small *dispatcher* (typically nine instructions) sets the final target depending on the CPU `ZF`/`CF` flags, completely breaking static CFG recovery.
@@ -283,6 +422,13 @@ adaptixc2-config-extraction-and-ttps.md
283422

284423
- [Unit42 – Evolving Tactics of SLOW#TEMPEST: A Deep Dive Into Advanced Malware Techniques](https://unit42.paloaltonetworks.com/slow-tempest-malware-obfuscation/)
285424
- SoTap: Lightweight in-app JNI (.so) behavior logger – [github.com/RezaArbabBot/SoTap](https://github.com/RezaArbabBot/SoTap)
425+
- Strategies for Analyzing Native Code in Android Applications: Combining Ghidra and Symbolic Execution for Code Decryption and Deobfuscation – [revflash.medium.com](https://revflash.medium.com/strategies-for-analyzing-native-code-in-android-applications-combining-ghidra-and-symbolic-aaef4c9555df)
426+
- Ghidra – [github.com/NationalSecurityAgency/ghidra](https://github.com/NationalSecurityAgency/ghidra)
427+
- angr – [angr.io](https://angr.io/)
428+
- JNI_OnLoad and invocation API – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#JNJI_OnLoad)
429+
- RegisterNatives – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#RegisterNatives)
430+
- Tracing JNI Functions – [valsamaras.medium.com](https://valsamaras.medium.com/tracing-jni-functions-75b04bee7c58)
431+
- Native Enrich: Scripting Ghidra and Frida to discover hidden JNI functions – [laripping.com](https://laripping.com/blog-posts/2021/12/20/nativeenrich.html)
286432
- [Unit42 – AdaptixC2: A New Open-Source Framework Leveraged in Real-World Attacks](https://unit42.paloaltonetworks.com/adaptixc2-post-exploitation-framework/)
287433

288434
{{#include ../../banners/hacktricks-training.md}}
