ElfPRStatus.ProcessId is mapped to PGrp (the pr_pgrp field in the ELF prstatus struct), but pr_pgrp is the process group ID, not the process ID.
This affects all architectures: ElfPRStatusX64, ElfPRStatusX86, ElfPRStatusArm64, ElfPRStatusArm, ElfPRStatusRiscV64, ElfPRStatusLoongArch64.
// Current (wrong):
public uint ProcessId => PGrp; // pr_pgrp = process group ID
public uint ThreadId => Pid;   // pr_pid = thread/LWP ID
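To make the mismatch concrete, here is a minimal C++ sketch of the same mapping. `PrStatusIds` is a hypothetical, trimmed-down stand-in for the ID fields of the ELF prstatus note (the real struct has more members), and the function names and example values are illustrative only, not ClrMD or kernel APIs.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down view of the ID fields in an ELF prstatus note.
// The real struct carries more members; only the IDs matter for this issue.
struct PrStatusIds {
    std::uint32_t pr_pid;   // thread (LWP) ID of the thread this note describes
    std::uint32_t pr_ppid;  // parent process ID
    std::uint32_t pr_pgrp;  // process group ID
    std::uint32_t pr_sid;   // session ID
};

// Mirrors the current mapping described above: "process ID" is read from pr_pgrp.
inline std::uint32_t process_id_current_mapping(const PrStatusIds& s) {
    return s.pr_pgrp;
}

// Thread ID is read from pr_pid.
inline std::uint32_t thread_id(const PrStatusIds& s) {
    return s.pr_pid;
}

// Made-up example: main thread of PID 4242, running in process group 4000.
inline PrStatusIds example_kernel_note() {
    return PrStatusIds{4242, 1, 4000, 4000};
}
```

With these example values, `process_id_current_mapping` yields 4000 (the group) while the process is actually 4242.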
Why it appears to work
createdump in dotnet/runtime has its own bug: it writes Tgid (thread group ID, which equals the PID) into the pr_pgrp field instead of the actual process group:
// dumpwriterelf.cpp:384-386
pr.pr_pid = thread.Tid(); // thread ID — correct
pr.pr_ppid = thread.Ppid(); // parent PID — correct
pr.pr_pgrp = thread.Tgid(); // writes Tgid, should be process group
Since most ClrMD Linux usage goes through CreateSnapshotAndAttach (which uses createdump), PGrp ends up containing the Tgid (≈ PID), and the mapping works by accident.
When it breaks
Kernel-generated core dumps (e.g., from a SEGV crash without createdump): pr_pgrp contains the real process group ID, which may differ from the PID (e.g., in Docker containers, shell pipelines, or processes that called setpgid/setsid).
Any consumer of IDataReader.ProcessId on Linux gets semantically incorrect data for non-createdump cores.
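That PGID and PID routinely differ can be checked with a small POSIX sketch. The helper name `child_pgid_differs_from_pid` is hypothetical. It forks a child, optionally lets it call `setpgid(0, 0)`, and reports whether `getpgrp()` differs from `getpid()` inside the child. This only illustrates the process-group semantics and does not read a core dump.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child and reports whether the child's PGID differs from its PID.
// Returns 1 if they differ, 0 if they match, -1 on error.
inline int child_pgid_differs_from_pid(bool become_group_leader) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        // A freshly forked child inherits its parent's process group, so its
        // PGID is the group leader's PID rather than its own PID.
        if (become_group_leader && setpgid(0, 0) != 0) _exit(2);
        // After setpgid(0, 0) the child leads its own group: PGID == PID.
        _exit(getpgrp() != getpid() ? 1 : 0);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status)) return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

A kernel-generated core of the first kind of child would carry a pr_pgrp that is not the process ID, which is the case the current mapping gets wrong.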
Suggested fix
Map ProcessId to Pid for the first/main thread (where pr_pid equals the process ID), or add a dedicated mechanism to track the original target PID through CreateSnapshotAndAttach.
Also consider filing a separate issue in dotnet/runtime for createdump writing Tgid into pr_pgrp instead of the actual process group.
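The first option in the suggested fix could look roughly like the following C++ sketch. `ThreadNote` and `try_get_process_id` are hypothetical names, not ClrMD types. The sketch assumes, as the suggestion does, that the first prstatus note belongs to the main thread. If a dump lists a different thread first, pr_pid there would be a thread ID rather than the process ID, so a real fix would need to verify that assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-thread record holding two ID fields from a prstatus note.
struct ThreadNote {
    std::uint32_t pr_pid;   // thread (LWP) ID
    std::uint32_t pr_pgrp;  // process group ID (not used for the process ID)
};

// Takes the process ID from pr_pid of the first note instead of pr_pgrp.
// ASSUMPTION: notes.front() describes the main thread, whose TID equals the PID.
// Returns false and leaves process_id untouched if there are no notes.
inline bool try_get_process_id(const std::vector<ThreadNote>& notes,
                               std::uint32_t& process_id) {
    if (notes.empty()) return false;
    process_id = notes.front().pr_pid;
    return true;
}
```

For a made-up dump with notes `{4242, 4000}` and `{4243, 4000}`, this yields 4242, whereas the pr_pgrp mapping would yield 4000.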
References
Linux kernel fill_prstatus: prstatus->pr_pgrp = task_pgrp(p) (actual process group)
createdump dumpwriterelf.cpp: pr.pr_pgrp = thread.Tgid() (thread group ID, not process group)
Impact
CoredumpReader.ProcessId returns the wrong value for kernel core dumps.
The suppressFree check in DacLibrary.cs (added in #1361, "Fix crash when disposing DAC after self-attach on Linux (#1282)") compares DataReader.ProcessId against Environment.ProcessId. If the core dump was kernel-generated and PGID ≠ PID, the check would fail and the DAC would be unloaded, re-introducing the crash from #1282 ("CLR crashes soon after CreateRuntime called on Linux").
Additional references
DataReader.ProcessId (read by the suppressFree check in DacLibrary.cs)
createdump crashinfo.h:50: pid_t m_tgid; // process group (misleading comment: the variable holds the Tgid, not the PGID)