ElfPRStatus.ProcessId returns process group (pr_pgrp) instead of actual PID #1433

@leculver

Summary

ElfPRStatus.ProcessId is mapped to PGrp (the pr_pgrp field in the ELF prstatus struct), but pr_pgrp is the process group ID, not the process ID.

This affects all architectures: ElfPRStatusX64, ElfPRStatusX86, ElfPRStatusArm64, ElfPRStatusArm, ElfPRStatusRiscV64, ElfPRStatusLoongArch64.

// Current (wrong):
public uint ProcessId => PGrp;   // pr_pgrp = process group ID
public uint ThreadId => Pid;     // pr_pid = thread/LWP ID

Why it appears to work

createdump in dotnet/runtime has its own bug: it writes Tgid (thread group ID, which equals the PID) into the pr_pgrp field instead of the actual process group:

// dumpwriterelf.cpp:384-386
pr.pr_pid = thread.Tid();      // thread ID — correct
pr.pr_ppid = thread.Ppid();    // parent PID — correct
pr.pr_pgrp = thread.Tgid();   // writes Tgid, should be process group

Since most ClrMD Linux usage goes through CreateSnapshotAndAttach (which uses createdump), PGrp ends up containing the Tgid (which equals the PID), and the mapping works by accident.

When it breaks

  • Kernel-generated core dumps (e.g., from a SEGV crash without createdump): pr_pgrp contains the real process group ID, which differs from the PID whenever the process is not its own group leader (e.g., a child process in a Docker container that inherited its parent's group, a non-leader member of a shell pipeline, or a process moved into another group with setpgid).
  • Any scenario where PID ≠ PGID: the suppressFree check in DacLibrary.cs (added in #1361, "Fix crash when disposing DAC after self-attach on Linux (#1282)") compares DataReader.ProcessId against Environment.ProcessId. If the core dump was kernel-generated and PGID ≠ PID, the comparison would fail and the DAC would be unloaded, reintroducing the crash from #1282 ("CLR crashes soon after CreateRuntime called on Linux").
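
To see how easily PID and PGID diverge, here is a minimal POSIX sketch (not taken from ClrMD or createdump; the function name is made up for illustration). It forks a child, which inherits the parent's process group, and reports whether the child's getpgrp() differs from its getpid(). A kernel core dump of such a child would carry the inherited group ID in pr_pgrp.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child and returns true when, inside the child,
// the process group ID differs from the child's own PID.
// Returns false if the IDs match or if fork/waitpid fails.
bool child_pid_differs_from_pgid() {
    pid_t child = fork();
    if (child < 0) {
        return false;  // fork failed
    }
    if (child == 0) {
        // Child: it has a fresh PID but still belongs to the parent's
        // process group, so getpgrp() reports the inherited group ID.
        _exit(getpgrp() != getpid() ? 0 : 1);
    }
    // Parent: collect the child's verdict from its exit status.
    int status = 0;
    if (waitpid(child, &status, 0) != child) {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling child_pid_differs_from_pgid() from any ordinary process should return true, since a freshly forked child is never the leader of the group it inherits until it calls setpgid(0, 0) or setsid().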

Impact

  1. CoredumpReader.ProcessId returns the wrong value for kernel-generated core dumps
  2. The self-attach crash fix (#1361, "Fix crash when disposing DAC after self-attach on Linux (#1282)") relies on this value and could fail in edge cases
  3. Any consumer of IDataReader.ProcessId on Linux gets semantically incorrect data for non-createdump cores

Suggested fix

Map ProcessId to Pid for the first/main thread (where pr_pid equals the process ID), or add a dedicated mechanism to track the original target PID through CreateSnapshotAndAttach.
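
The difference between the two mappings can be sketched in a few lines. This is an illustration only, not ClrMD code: PrStatusIds and both functions are hypothetical names, the struct keeps just the ID fields of a prstatus note, and the proposed mapping assumes the first note describes the main thread, as the suggestion above does.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of the ID fields in one NT_PRSTATUS note.
struct PrStatusIds {
    std::uint32_t pr_pid;   // thread (LWP) ID
    std::uint32_t pr_ppid;  // parent process ID
    std::uint32_t pr_pgrp;  // process group ID
};

// Current mapping: report pr_pgrp of the first note as the process ID.
// Returns 0 when there are no notes.
std::uint32_t process_id_from_pgrp(const std::vector<PrStatusIds>& notes) {
    return notes.empty() ? 0 : notes.front().pr_pgrp;
}

// Proposed mapping: report pr_pid of the main thread's note.
// Assumption: the first note belongs to the main thread, whose thread ID
// equals the process ID. Returns 0 when there are no notes.
std::uint32_t process_id_from_main_thread(const std::vector<PrStatusIds>& notes) {
    return notes.empty() ? 0 : notes.front().pr_pid;
}
```

With createdump-style notes such as {100, 1, 100} and {101, 1, 100}, both functions return 100, which is why the current mapping appears to work. With kernel-style notes such as {100, 1, 50} and {101, 1, 50}, the current mapping returns 50 (the group ID) while the proposed one returns 100.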

Also consider filing a separate issue in dotnet/runtime for createdump writing Tgid into pr_pgrp instead of the actual process group.

References

  • Linux kernel fill_prstatus: prstatus->pr_pgrp = task_pgrp(p) (actual process group)
  • createdump dumpwriterelf.cpp: pr.pr_pgrp = thread.Tgid() (thread group ID, not process group)
  • ClrMD PR #1361, "Fix crash when disposing DAC after self-attach on Linux (#1282)": self-attach crash fix that depends on DataReader.ProcessId
  • crashinfo.h:50: pid_t m_tgid; // process group — misleading comment, variable is Tgid not PGID
