Summary
Between the 21st March 2025 and the 14th May 2025 (commits 6016c14 and 8eb3567), the pgai repository was vulnerable to an attack that allowed the exfiltration of all secrets used in one workflow. These included the GITHUB_TOKEN, which had write permissions for the repository. An attacker could have used it to tamper with all aspects of the repository, including pushing arbitrary code and releases.
After conducting a comprehensive audit of all workflow executions and repository activity, we found no evidence that this vulnerability was exploited. The integrity of the pgai codebase remains intact.
Details
The .github/workflows/huggingface-dataset.yml workflow used the pull_request_target event trigger, checked out the head.sha of the pull request, and executed the scripts/generate_huggingface_dataset.py Python script. A pull_request_target workflow runs the workflow file from the base repository. It does so even when the pull request comes from a fork, and it runs with full access to the repository's secrets and with a GITHUB_TOKEN that has full read/write access to the repository.
Because the workflow checked out and ran the pull request's own code, untrusted code from a fork executed in this privileged context, where it could exfiltrate the GITHUB_TOKEN and HUGGINGFACE_HUB_TIMESCALE_TOKEN secrets. Because the GITHUB_TOKEN has write access to the repository, it can be used to modify many objects in the repository, including pushing arbitrary code. This poses a significant supply-chain risk.
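The dangerous combination described above is the `pull_request_target` trigger together with a checkout of the pull request head. This can be checked mechanically. The sketch below is an illustrative heuristic, not the check used by the pgai maintainers. It assumes a workflow has already been loaded into a Python dict, for example with PyYAML's `yaml.safe_load`. The helper names (`triggers`, `checks_out_pr_head`, `is_vulnerable`) and the example workflows are hypothetical. The sketch only detects checkouts of a ref taken from `github.event.pull_request.head`, so it will miss other ways of pulling in pull request code.

```python
from typing import Any

# Hypothetical helpers: a workflow is represented as the dict a YAML loader
# (for example PyYAML's yaml.safe_load) would return for the workflow file.


def triggers(workflow: dict[Any, Any]) -> set[str]:
    """Return the names of the events that trigger the workflow."""
    # YAML 1.1 loaders such as PyYAML parse the bare key `on` as the boolean
    # True, so look under both keys.
    on = workflow.get("on", workflow.get(True))
    if isinstance(on, str):
        return {on}
    if isinstance(on, list):
        return set(on)
    if isinstance(on, dict):
        return set(on)
    return set()


def checks_out_pr_head(workflow: dict[Any, Any]) -> bool:
    """True if some step runs actions/checkout on a ref taken from the PR head."""
    for job in workflow.get("jobs", {}).values():
        for step in job.get("steps", []):
            uses = step.get("uses", "")
            ref = str((step.get("with") or {}).get("ref", ""))
            if uses.startswith("actions/checkout") and "github.event.pull_request.head" in ref:
                return True
    return False


def is_vulnerable(workflow: dict[Any, Any]) -> bool:
    """Privileged trigger (secrets + write token) combined with PR-controlled code."""
    return "pull_request_target" in triggers(workflow) and checks_out_pr_head(workflow)


# Illustrative workflows only, not the actual pgai workflow files.
vulnerable = {
    "on": {"pull_request_target": {"types": ["opened", "synchronize"]}},
    "jobs": {
        "dataset": {
            "steps": [
                {"uses": "actions/checkout@v4",
                 "with": {"ref": "${{ github.event.pull_request.head.sha }}"}},
                {"run": "python scripts/generate_huggingface_dataset.py"},
            ]
        }
    },
}
# Same steps, but triggered by pull_request, which runs fork PRs without secrets.
fixed = {**vulnerable, "on": {"pull_request": {"types": ["opened", "synchronize"]}}}

print(is_vulnerable(vulnerable))  # True
print(is_vulnerable(fixed))       # False
```

The check requires both conditions together. Checking out the pull request head under a plain `pull_request` trigger is routine. Likewise, `pull_request_target` on its own only runs the trusted workflow file from the base repository.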
Impact
The potential impact of this vulnerability is large. An attacker could have used it to:
- poison the pgai codebase
- publish malicious releases of pgai on GitHub
- publish malicious releases of pgai on PyPI
We have found no evidence that this vulnerability was exploited. Any exploitation would have originated in a run of the huggingface-dataset.yml workflow, and an audit of all executions of the vulnerable workflow revealed no malicious use. The permissions of the GITHUB_TOKEN would also have allowed a malicious actor to delete a workflow run. We therefore audited the GitHub Audit Log as well, and it showed that no workflow runs were deleted. We do not believe that the integrity of the pgai codebase has been compromised.
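An audit of this kind starts by narrowing the run history to the runs that could have carried attacker-controlled code. The sketch below is a hypothetical triage step, not the procedure the maintainers followed. It assumes run records shaped like the items returned by GitHub's REST API "List workflow runs for a workflow" endpoint, using the fields `id`, `event`, `created_at` and `head_repository.full_name`. It also assumes the base repository is `timescale/pgai`. It keeps `pull_request_target` runs inside the vulnerable window whose head came from outside the base repository. Runs with a missing head repository are treated as untrusted.

```python
from datetime import datetime, timezone
from typing import Any

# Vulnerable window from this advisory. The end bound is exclusive so that
# all of 14 May 2025 (UTC) is included.
WINDOW_START = datetime(2025, 3, 21, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 5, 15, tzinfo=timezone.utc)


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-04-02T10:15:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def runs_to_review(runs: list[dict[str, Any]], base_repo: str) -> list[int]:
    """IDs of in-window pull_request_target runs whose head is not base_repo."""
    flagged = []
    for run in runs:
        if run.get("event") != "pull_request_target":
            continue
        # A missing or null head repository is treated as untrusted.
        head_repo = (run.get("head_repository") or {}).get("full_name")
        if head_repo == base_repo:
            continue  # PR from a branch of the base repo: author already had write access
        if WINDOW_START <= parse_timestamp(run["created_at"]) < WINDOW_END:
            flagged.append(run["id"])
    return flagged


# Made-up run records for illustration.
runs = [
    {"id": 1, "event": "pull_request_target", "created_at": "2025-04-02T10:15:00Z",
     "head_repository": {"full_name": "someone/pgai"}},
    {"id": 2, "event": "pull_request_target", "created_at": "2025-04-03T09:00:00Z",
     "head_repository": {"full_name": "timescale/pgai"}},
    {"id": 3, "event": "push", "created_at": "2025-04-04T09:00:00Z",
     "head_repository": {"full_name": "timescale/pgai"}},
    {"id": 4, "event": "pull_request_target", "created_at": "2025-06-01T09:00:00Z",
     "head_repository": None},
    {"id": 5, "event": "pull_request_target", "created_at": "2025-05-14T23:59:00Z",
     "head_repository": None},
]

print(runs_to_review(runs, "timescale/pgai"))  # [1, 5]
```

A shortlist like this only prioritises runs for manual review of their logs. It says nothing about runs that were deleted, which is why the Audit Log check above is a separate step.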
Timeline
The vulnerability was reported by a team of security researchers on the 14th May 2025, and steps to mitigate it were taken within hours.
The vulnerability was disclosed on the 17th June 2025.
Mitigation
The .github/workflows/huggingface-dataset.yml workflow was fixed to close the vulnerability in #742. The fixes include switching the trigger from pull_request_target to pull_request and explicitly restricting the GITHUB_TOKEN to read-only access. With pull_request, runs triggered from forks by default receive no repository secrets, and their GITHUB_TOKEN is read-only. The GITHUB_TOKEN expires when the job finishes, or after at most 24 hours, so it did not need to be rotated. The HUGGINGFACE_HUB_TIMESCALE_TOKEN was rotated.
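The second part of the fix, restricting the GITHUB_TOKEN to read-only access, can be verified in the same way. The sketch below assumes a workflow loaded into a Python dict, as a YAML loader would produce. The helper names (`is_read_only`, `token_is_read_only`) and the example workflows are hypothetical, not taken from #742. It follows GitHub's documented `permissions` semantics: a job-level block replaces the workflow-level one, `read-all` or a mapping of scopes to `read`/`none` grants no write access, and an absent block falls back to the repository or organization default, which may include write access.

```python
from typing import Any


def is_read_only(permissions: Any) -> bool:
    """True if a `permissions` value grants no write scopes."""
    if permissions is None:
        # Not set: the token falls back to the repository/organization default,
        # which may grant write access, so do not treat it as read-only.
        return False
    if isinstance(permissions, str):
        return permissions == "read-all"
    if isinstance(permissions, dict):
        # `permissions: {}` disables every scope, which also counts as read-only.
        return all(level in ("read", "none") for level in permissions.values())
    return False


def token_is_read_only(workflow: dict[Any, Any]) -> bool:
    """True if every job's effective GITHUB_TOKEN permissions are read-only."""
    workflow_level = workflow.get("permissions")
    for job in workflow.get("jobs", {}).values():
        # A job-level `permissions` block replaces the workflow-level one.
        effective = job.get("permissions", workflow_level)
        if not is_read_only(effective):
            return False
    return True


# Illustrative workflows only.
hardened = {
    "on": ["pull_request"],
    "permissions": {"contents": "read"},
    "jobs": {"dataset": {"steps": []}},
}
unscoped = {"on": ["pull_request_target"], "jobs": {"dataset": {"steps": []}}}
job_override = {**hardened, "jobs": {"dataset": {"permissions": {"contents": "write"}}}}

print(token_is_read_only(hardened))      # True
print(token_is_read_only(unscoped))      # False
print(token_is_read_only(job_override))  # False
```

Treating a missing `permissions` key as unsafe is deliberately conservative. The token's actual scope then depends on settings outside the workflow file.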
Credits
Kindly reported by @darryk10 @AlbertoPellitteri @loresuso