fix: replace unsafe pickle.load with restricted unpickler #249
Open
mvanhorn wants to merge 1 commit into shiyu-coder:master from
pickle.load can execute arbitrary code if a malicious .pkl file is supplied. This adds a RestrictedUnpickler that only allows known-safe types (numpy arrays, pandas DataFrames, Python builtins) and uses it in QlibDataset and qlib_test.py. Users who encounter UnpicklingError for legitimate types can add them to _SAFE_MODULES in dataset.py. Fixes shiyu-coder#216
Summary
Adds a RestrictedUnpickler to prevent arbitrary code execution when loading .pkl data files. Python's pickle.load can execute arbitrary code if a malicious .pkl file is supplied (RCE risk).

Changes
finetune/dataset.py: Added RestrictedUnpickler class with an allowlist of safe types (numpy, pandas, Python builtins). Replaced pickle.load(f) with restricted_load(f) in QlibDataset.__init__.
finetune/qlib_test.py: Replaced both pickle.load calls with restricted_load imported from dataset.py.

The allowlist covers the types typically found in Kronos training data (numpy arrays, pandas DataFrames, standard Python types). If users have .pkl files with custom types, the error message tells them which module.name to add to _SAFE_MODULES.

Testing
Syntax verified. Existing data formats are covered by the allowlist. The pickle.dump call (line 349 in qlib_test.py) is left unchanged since writing pickles is safe.

Fixes #216
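
For reviewers who want to see the mechanism without opening the diff, here is a minimal sketch of an allowlisting unpickler built on the standard library's pickle.Unpickler.find_class hook. It is not the code in this PR: the names RestrictedUnpickler, restricted_load, and _SAFE_MODULES come from the description above, but the contents of the allowlist, the _SAFE_BUILTINS set, and the restricted_loads helper are assumptions made for illustration.

```python
import io
import pickle

# Assumed allowlist of top-level packages whose globals may be resolved.
# The actual contents of _SAFE_MODULES in dataset.py may differ.
_SAFE_MODULES = {"numpy", "pandas", "collections", "datetime"}

# Hypothetical per-name allowlist for builtins. Builtins are not allowed
# wholesale because that module also exposes eval, exec, and __import__.
_SAFE_BUILTINS = {
    "dict", "list", "tuple", "set", "frozenset", "int", "float",
    "complex", "str", "bytes", "bytearray", "bool", "range", "slice",
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve globals outside the allowlists."""

    def find_class(self, module, name):
        # find_class is called whenever the pickle stream references a
        # global (a class or function), which is where code execution starts.
        if module == "builtins":
            if name in _SAFE_BUILTINS:
                return super().find_class(module, name)
        elif module.split(".")[0] in _SAFE_MODULES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is not in the allowlist"
        )


def restricted_load(f):
    """Drop-in replacement for pickle.load(f) on an open binary file."""
    return RestrictedUnpickler(f).load()


def restricted_loads(data):
    """Hypothetical convenience wrapper for in-memory bytes."""
    return restricted_load(io.BytesIO(data))
```

Plain containers and scalars never reach find_class, so they load unchanged, while a stream that references something like builtins.eval is rejected with UnpicklingError before the callable runs. Allowing whole packages by name is coarser than allowing individual (module, name) pairs; a stricter variant could list exact pairs instead.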
This contribution was developed with AI assistance (Claude Code).