
fix: replace unsafe pickle.load with restricted unpickler #249

Open
mvanhorn wants to merge 1 commit into shiyu-coder:master from mvanhorn:fix/216-unsafe-pickle-deserialization

@mvanhorn

Summary

Adds a RestrictedUnpickler to prevent arbitrary code execution when loading .pkl data files. Python's pickle.load calls whatever functions and classes the pickle stream references, so a malicious .pkl file can run arbitrary code on the machine that loads it (an RCE risk).
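To illustrate the risk, here is a minimal, harmless sketch: the hypothetical `Payload` class uses `__reduce__` to tell the unpickler to call `os.getcwd()` during loading. A real attacker would substitute a dangerous callable, and the loader would run it just the same.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickle instructs the loader to call a function."""

    def __reduce__(self):
        # The pickle stores only "call os.getcwd with no arguments".
        # A real attack would name something like os.system instead.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # os.getcwd() runs here; no Payload is rebuilt
print(result)  # the current working directory, not a Payload instance
```

Nothing in the file restricts which callable is named. A file is only as safe to load as its author is trustworthy.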

Changes

  • finetune/dataset.py: Added RestrictedUnpickler class with an allowlist of safe types (numpy, pandas, Python builtins). Replaced pickle.load(f) with restricted_load(f) in QlibDataset.__init__.
  • finetune/qlib_test.py: Replaced both pickle.load calls with restricted_load imported from dataset.py.
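The PR text does not show the class body. The sketch below follows the "Restricting Globals" pattern from the Python `pickle` documentation: subclass `pickle.Unpickler` and override `find_class`. Its allowlist format (a set of `"module.name"` strings) and its entries are assumptions and not the PR's actual `_SAFE_MODULES`.

```python
import io
import pickle

# Hypothetical allowlist of "module.name" globals the loader may resolve.
# Entries are illustrative; the exact names NumPy/pandas pickles reference
# depend on the library versions that wrote the file.
_SAFE_MODULES = {
    "builtins.complex",
    "builtins.set",
    "builtins.frozenset",
    "builtins.slice",
    "collections.OrderedDict",
    "datetime.datetime",
    "numpy.ndarray",
    "numpy.dtype",
    "numpy.core.multiarray._reconstruct",   # assumed path in NumPy 1.x pickles
    "numpy._core.multiarray._reconstruct",  # assumed path in NumPy 2.x pickles
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals listed in _SAFE_MODULES."""

    def find_class(self, module, name):
        # Global references (classes and functions) in the stream are
        # resolved here, so refusing unknown names blocks payloads like os.system.
        qualified = f"{module}.{name}"
        if qualified in _SAFE_MODULES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"{qualified} is not allowed; add it to _SAFE_MODULES "
            "if this file comes from a trusted source"
        )


def restricted_load(f):
    """Drop-in replacement for pickle.load(f) on a binary file object."""
    return RestrictedUnpickler(f).load()
```

With the default protocol, plain `dict`, `list`, `str`, `int`, and `float` values never reach `find_class`, so they load without any allowlist entry. pandas DataFrames need additional internal entries that vary by pandas version. The practical way to collect them is to load a trusted file and read the names reported in each `UnpicklingError`.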

The allowlist covers the types typically found in Kronos training data (numpy arrays, pandas DataFrames, standard Python types). If users have .pkl files with custom types, the error message tells them which module.name to add to _SAFE_MODULES.

Testing

Syntax verified. Existing data formats are covered by the allowlist. The pickle.dump call (line 349 in qlib_test.py) is left unchanged, since writing a pickle does not execute code; only loading one does.

Fixes #216

This contribution was developed with AI assistance (Claude Code).

pickle.load can execute arbitrary code if a malicious .pkl file is
supplied. This adds a RestrictedUnpickler that only allows known-safe
types (numpy arrays, pandas DataFrames, Python builtins) and uses it
in QlibDataset and qlib_test.py.

Users who encounter UnpicklingError for legitimate types can add them
to _SAFE_MODULES in dataset.py.

Fixes shiyu-coder#216


Development

Successfully merging this pull request may close these issues.

Security Issue: Unsafe deserialization in QlibDataset

1 participant