GH-47435: [Python][Parquet] Add direct key encryption/decryption API #49667

Open
smaheshwar-pltr wants to merge 5 commits into apache:main from smaheshwar-pltr:parquet-direct-key-encryption

@smaheshwar-pltr commented Apr 6, 2026

Rationale for this change

See #47435.

What changes are included in this PR?

Adds a direct key encryption/decryption Python API.

Are these changes tested?

Yes, see PR.

Are there any user-facing changes?

Yes, new Python bindings.

github-actions bot commented Apr 6, 2026

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@smaheshwar-pltr changed the title from "Support Parquet direct encryption" to "[Python][Parquet] Add direct key encryption/decryption API" Apr 7, 2026
@smaheshwar-pltr
Author

cc @ggershinsky, this is a requirement for PyIceberg encryption support, see apache/iceberg-python#3221

@ggershinsky
Contributor

sgtm, #47435 (comment)

@smaheshwar-pltr changed the title from "[Python][Parquet] Add direct key encryption/decryption API" to "GH-47435: [Python][Parquet] Add direct key encryption/decryption API" Apr 8, 2026

github-actions bot commented Apr 8, 2026

⚠️ GitHub issue #47435 has been automatically assigned in GitHub to PR creator.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment on lines +726 to +733
This bypasses the KMS-based :class:`CryptoFactory` API and directly
constructs decryption properties from a plaintext key. This is useful
when the caller manages key wrapping externally (e.g. via an
application-level envelope encryption scheme).

For most use cases, prefer the higher-level :class:`CryptoFactory`
with :class:`DecryptionConfiguration`, which handles envelope
encryption and key rotation automatically.
Author

Per #47435 (comment), adding this documentation

Contributor

I don't think it's true that the CryptoFactory handles key rotation automatically; that part of the comment should probably be removed. It does allow key rotation if you use external key material, though.

@smaheshwar-pltr marked this pull request as ready for review April 8, 2026 19:06
@smaheshwar-pltr
Author

I'm not sure who to ping here for review. @AlenkaF @raulcd @rok, could I please have some eyes on this?

Contributor

@adamreeve left a comment

Thanks for implementing this @smaheshwar-pltr! I've left a few suggestions.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment on lines +738 to +739
The decryption key for the file footer (and all columns if
uniform encryption was used). Must be 16, 24, or 32 bytes
Contributor

It looks like this only supports uniform encryption at the moment, so this comment could be a little confusing. Maybe a note about this should be added here, e.g. "Note that currently only uniform encryption is supported with this method".

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated

For most use cases, prefer the higher-level :class:`CryptoFactory`
with :class:`EncryptionConfiguration`, which handles envelope
encryption, key rotation, and unique-per-file data keys
Contributor

Suggested change
- encryption, key rotation, and unique-per-file data keys
+ encryption and unique-per-file data keys

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
footer_key : bytes
The encryption key for the file footer (and all columns unless
per-column keys are specified). Must be 16, 24, or 32 bytes
for AES-128, AES-192, or AES-256 respectively.
Contributor

Similar to above, we should add a note somewhere about per-column keys not being supported yet.
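
For illustration only, the key-length rule stated in the docstring quoted in this thread (16, 24, or 32 bytes for AES-128, AES-192, or AES-256 respectively) could be checked on the caller's side roughly as below. This is a minimal sketch using only the standard library; the helper name `aes_variant_for_key` and the `AES_VARIANTS` table are hypothetical and are not part of this PR or of PyArrow.

```python
import secrets

# Valid key lengths (in bytes) and the AES variant each one selects,
# as described in the docstring under review.
AES_VARIANTS = {16: "AES-128", 24: "AES-192", 32: "AES-256"}


def aes_variant_for_key(key: bytes) -> str:
    """Return the AES variant implied by the key length.

    Hypothetical helper for illustration; not part of PyArrow or this PR.
    Raises TypeError for non-bytes input and ValueError for a bad length.
    """
    if not isinstance(key, (bytes, bytearray)):
        raise TypeError(f"key must be bytes, got {type(key).__name__}")
    try:
        return AES_VARIANTS[len(key)]
    except KeyError:
        raise ValueError(
            f"key must be 16, 24, or 32 bytes long, got {len(key)}"
        ) from None


# Generate a random 256-bit key from the OS CSPRNG and check it.
footer_key = secrets.token_bytes(32)
print(aes_variant_for_key(footer_key))  # prints "AES-256"
```

Validating the length before handing a plaintext key to any encryption API gives an earlier and clearer error than a failure deep inside the writer, though whether the bindings in this PR perform their own check is not stated in this conversation.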

@github-actions bot added the "awaiting review" and "awaiting committer review" labels and removed the "awaiting review" label Apr 17, 2026
@github-actions bot added the "awaiting review" label and removed the "awaiting review" and "awaiting committer review" labels Apr 17, 2026
@smaheshwar-pltr
Author

@adamreeve, thank you for the great review here 🙌 . I've reworked the docs now - they are consistent with my (inexperienced) understanding.

Would appreciate your eyes on this again, please let me know what you think!
