Add DiskProvisionedIops/ThroughputMibps pipeline options for the Java and Go SDKs and update Go client libraries #38349
Code Review
This pull request introduces support for configuring disk provisioned IOPS and throughput for Dataflow workers in both the Java and Go SDKs. Key changes include the addition of new pipeline options, updates to the Dataflow API client version, and modifications to the job translation logic to propagate these settings to worker pools. Corresponding unit tests were added to ensure the options are correctly handled. Feedback was provided regarding an incorrect issue reference in the changelog documentation.
Codecov Report ❌ — patch coverage details and impacted files:

```
@@            Coverage Diff             @@
##           master   #38349      +/-  ##
============================================
- Coverage   57.34%   48.73%    -8.61%
- Complexity   5120    13021    +7901
============================================
  Files        1393     2032     +639
  Lines      197972   183250   -14722
  Branches     4826    10614    +5788
============================================
- Hits       113522    89306   -24216
- Misses      80614    88257    +7643
- Partials     3836     5687    +1851
```

Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
Checks are failing. Will not request review until checks are succeeding.
Assigning reviewers: R: @jrmccluskey for label go.

assign set of reviewers
Reviewers are already assigned to this PR: @jrmccluskey @damccorm |
This pull request introduces two new pipeline options for the Google Cloud Dataflow runner for the Java and Go SDKs. These options allow users to specify provisioned performance for worker VM disks:
- `disk_provisioned_iops`: Sets the provisioned IOPS for the disk. If unspecified, the service chooses a default.
- `disk_provisioned_throughput_mibps`: Sets the provisioned throughput in MiB/s for the disk.
Tests have been added/updated to verify that these options are correctly parsed and translated.
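To illustrate how option values like these travel from the command line into a pipeline, here is a minimal, self-contained Go sketch. The flag names mirror the two options above, but the parsing function and its layout are purely illustrative — the real Go SDK registers its Dataflow options inside the runner package, not in a standalone program like this.

```go
package main

import (
	"flag"
	"fmt"
)

// parseDiskOptions parses the two (illustrative) disk-performance flags
// from args and returns (iops, throughputMibps). A zero value means the
// option was left unset and the service should choose a default.
func parseDiskOptions(args []string) (int64, int64, error) {
	fs := flag.NewFlagSet("pipeline", flag.ContinueOnError)
	iops := fs.Int64("disk_provisioned_iops", 0,
		"Provisioned IOPS for worker disks (0 = service default)")
	tp := fs.Int64("disk_provisioned_throughput_mibps", 0,
		"Provisioned throughput in MiB/s for worker disks (0 = service default)")
	if err := fs.Parse(args); err != nil {
		return 0, 0, err
	}
	return *iops, *tp, nil
}

func main() {
	// A real pipeline would pass os.Args[1:] here.
	iops, tp, err := parseDiskOptions([]string{
		"--disk_provisioned_iops=10000",
		"--disk_provisioned_throughput_mibps=400",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(iops, tp) // prints: 10000 400
}
```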
More context:
These pipeline options need to be added before submitting this CL: https://critique.corp.google.com/cl/858930428
Issue: #37374
Additionally, the Go Google API client dependencies were updated to their latest versions so that the Go SDK recognizes the newly introduced DiskProvisionedIops and DiskProvisionedThroughputMibps fields in the Dataflow API. Several other Go client libraries and dependencies in `sdks/go.mod` were also indirectly updated to their latest versions.
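A rough sketch of the propagation step described above: the real translation lives in the runners' job-submission code, and the struct below is only a stand-in for the generated Dataflow API worker pool message (the two field names match those named in this PR; everything else is illustrative).

```go
package main

import "fmt"

// WorkerPool is a stand-in for the generated Dataflow API worker pool
// message; only the two new fields are modeled here.
type WorkerPool struct {
	DiskProvisionedIops            int64
	DiskProvisionedThroughputMibps int64
}

// applyDiskPerformance copies user-supplied option values onto the worker
// pool, leaving a field zero (service default) when the option is unset.
func applyDiskPerformance(wp *WorkerPool, iops, throughputMibps int64) {
	if iops > 0 {
		wp.DiskProvisionedIops = iops
	}
	if throughputMibps > 0 {
		wp.DiskProvisionedThroughputMibps = throughputMibps
	}
}

func main() {
	wp := &WorkerPool{}
	applyDiskPerformance(wp, 15000, 600)
	fmt.Println(wp.DiskProvisionedIops, wp.DiskProvisionedThroughputMibps) // prints: 15000 600
}
```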
Note: The pipeline options for the Python SDK will be introduced in a separate pull request.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.