Following up on #2079, GitHub provides ARM boxes like `ubuntu-24.04-arm` and `windows-11-arm`.
We want to test on a wide array of targets, so adding these would be nice. We also need to contend with the fact that our matrix is getting really big, especially for a project which isn't full of heavily architecture-specific code and has a flakier-than-we'd-like test suite.
I think adding Ubuntu & Windows ARM testing raises the priority of solving some existing issues. These are the items I'd like us to consider as we plan:
- Refactor the way we declare our test matrix so that we can be more selective.
I'd like to see a matrix which tests something like the following suites:
# ranges
Python 3.9-3.14 x Ubuntu Intel
Python 3.9-3.14 x Windows Intel
Python 3.9-3.14 x macOS ARM
# top and bottom
Python 3.9, 3.14 x macOS Intel
Python 3.9, 3.14 x Ubuntu ARM
Python 3.9, 3.14 x Windows ARM
If we explode out the full matrix (6 Python versions x 6 platforms = 36 jobs), there are 12 more jobs in there than the 24 listed above, and it's questionable whether they are high value for pip-tools.
- Make tests more stable (where possible).
This shouldn't be treated as a blocker, since we can't fully solve it in short order.
We still have flaky failures because we have tests which rely on the network and the state of PyPI. The bigger we make our matrix, the more important stable tests are.
- Make tests faster (where possible).
Again, not a blocker, but it matters more the bigger the matrix gets. Are we doing network IO to test something which can be tested without network IO? Can we simply make tests faster to run?
The cost of flakiness (item 2) goes down if it's very cheap and easy to rerun a failed job.
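To make the job counts in (1) concrete, here is a rough sketch that enumerates the narrower matrix and the full one. The version list and platform labels come from the suites listed above; the function and constant names are made up for illustration and are not part of our CI config.

```python
from itertools import product

# Python versions and platform labels as listed in the proposal above.
PYTHONS = ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
FULL_RANGE_PLATFORMS = ["Ubuntu Intel", "Windows Intel", "macOS ARM"]
ENDPOINTS_ONLY_PLATFORMS = ["macOS Intel", "Ubuntu ARM", "Windows ARM"]


def selective_matrix():
    """Return (python, platform) pairs for the narrower matrix (hypothetical helper)."""
    # "ranges": every Python version on the primary platforms.
    jobs = list(product(PYTHONS, FULL_RANGE_PLATFORMS))
    # "top and bottom": only the oldest and newest Python on the rest.
    endpoints = [PYTHONS[0], PYTHONS[-1]]
    jobs += list(product(endpoints, ENDPOINTS_ONLY_PLATFORMS))
    return jobs


def full_matrix():
    """Return (python, platform) pairs for every version on every platform."""
    return list(product(PYTHONS, FULL_RANGE_PLATFORMS + ENDPOINTS_ONLY_PLATFORMS))


if __name__ == "__main__":
    selective = selective_matrix()
    full = full_matrix()
    # Prints: 24 36 12
    print(len(selective), len(full), len(full) - len(selective))
```

The 12 jobs we would skip are Python 3.10 through 3.13 on macOS Intel, Ubuntu ARM, and Windows ARM.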
@webknjaz, I'm curious what you think of (1), in terms of picking a narrower slice vs testing the true, full matrix. Do you think that's a good idea?
If we had faster and more stable tests, I wouldn't hesitate to expand the matrix. But as things currently are, I'm a bit concerned that we could be making the project harder to maintain for limited gain.