Skip to content

fix(docker): add fail-fast docker socket validation, improve error detection, add tests, and skip integration tests when daemon unavailable#15389

Merged
loicmathieu merged 2 commits intokestra-io:developfrom
shivamwayal37:fix-docker-socket-error-message
Apr 14, 2026
Merged

fix(docker): add fail-fast docker socket validation, improve error detection, add tests, and skip integration tests when daemon unavailable#15389
loicmathieu merged 2 commits intokestra-io:developfrom
shivamwayal37:fix-docker-socket-error-message

Conversation

@shivamwayal37
Copy link
Copy Markdown
Contributor

✨ Description

This PR improves Docker error handling by introducing a fail-fast validation for Unix socket-based Docker hosts.

  • Adds pre-validation to detect missing or inaccessible Docker Unix sockets before execution
  • Improves detection of Docker socket-related errors by inspecting nested exception causes
  • Provides clearer and more actionable error messages when Docker is unavailable
  • Adds dedicated unit tests (DockerErrorHandlingTest) to cover these scenarios
  • Updates Docker integration tests to skip execution when the Docker daemon/socket is not available

This ensures a clear separation between environment/setup failures and container execution failures, improving reliability and debuggability.

🔗 Related Issue

Closes #15161

🛠️ Backend Checklist

  • Code compiles successfully and passes all checks
  • All unit and integration tests pass

📝 Additional Notes

Add any extra context or details reviewers should be aware of.

  • IllegalStateException is intentionally used for pre-execution environment validation failures instead of TaskException, to avoid mixing distinct failure modes.
  • Docker integration tests are conditionally skipped when the Docker daemon is not accessible, preventing false negatives in environments without Docker.

@github-project-automation github-project-automation Bot moved this to To review in Pull Requests Apr 6, 2026
@MilosPaunovic MilosPaunovic added area/backend Needs backend code changes kind/external Pull requests raised by community contributors labels Apr 6, 2026
@shivamwayal37
Copy link
Copy Markdown
Contributor Author

It looks like the CI failure is unrelated to this PR:

  • Failure occurs in BlueprintControllerTest
  • Error: Address already in use (port 28181)
  • This PR only modifies Docker runner logic

This seems like a flaky test or port conflict in CI.

Would you prefer:

  1. I re-run the workflow?
  2. Or open a separate PR to make the test use a random port?

Happy to address it separately if needed.

Copy link
Copy Markdown
Member

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complicated.
What I don't unrestand is that we had a code block to handle specifically this kind of exception but it seems it didn't work, maybe fix that code block instead of removing it and replacing it?

@shivamwayal37
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback!

You're right — the current approach introduces duplication between the pre-validation and the exception handling block.

My intention was to:

  • Fail fast for missing Unix sockets before execution
  • Improve detection of Docker socket-related errors from nested exceptions

But I see that this overlaps with the existing logic.

I'll refactor this to:

  • Reuse and improve the existing error handling block instead of duplicating it
  • Remove redundant checks between pre-validation and catch block

Will push an updated version shortly.

…tection, add tests, and skip integration tests when daemon unavailable
@shivamwayal37 shivamwayal37 force-pushed the fix-docker-socket-error-message branch from eee6674 to 4581cce Compare April 13, 2026 15:24
@shivamwayal37
Copy link
Copy Markdown
Contributor Author

Thanks for the review @loicmathieu!

I’ve refactored the implementation to remove duplication and simplify the flow:

  • Reused and improved the existing error handling instead of replacing it
  • Removed redundant checks between pre-validation and the catch block
  • Kept fail-fast validation only for early Unix socket detection

This makes the behavior more consistent and easier to follow.

Regarding the CI failure: it occurs in jdbc-postgres (PostgresServiceLivenessCoordinatorTest) and appears unrelated to this PR. The script module (where these changes were made) is passing all tests successfully. This looks like a flaky or environment-related issue, but I’m happy to re-run the workflow or investigate further if needed.

Copy link
Copy Markdown
Member

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@loicmathieu loicmathieu merged commit 5aa59c3 into kestra-io:develop Apr 14, 2026
8 of 9 checks passed
@github-project-automation github-project-automation Bot moved this from To review to Done in Pull Requests Apr 14, 2026
@loicmathieu
Copy link
Copy Markdown
Member

Backported to 1.3 and 1.0

@shivamwayal37
Copy link
Copy Markdown
Contributor Author

Thanks @loicmathieu and @MilosPaunovic for the review and quick merge!

Glad the refactor aligns well. Appreciate the backport as well 🙌

@shivamwayal37 shivamwayal37 deleted the fix-docker-socket-error-message branch April 16, 2026 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/backend Needs backend code changes kind/external Pull requests raised by community contributors

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Unclear error message on missing docker socket

3 participants