This repo creates a Host Rancher plus one or more Tenant Ranchers on imported downstream K3s clusters, with a simpler config flow and optional automatic version resolution. The Host Rancher manages each Tenant Rancher as an imported cluster, and each instance can run different versions of K3s and Rancher, allowing you to test version compatibility and upgrade scenarios.
- Host Rancher (Index 0): The primary Rancher instance that manages tenant clusters
- Tenant Ranchers (Index 1+): Secondary Rancher instances running as imported clusters
- Phase 1: Install K3S on all instances and import tenants as plain clusters
- Phase 2: Install Rancher on each imported tenant cluster using cluster-specific Helm commands
- Dedicated S3 bucket for Terraform state storage
- Existing AWS VPC, subnets, AMI, and security group values
- A repo-root `tool-config.yml`
- Local `kubectl`, `helm`, and `terraform`
Copy one of the provided examples to `tool-config.yml`.

`tool-config.yml` replaces the old `config.yml` flow. The test code still falls back to `config.yml` for compatibility, but the new path is `tool-config.yml`.
These values now come from environment variables instead of config:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `DOCKERHUB_USERNAME`
- `DOCKERHUB_PASSWORD`
DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD are optional. If both are unset, the repo skips registries.yaml generation and K3s will pull anonymously.
The easiest local setup is `~/.zprofile`:

```shell
export AWS_ACCESS_KEY_ID="your-aws-access-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-key"
export DOCKERHUB_USERNAME="your-dockerhub-username"
export DOCKERHUB_PASSWORD="your-dockerhub-password"
```

The main `tool-config.yml` keys are:

- `total_rancher_instances`: total host + tenant instances, 2-4
- `rancher.mode`: `manual` or `auto`
- `s3.*`: backend bucket/region
- `tf_vars.*`: non-secret AWS/Terraform inputs
The index mapping is unchanged:

- Index 0: Host Rancher + Host K3S
- Index 1: Tenant 1 Rancher + Tenant 1 K3S
- Index 2: Tenant 2 Rancher + Tenant 2 K3S
- Index 3: Tenant 3 Rancher + Tenant 3 K3S
Use `rancher.mode: auto` when you want to supply only Rancher versions and let the tool resolve the rest.
In auto mode the test will:
- Resolve the right Rancher chart source and chart version.
- Resolve image overrides for head/alpha/rc builds when needed.
- Read the SUSE support matrix for the chosen Rancher compatibility baseline.
- Pick the highest supported K3s minor and latest patch in that line.
- Download and hash the exact K3s installer and airgap bundle URLs.
- Generate `rancher.helm_commands` and `k3s.*` values in memory.
- Print the plan and, on macOS, show a GoLand-friendly native confirmation dialog unless `rancher.auto_approve: true`.
Auto mode accepts:
- `rancher.version` for a single instance
- `rancher.versions` for multiple instances
- `rancher.distro` with `auto`, `community`, or `prime`
- `rancher.bootstrap_password`
- `rancher.auto_approve`
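Putting those keys together, an auto-mode `rancher` section might look like the sketch below. The version strings and password are placeholders, not values from this repo; only the key names come from this README.

```yaml
rancher:
  mode: auto
  versions:             # one entry per instance; use `version` for a single instance
    - "2.10.1"          # placeholder Rancher versions
    - "2.9.3"
  distro: community     # auto, community, or prime
  bootstrap_password: "your-bootstrap-password"
  auto_approve: false   # set true to skip the plan confirmation dialog
```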
Use `rancher.mode: manual` when you want full control over Helm commands and K3s versions.
Manual mode accepts:
- `rancher.helm_commands`
- `k3s.version` or `k3s.versions`
- `k3s.install_script_sha256` or `k3s.install_script_sha256s`
- `k3s.airgap_image_sha256` or `k3s.airgap_image_sha256s`
- `k3s.preload_images`
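Putting those keys together, a manual-mode sketch for illustration only: the Helm command is a placeholder, array lengths must match `total_rancher_instances`, and the hashes shown are the v1.33.7+k3s3 values listed later in this README.

```yaml
rancher:
  mode: manual
  helm_commands:   # one Helm command per instance (placeholder command shown)
    - "helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=placeholder"
k3s:
  versions:
    - "v1.33.7+k3s3"
  install_script_sha256s:
    v1.33.7+k3s3: "9ca7930c31179d83bc13de20078fd8ad3e1ee00875b31f39a7e524ca4ef7d9de"
  airgap_image_sha256s:
    v1.33.7+k3s3: "b0d7062008fa7fcad9ad7c6b60f74ae1c561927dbb5a4105433f5afbd091361b"
  preload_images: true
```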
The test runner now uses AWS Systems Manager Run Command instead of SSH.
- No local IP whitelist check before Terraform runs
- No SSH private key required for remote commands
- EC2 instances get an SSM instance profile and bootstrap the SSM agent during provisioning
The node bootstrap now prepares K3s config files before installation:
- `/etc/rancher/k3s/config.yaml` for the shared datastore and TLS SANs
- `/etc/rancher/k3s/registries.yaml` when Docker Hub credentials are present
- `/var/lib/rancher/k3s/agent/images/` with the K3s image tarball when preloading is enabled
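The generated `registries.yaml` follows K3s's documented private registry configuration format. A minimal sketch with placeholder Docker Hub credentials (the exact file this repo generates may differ):

```yaml
# Hypothetical contents of /etc/rancher/k3s/registries.yaml when
# DOCKERHUB_USERNAME / DOCKERHUB_PASSWORD are set; values are placeholders.
configs:
  registry-1.docker.io:
    auth:
      username: your-dockerhub-username
      password: your-dockerhub-password
```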
This “preload” path is K3s’s documented airgap image import mechanism. In this repo it is being used as an online optimization to reduce registry pulls and avoid Docker Hub throttling during bootstrap.
The K3s bootstrap path verifies downloaded upstream artifacts before using them:
- The repo does not use `curl | sh` for the K3s installer.
- The installer is downloaded from the exact version tag.
- The installer must match the pinned SHA256 before it runs.
- The airgap image bundle must match the pinned SHA256 before it is moved into the K3s image import directory.
Every file downloaded over the network as part of the K3s install path is verified against a known-good SHA256 before it is used or executed. This protects against compromised upstream releases (e.g. a tampered GitHub release asset) reaching your nodes.
| Pattern | Location | Status |
|---|---|---|
| `curl \| sh` | — | Eliminated — no instances exist in this repo |
| K3s installer curl | `tools.go:675` | Hardened — SHA256 validated twice (Go preflight + bash on node) |
| K3s airgap images curl | `tools.go:641` | Hardened — SHA256 validated on remote node before install |
K3s installer script (`tools.go`)
The installer is validated twice — once in Go before provisioning starts, and once in bash on the remote node:
- Before provisioning, `validatePinnedK3SArtifacts` downloads `install.sh` for the pinned K3s version and compares its SHA256 against the pinned value. If the hash does not match, provisioning is blocked entirely.
- On each node, the install command downloads the script to a temp file, runs `sha256sum -c` against the same hash, and refuses to execute if validation fails, with a clear `SECURITY ERROR` message in stderr.
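The on-node half of that check can be sketched in shell. The demo "installer", paths, and computed hash below are illustrative stand-ins; in the repo the expected hash is pinned ahead of time rather than computed on the spot.

```shell
# Demo of the verify-before-execute pattern: hash-check a downloaded
# script and refuse to run it on mismatch. Files are illustrative,
# not the repo's actual artifacts.
DEMO_INSTALLER=/tmp/demo-install.sh
printf 'echo install-ok\n' > "$DEMO_INSTALLER"

# In the real flow this hash comes from tool-config.yml (manual mode)
# or the resolved plan (auto mode); computed here so the demo is self-contained.
EXPECTED=$(sha256sum "$DEMO_INSTALLER" | awk '{print $1}')

if echo "${EXPECTED}  ${DEMO_INSTALLER}" | sha256sum -c - >/dev/null 2>&1; then
  sh "$DEMO_INSTALLER"                 # only runs after the hash matches
else
  echo "SECURITY ERROR: installer hash mismatch" >&2
  rm -f "$DEMO_INSTALLER"
  exit 1
fi
```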
Where the expected hash comes from depends on mode:
- `manual` mode: you supply `k3s.install_script_sha256` (single instance) or `k3s.install_script_sha256s` (multiple instances) explicitly in your config. The tool checks your pinned value before any node is touched.
- `auto` mode: the tool fetches the versioned `install.sh` at plan time, computes its SHA256, and stores it in the resolved plan. That computed hash is then used for both the Go preflight check and the per-node bash validation.
K3s airgap image bundle (`tools.go`)

When `k3s.preload_images: true` is set, the airgap image tarball is also validated before it is moved into `/var/lib/rancher/k3s/agent/images/`:
- The tarball (`k3s-airgap-images-amd64.tar.zst`) is downloaded to `/tmp`.
- `sha256sum -c` is run against the pinned hash (from `k3s.airgap_image_sha256s` in manual mode, or the resolved plan in auto mode).
- If validation fails, the tarball is discarded via the `trap` cleanup and the script exits with a `SECURITY ERROR`; the corrupted file never reaches the K3s image directory.
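The validate-then-move flow can be sketched in shell. The paths below are illustrative stand-ins for `/tmp` and `/var/lib/rancher/k3s/agent/images/`, and the hash is computed locally so the demo is self-contained (in the real flow it is pinned).

```shell
# Demo of validate-then-move with trap cleanup. Only a verified bundle
# ever reaches the image directory; names are illustrative.
TARBALL=/tmp/demo-airgap.tar
IMAGES_DIR=/tmp/demo-images
printf 'fake-image-data' > "$TARBALL"
mkdir -p "$IMAGES_DIR"

# Mirror the trap cleanup: remove the tarball on any premature exit.
trap 'rm -f "$TARBALL"' EXIT

EXPECTED=$(sha256sum "$TARBALL" | awk '{print $1}')  # pinned in the real flow
if echo "${EXPECTED}  ${TARBALL}" | sha256sum -c - >/dev/null 2>&1; then
  mv "$TARBALL" "$IMAGES_DIR/"   # verified: stage into the image directory
  echo "bundle staged"
else
  echo "SECURITY ERROR: airgap bundle hash mismatch" >&2
  exit 1                         # trap discards the unverified tarball
fi
```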
Rancher API calls
Calls to the Rancher API from the test runner (token creation, import manifest lookup, server-url update, stability probes) currently skip TLS verification when talking back to the Rancher URL. This is intentional for now: the ALB → instance re-encrypt path and Rancher's own bootstrap certs have historically caused flakes during early startup, and the calls are authenticated with a freshly-minted admin bearer token. Tightening this to full TLS verification is a known follow-up.
`kubectl apply` inside the generated `import.sh` uses `--insecure-skip-tls-verify` to hit the imported cluster's K3s API server on its public node IP (self-signed cert). The manifest itself is fetched over HTTPS from the ACM-signed Rancher URL; the flag only applies to the kubeconfig target.
Update the K3s checksums whenever you add or change an entry in `k3s.versions`.
For each K3s version, download the exact installer script from the version tag and compute its SHA256:

```shell
export K3S_VERSION="v1.33.7+k3s3"
curl -fsSL "https://raw.githubusercontent.com/k3s-io/k3s/${K3S_VERSION/+/%2B}/install.sh" -o /tmp/k3s-install.sh
shasum -a 256 /tmp/k3s-install.sh
```

Then download the matching airgap image bundle and compute its SHA256:

```shell
export K3S_VERSION="v1.33.7+k3s3"
curl -fsSL "https://github.com/k3s-io/k3s/releases/download/${K3S_VERSION/+/%2B}/k3s-airgap-images-amd64.tar.zst" -o /tmp/k3s-airgap-images-amd64.tar.zst
shasum -a 256 /tmp/k3s-airgap-images-amd64.tar.zst
```

Copy only the hash on the left into `tool-config.yml`:
```yaml
k3s:
  install_script_sha256s:
    v1.33.7+k3s3: "9ca7930c31179d83bc13de20078fd8ad3e1ee00875b31f39a7e524ca4ef7d9de"
  airgap_image_sha256s:
    v1.33.7+k3s3: "b0d7062008fa7fcad9ad7c6b60f74ae1c561927dbb5a4105433f5afbd091361b"
```

Execute the hosted/tenant setup with a 60 minute timeout:

```shell
go test -v -run TestHosted -timeout 60m
```

Remove all resources when finished, also with a 60 minute timeout:

```shell
go test -v -run TestCleanup -timeout 60m
```

- Terraform Apply: Provisions AWS infrastructure (EC2, RDS, Route53)
- Host K3S Installation: Installs K3S on host using version from index 0
- Host Rancher Installation: Installs Rancher on host using Helm command from index 0
- Wait for Stability: Ensures host Rancher is responding (accepts 200/302/401/403/404 status codes)
- Bootstrap & Configure: Creates admin token and configures server-url setting
- Tenant K3S Installation: Installs K3S on each tenant using respective version
- Cluster Import: Imports each tenant as a plain cluster into host Rancher
- Wait for Active: Ensures each imported cluster becomes Active in host Rancher
- Tenant Rancher Installation: Installs Rancher on each Active tenant cluster
- Final Verification: Confirms all tenant Ranchers are stable and accessible
- Different K3S Versions: Each instance can run a different K3S version
- Different Rancher Versions: Each instance can run a different Rancher version
- Upgrade Path Testing: Test compatibility between versions
- Array Count Validation: Ensures K3S versions and Helm commands match instance count
- S3 State Checking: Prevents conflicts with existing deployments
- Progressive Installation: Waits for each phase to complete before proceeding
- 2-4 Instances Supported: Minimum 1 host + 1 tenant, maximum 1 host + 3 tenants
- Custom Helm Commands: Each Rancher instance can have unique installation parameters
- Repository Flexibility: Mix alpha, latest, and stable chart repositories
The `aws_rds_password` must meet AWS criteria:
- Minimum 8 printable ASCII characters
- Cannot contain: /, ', ", @ symbols
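The rules above can be pre-checked locally before a deploy. This is a hypothetical helper, not part of the repo; it only encodes the two constraints listed here.

```shell
# Hypothetical pre-check for the RDS password rules above:
# at least 8 printable ASCII characters, none of / ' " @.
check_rds_password() {
  p="$1"
  [ "${#p}" -ge 8 ] || { echo "too short"; return 1; }
  case "$p" in
    *[/@\'\"]*) echo "contains forbidden character"; return 1 ;;
  esac
  echo "ok"
}

check_rds_password 'Str0ngPass!'
```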
- One deployment per bucket: Each S3 bucket can only host one active deployment
- Cleanup required: Run `TestCleanup` before starting a new deployment in the same bucket
The `--set hostname=placeholder` in Helm commands is automatically replaced with the actual Route53 hostname during installation.
- Validation Errors: Ensure array counts match `total_rancher_instances`
- S3 Conflicts: Clean up existing deployments before starting new ones
- RDS Password: Verify password meets AWS requirements
- Timeout Issues: The new status code checking should resolve most stability timeout issues
The system provides detailed logging including:
- Installation progress for each phase
- HTTP status codes during stability checks
- Cluster import and activation status
- Final URLs for all Rancher instances
Upon successful completion, you'll receive URLs for all instances:
```
Host Rancher:     https://host-rancher.your-domain.com
Tenant Rancher 1: https://tenant1-rancher.your-domain.com
Tenant Rancher 2: https://tenant2-rancher.your-domain.com
Tenant Rancher 3: https://tenant3-rancher.your-domain.com
```

Each tenant will also appear as an imported cluster in the host Rancher UI.