
Commit 5e54a01

merge build-guide into README, drop docs/ directory
- Fold docs/build-guide.md into README.md so there's a single entry point for anyone landing on the repo
- Drop the 'Why GHCR instead of LFS' section (not interesting to consumers, and the story is already in git history)
- Expand the 'Pulling a pre-built image' section: oras pull drops split parts, cat them back with a glob, verify with sha256sum, then boot directly with cloud-hypervisor (example command included). Users land here and need to know how to go from 'I ran oras pull' to 'I have a bootable qcow2'
- Keep everything that was in docs/build-guide.md: version requirements, the four quirks (q35 -cdrom, floppy, Secure Boot, bootmgr press-any-key), manual build steps, OOBE explanation, FirstLogonCommands table, post-clone networking
1 parent 5701c7a commit 5e54a01

2 files changed

Lines changed: 314 additions & 440 deletions


README.md

Lines changed: 314 additions & 25 deletions
@@ -2,52 +2,341 @@
22

33
Build automation for Windows 11 25H2 disk images targeting Cloud Hypervisor.
44

5+
Contents:
6+
57
- `autounattend.xml` — unattended Windows setup configuration
6-
- `scripts/verify.ps1` + `scripts/remediate.ps1` — post-install verification loop
7-
- `.github/workflows/build.yml` — headless QEMU/KVM build on GitHub Actions
8-
- `docs/build-guide.md` — manual build guide
8+
- `scripts/verify.ps1` + `scripts/remediate.ps1` — post-install verification / remediation loop
9+
- `.github/workflows/build.yml` — headless QEMU/KVM build on `ubuntu-latest`, publishes to GHCR via ORAS
910

10-
## Distribution
11+
## Pulling a pre-built image
1112

12-
Built images are published as OCI artifacts to GHCR (not Git LFS):
13+
Built images are published as OCI artifacts to GHCR:
1314

1415
```
15-
ghcr.io/cmgs/windows:win11-25h2
16-
ghcr.io/cmgs/windows:win11-25h2-<YYYYMMDD>
16+
ghcr.io/cmgs/windows:win11-25h2 # moving alias, latest good build
17+
ghcr.io/cmgs/windows:win11-25h2-<YYYYMMDD> # dated immutable tag
1718
```
1819

19-
### Pull and reassemble
20+
### 1. Pull
2021

2122
```bash
22-
# Requires oras CLI (https://oras.land)
23+
# Requires oras CLI -- https://oras.land
2324
oras pull ghcr.io/cmgs/windows:win11-25h2
25+
```
26+
27+
This drops the split parts and `SHA256SUMS` into the current directory:
28+
29+
```
30+
windows-11-25h2.qcow2.00.qcow2.part 1.9G
31+
windows-11-25h2.qcow2.01.qcow2.part 1.9G
32+
...
33+
windows-11-25h2.qcow2.07.qcow2.part ~200M
34+
SHA256SUMS
35+
```
36+
37+
### 2. Reassemble into a qcow2
38+
39+
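The qcow2 is split into ~1.9 GiB parts so every blob stays under the GHCR per-layer limit. The parts carry zero-padded numeric suffixes (`00`, `01`, ...) and the shell expands a glob in sorted order, so a plain `cat` with a glob concatenates them in the order they were split and gives you the original file back byte-for-byte: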
40+
41+
```bash
2442
cat windows-11-25h2.qcow2.*.qcow2.part > windows-11-25h2.qcow2
2543
sha256sum -c SHA256SUMS
26-
qemu-img info windows-11-25h2.qcow2
44+
rm windows-11-25h2.qcow2.*.qcow2.part # optional, ~14 GiB of duplicate data
45+
```
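
To see why the glob is enough, the round trip can be reproduced on a small throwaway file. This is a minimal sketch and not the workflow's actual split command: the file name `demo.img`, the 1 MiB chunk size, and the `.part` suffix are assumptions chosen for illustration, and it relies on GNU `split` (`-d`, `--additional-suffix`).

```bash
#!/usr/bin/env bash
# Round-trip demo: split a file, reassemble it with a glob, compare checksums.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Stand-in for the real image: 5 MiB of random bytes (hypothetical file name).
head -c $((5 * 1024 * 1024)) /dev/urandom > demo.img
orig_sum=$(sha256sum demo.img | cut -d' ' -f1)

# 1 MiB chunks with zero-padded numeric suffixes: demo.img.00.part, demo.img.01.part, ...
split -b 1M -d --additional-suffix=.part demo.img demo.img.
rm demo.img

# The glob expands in sorted order, so cat restores the original byte order.
cat demo.img.*.part > demo.img
new_sum=$(sha256sum demo.img | cut -d' ' -f1)

if [ "$orig_sum" = "$new_sum" ]; then
  echo "round trip OK"
else
  echo "checksum mismatch" >&2
fi
```

Zero-padded suffixes are what make the sort order match the split order. With unpadded numbers, `10` would sort before `2` and the glob would scramble the file.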
46+
47+
The OCI manifest also carries the reassemble command in the `cocoonstack.windows.reassemble` annotation so any tool inspecting the artifact can discover it.
48+
49+
### 3. Boot it
50+
51+
```bash
52+
qemu-img info windows-11-25h2.qcow2 # sanity check
53+
54+
# On Cloud Hypervisor with our patched firmware:
55+
cloud-hypervisor \
56+
--kernel /usr/local/share/cloud-hypervisor/CLOUDHV.fd \
57+
--disk path=windows-11-25h2.qcow2 \
58+
--cpus boot=2 --memory size=4G \
59+
--net tap=,mac=,ip=,mask= \
60+
--serial tty --console off
61+
```
62+
63+
Login is the local admin `cocoon` account set up by `autounattend.xml`. SSH and WinRM are enabled out of the box.
64+
65+
## Building yourself
66+
67+
Two flows share the same automation: **GitHub Actions** (`ubuntu-latest`, free tier, ~2 h, auto-publishes to GHCR) and **local** (any Linux + KVM host).
68+
69+
### Version requirements
70+
71+
| Component | Version | Notes |
72+
|------------------|--------------|------------------------------------------------------------------------------|
73+
| Cloud Hypervisor | **v51+** | Use [cocoonstack/cloud-hypervisor `dev`][ch-fork] for full Windows support |
74+
| Firmware | **patched** | Use [cocoonstack/rust-hypervisor-firmware `dev`][fw-fork] for ACPI shutdown |
75+
| virtio-win | **0.1.285** | Latest stable; 0.1.240 also works on upstream CH without patches |
76+
| QEMU (build) | **≥ 8.x** | Build host only — production runs on Cloud Hypervisor |
77+
| OVMF (build) | **secboot** | `OVMF_CODE_4M.secboot.fd` — Win11 requires Secure Boot, see quirk #3 |
78+
79+
With our [CH fork][ch-fork] and [firmware fork][fw-fork], the known Windows issues on Cloud Hypervisor are resolved:
80+
- v51 BSOD fixed ([#7849][ch-7849], [PR #7936][ch-7936])
81+
- virtio-win 0.1.285 works ([#7925][ch-7925], ctrl_queue + used_len fix)
82+
- ACPI power-button shutdown works ([firmware#422][fw-422], [firmware PR #423][fw-423])
83+
84+
Install patched binaries:
85+
86+
```bash
87+
curl -fsSL -o /usr/local/bin/cloud-hypervisor \
88+
https://github.com/cocoonstack/cloud-hypervisor/releases/download/dev/cloud-hypervisor
89+
chmod +x /usr/local/bin/cloud-hypervisor
90+
91+
curl -fsSL --create-dirs -o /usr/local/share/cloud-hypervisor/CLOUDHV.fd \
92+
https://github.com/cocoonstack/rust-hypervisor-firmware/releases/download/dev/hypervisor-fw
93+
```
94+
95+
[ch-fork]: https://github.com/cocoonstack/cloud-hypervisor/tree/dev
96+
[fw-fork]: https://github.com/cocoonstack/rust-hypervisor-firmware/tree/dev
97+
[ch-7849]: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7849
98+
[ch-7925]: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/7925
99+
[ch-7936]: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7936
100+
[fw-422]: https://github.com/cloud-hypervisor/rust-hypervisor-firmware/issues/422
101+
[fw-423]: https://github.com/cloud-hypervisor/rust-hypervisor-firmware/pull/423
102+
103+
### Building via GitHub Actions
104+
105+
```bash
106+
gh workflow run build.yml --repo CMGS/windows -f version_tag=win11-25h2
107+
```
108+
109+
Requires one repository secret:
110+
111+
- `WINDOWS_ISO_URL` — signed download URL for the Windows 11 25H2 ISO. Microsoft licensing prohibits bundling the ISO in the repo or any artifact, so fetch it at build time.
112+
113+
The workflow:
114+
115+
1. Frees ~30 GiB of preinstalled SDKs from the runner (default ubuntu-latest has ~14 GiB free, not enough for `windows.iso` + `virtio-win.iso` + growing qcow2)
116+
2. Boots QEMU with Secure Boot OVMF + swtpm TPM 2.0
117+
3. Injects Enter keys via the QEMU monitor to defeat the "Press any key to boot from CD" prompt
118+
4. Runs unattended install from `autounattend.xml` (delivered as a third CD-ROM; see quirk #2)
119+
5. Polls SSH for `C:\install.success` marker
120+
6. Runs `verify.ps1`, reboots, re-verifies, and applies `remediate.ps1` on failure (up to 3 attempts)
121+
7. Shuts the VM down cleanly, compresses the qcow2, splits it, and pushes to GHCR via ORAS
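
The verify / remediate loop in step 6 can be sketched roughly as below. This is an illustration of a bounded retry loop, not the code in `build.yml`: `run_verify`, `run_remediate`, and `verify_with_remediation` are hypothetical names, and the two stubs simulate a guest that passes verification on the third try instead of running the PowerShell scripts over SSH.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins: a real build would run verify.ps1 / remediate.ps1 over SSH.
fail_first=2   # pretend the first two verifications fail
calls=0
run_verify()    { calls=$((calls + 1)); [ "$calls" -gt "$fail_first" ]; }
run_remediate() { echo "remediating after failed attempt $calls"; }

# Verify up to MAX_ATTEMPTS times, remediating after each failure.
verify_with_remediation() {
  local max_attempts=$1 attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if run_verify; then
      echo "verified on attempt $attempt"
      return 0
    fi
    run_remediate
  done
  echo "still failing after $max_attempts attempts" >&2
  return 1
}

verify_with_remediation 3
```

Bounding the loop matters on a hosted runner: a guest that never converges should fail the job rather than burn the remaining runner time.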
122+
123+
### Building locally
124+
125+
#### Prerequisites
126+
127+
```bash
128+
sudo apt-get install -y \
129+
qemu-system-x86 qemu-utils \
130+
ovmf swtpm mtools genisoimage \
131+
openssh-client sshpass netcat-openbsd
132+
```
133+
134+
Plus a Windows 11 25H2 ISO and [virtio-win-0.1.285.iso](https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-virtio/virtio-win-0.1.285-1/).
135+
136+
#### Quirks worth knowing upfront
137+
138+
Understanding these four points avoids hours of debugging. All four are encoded into `build.yml`.
139+
140+
**1. q35 `-cdrom` shorthand puts the Windows ISO where OVMF cannot boot from it.** Both `-cdrom windows.iso` and a second `-drive ...,media=cdrom` land on q35's AHCI controller in positions where OVMF's BdsDxe reports `Not Found` for Boot0001. You must attach each CD-ROM explicitly on a dedicated SATA port, and each `ide.N` bus only supports 1 unit:
141+
142+
```
143+
-drive id=cd0,if=none,file=windows.iso,media=cdrom,readonly=on
144+
-device ide-cd,drive=cd0,bus=ide.0,bootindex=0
145+
-drive id=cd1,if=none,file=virtio-win.iso,media=cdrom,readonly=on
146+
-device ide-cd,drive=cd1,bus=ide.1
27147
```
28148

29-
The qcow2 is split into ~1.9 GiB parts so every blob fits comfortably inside the GHCR per-layer limit. The reassemble command is also stored in the manifest annotation `cocoonstack.windows.reassemble`.
149+
**2. q35 has no working floppy for Windows PE.** The classic "deliver autounattend.xml on a FAT floppy" trick fails silently: the `-drive if=floppy` device is visible at the QEMU level but Windows PE's driver stack does not enumerate it, so the unattend file is never read, Setup falls to its GUI, and the disk stays at 196 K forever while the (headless) screen waits for a click. Fix: pack the XML into a tiny ISO and attach it as a third CD-ROM. Windows Setup searches every CD-ROM for `autounattend.xml` at the root:
150+
151+
```bash
152+
genisoimage -o autounattend.iso -J -r autounattend.xml
153+
```
154+
155+
```
156+
-drive id=cd2,if=none,file=autounattend.iso,media=cdrom,readonly=on
157+
-device ide-cd,drive=cd2,bus=ide.2
158+
```
159+
160+
**3. Windows 11 25H2 refuses to install without Secure Boot.** If you use `OVMF_CODE_4M.fd` (non-secboot) to dodge CD-ROM boot issues, the installer reads autounattend.xml, starts Setup, and then immediately aborts with *"This PC doesn't currently meet Windows 11 system requirements — The PC must support Secure Boot"*. You **must** use the Secure Boot firmware and enable SMM:
161+
162+
```
163+
-machine q35,accel=kvm,smm=on
164+
-global driver=cfi.pflash01,property=secure,value=on
165+
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE_4M.secboot.fd
166+
-drive if=pflash,format=raw,file=OVMF_VARS.fd
167+
```
168+
169+
The TPM 2.0 swtpm socket is also non-negotiable — Win11 checks for both.
170+
171+
**4. Windows Boot Manager shows "Press any key to boot from CD" even on UEFI / first install.** The prompt is not a firmware prompt — it's inside `bootmgfw.efi` (Windows Boot Manager) that OVMF loads from the ISO. It appears on **every** CD boot, first install and reboots alike. Microsoft's design is that after install you normally *don't* press a key, so bootmgr times out in ~5 seconds, OVMF marks Boot0001 `failed to start: Time out`, and you fall through to the installed disk. The prompt still renders on first boot, and there's no way to auto-accept it from the ISO side.
172+
173+
In a headless build we have no keyboard, so the serial log shows:
174+
175+
```
176+
BdsDxe: loading Boot0001 "UEFI QEMU DVD-ROM QM00001" from ...
177+
BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001" from ...
178+
BdsDxe: failed to start Boot0001 ...: Time out
179+
```
180+
181+
`Time out` is Windows bootmgr exiting after nobody pressed a key — not OVMF giving up on the device. Fix: attach a QEMU monitor, then spray Enter keys into it during the window between OVMF loading bootmgfw.efi and the bootmgr timeout:
182+
183+
```bash
184+
qemu-system-x86_64 ... \
185+
-monitor tcp:127.0.0.1:4444,server,nowait \
186+
-daemonize -pidfile qemu.pid
187+
188+
for delay in 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1; do
189+
sleep $delay
190+
echo 'sendkey ret' | nc -w 1 -q 1 127.0.0.1 4444
191+
done
192+
```
193+
194+
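Fifteen presses spaced by these delays run from +2 s to at least +16 s (2 + 14 × 1 of sleep time; each `nc` call can add up to about a second on top), which covers cold QEMU start + OVMF boot time.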
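
The timing of the key presses follows directly from the delay list. A small sketch that sums the delays, counting only the `sleep` time (the `nc` calls add their own time on top); the variable names are illustrative:

```bash
#!/usr/bin/env bash
# Cumulative offset of each 'sendkey ret', from the delay list used in the loop.
delays=(2 1 1 1 1 1 1 1 1 1 1 1 1 1 1)
offsets=()
t=0
for d in "${delays[@]}"; do
  t=$((t + d))
  offsets+=("$t")
done
count=${#offsets[@]}
echo "$count presses: first at +${offsets[0]} s, last at +${offsets[count-1]} s (sleep time only)"
# prints: 15 presses: first at +2 s, last at +16 s (sleep time only)
```

If the host is slow to start QEMU, appending more `1` entries to the list widens the window without changing anything else.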
195+
196+
#### Build steps
197+
198+
```bash
199+
# 1. Disk image
200+
qemu-img create -f qcow2 windows-11-25h2.qcow2 40G
201+
202+
# 2. Writable OVMF vars
203+
cp /usr/share/OVMF/OVMF_VARS_4M.fd OVMF_VARS.fd
204+
205+
# 3. TPM emulator
206+
mkdir -p /tmp/mytpm
207+
swtpm socket --tpmstate dir=/tmp/mytpm \
208+
--ctrl type=unixio,path=/tmp/swtpm-sock \
209+
--tpm2 --log level=5 &
210+
211+
# 4. autounattend as ISO (see quirk #2)
212+
genisoimage -o autounattend.iso -J -r autounattend.xml
213+
214+
# 5. Launch QEMU
215+
qemu-system-x86_64 \
216+
-machine q35,accel=kvm,smm=on \
217+
-cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
218+
-m 8G -smp 4 \
219+
-global driver=cfi.pflash01,property=secure,value=on \
220+
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE_4M.secboot.fd \
221+
-drive if=pflash,format=raw,file=OVMF_VARS.fd \
222+
-drive id=cd0,if=none,file=windows.iso,media=cdrom,readonly=on \
223+
-device ide-cd,drive=cd0,bus=ide.0,bootindex=0 \
224+
-drive id=cd1,if=none,file=virtio-win-0.1.285.iso,media=cdrom,readonly=on \
225+
-device ide-cd,drive=cd1,bus=ide.1 \
226+
-drive id=cd2,if=none,file=autounattend.iso,media=cdrom,readonly=on \
227+
-device ide-cd,drive=cd2,bus=ide.2 \
228+
-drive if=none,id=root,file=windows-11-25h2.qcow2,format=qcow2 \
229+
-device virtio-blk-pci,drive=root,disable-legacy=on \
230+
-device virtio-net-pci,netdev=mynet0,disable-legacy=on \
231+
-netdev user,id=mynet0,hostfwd=tcp::2222-:22 \
232+
-chardev socket,id=chrtpm,path=/tmp/swtpm-sock \
233+
-tpmdev emulator,id=tpm0,chardev=chrtpm \
234+
-device tpm-tis,tpmdev=tpm0 \
235+
-display none \
236+
-serial file:serial.log \
237+
-monitor tcp:127.0.0.1:4444,server,nowait \
238+
-daemonize -pidfile qemu.pid
239+
240+
# 6. Defeat "Press any key" (quirk #4)
241+
for delay in 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1; do
242+
sleep $delay
243+
echo 'sendkey ret' | nc -w 1 -q 1 127.0.0.1 4444
244+
done
245+
```
246+
247+
Snapshot the screen anytime with `screendump`:
248+
249+
```bash
250+
echo 'screendump /tmp/screen.ppm' | nc -w 1 -q 1 127.0.0.1 4444
251+
convert /tmp/screen.ppm /tmp/screen.png # imagemagick
252+
```
253+
254+
The installer takes ~30 minutes to reach OOBE and another ~20-30 minutes for OOBE + FirstLogonCommands. Expect disk growth from 196 K → 7-8 GiB → 15-17 GiB.
255+
256+
#### Wait for install.success, verify, shut down
257+
258+
```bash
259+
# Wait for the marker
260+
while true; do
261+
sleep 60
262+
sshpass -p 'C@c#on160' ssh -o StrictHostKeyChecking=no \
263+
-o UserKnownHostsFile=/dev/null -p 2222 cocoon@localhost \
264+
'if exist C:\install.success echo READY' 2>/dev/null | grep -q READY && break
265+
echo "$(date) still waiting, disk=$(du -sh windows-11-25h2.qcow2 | cut -f1)"
266+
done
267+
268+
# Upload and run verify / remediate
269+
SSH_OPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
270+
sshpass -p 'C@c#on160' ssh $SSH_OPTS -p 2222 cocoon@localhost "mkdir C:\scripts"
271+
sshpass -p 'C@c#on160' scp $SSH_OPTS -P 2222 scripts/verify.ps1 scripts/remediate.ps1 \
272+
cocoon@localhost:"C:\scripts\\"
273+
sshpass -p 'C@c#on160' ssh $SSH_OPTS -p 2222 cocoon@localhost \
274+
"powershell -ExecutionPolicy Bypass -File C:\scripts\verify.ps1"
275+
276+
# Shut down and compress
277+
sshpass -p 'C@c#on160' ssh $SSH_OPTS -p 2222 cocoon@localhost "shutdown /s /t 10"
278+
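# wait for QEMU to exit (PID comes from -pidfile qemu.pid above)
while kill -0 "$(cat qemu.pid 2>/dev/null)" 2>/dev/null; do sleep 5; done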
279+
qemu-img convert -O qcow2 -c windows-11-25h2.qcow2 windows-11-25h2.qcow2.tmp
280+
mv windows-11-25h2.qcow2.tmp windows-11-25h2.qcow2
281+
```
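
The marker wait above loops forever if the install stalls. One way to bound it is a small polling helper with a deadline. This is a sketch under assumptions: `wait_for` is a hypothetical helper that is not part of this repo, and the demo probes a local marker file instead of the guest.

```bash
#!/usr/bin/env bash
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds; returns 1 once the deadline has passed.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Demo probe (hypothetical): a background job creates the marker after 2 s.
marker=$(mktemp -u)
( sleep 2; touch "$marker" ) &
wait_for 10 1 test -e "$marker" && echo "marker found"
rm -f "$marker"
```

For the real build, the probe would be a small function wrapping the `sshpass ... | grep -q READY` pipeline, with a timeout generous enough for the install and OOBE durations quoted above.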
282+
283+
Typical sizes: ~17 GiB uncompressed → ~14 GiB after `qemu-img convert -c`.
284+
285+
## autounattend.xml explained
286+
287+
The included [`autounattend.xml`](autounattend.xml) drives the install across three passes.
288+
289+
### windowsPE pass
290+
291+
- **Keyboard**: US (`InputLocale=0409:00000409`). Nothing else in `Microsoft-Windows-International-Core-WinPE` is set — all other locale fields inherit from the image default, so the same autounattend works for English International, English (US), or any other edition.
292+
- **VirtIO driver injection**: auto-loads drivers from D: and E: (dual drive letter handles varying CD-ROM assignment). `viostor` (disk), `NetKVM` (network), `Balloon` (memory). Both `Win11/amd64/{driver}` (attestation layout) and `{driver}/w11/amd64` (standard) paths are searched.
293+
- **Disk partitioning**: wipes Disk 0, creates EFI (100 MB) + MSR (16 MB) + Windows (remaining, NTFS, C:).
294+
- **Image**: `ImageIndex=6` (Windows 11 Pro).
295+
- **Product key**: `VK7JG-NPHTM-C97JM-9MPGT-3V66T` (generic install key, not activation).
30296

31-
## Building
297+
### specialize pass
32298

33-
Trigger the **Build Windows qcow2** workflow manually (`workflow_dispatch`). Requires one repository secret:
299+
- **BypassNRO**: registry write to skip Win11 mandatory network + Microsoft account during OOBE.
300+
- **ComputerName**: `COCOON-VM` (also re-applied in FirstLogonCommands via `Rename-Computer` because 25H2 sometimes drops this).
301+
- **TimeZone**: Pacific Standard Time.
302+
- **Keyboard**: US (same `InputLocale` only).
34303

35-
- `WINDOWS_ISO_URL` — signed download URL for the Windows 11 25H2 ISO (Microsoft licensing prohibits redistribution, so this cannot live in the repo).
304+
### oobeSystem pass
36305

37-
Runs on `ubuntu-latest` with KVM. Total runtime ~2h (installer ~1h, verify/reboot/push ~1h). The workflow:
306+
- **International-Core**: `InputLocale=0409:00000409` only. The component must be present here for Windows 11 25H2 OOBE to skip the country / keyboard selection screens, but we deliberately do not pin `SystemLocale`/`UILanguage`/`UserLocale` so the image default wins.
307+
- **OOBE**: hides EULA, online account, wireless setup.
308+
- **User account**: local admin `cocoon` with auto-logon (password base64-encoded in XML).
309+
- **FirstLogonCommands**: 26 commands.
38310

39-
1. Boots QEMU with Secure Boot OVMF + swtpm TPM 2.0
40-
2. Injects Enter keys via the QEMU monitor to defeat the "Press any key to boot from CD" prompt
41-
3. Runs unattended install from `autounattend.xml` (delivered as a third CD-ROM since q35 floppy is not visible to Windows PE)
42-
4. Polls SSH for `C:\install.success` marker
43-
5. Runs `verify.ps1`, reboots, re-verifies, and applies `remediate.ps1` until all checks pass (or 3 attempts)
44-
6. Shuts the VM down cleanly, compresses the qcow2, splits it, and pushes to GHCR via ORAS
311+
| Order | Action | Notes |
312+
|--------|------------------------------|-------|
313+
| 1-2 | **RDP** | `fDenyTSConnections=0` + `Enable-NetFirewallRule` |
314+
| 3-4 | **SSH** | `Add-WindowsCapability OpenSSH.Server`, auto-start, firewall rule |
315+
| 5 | **ICMP** | Allow ping |
316+
| 6 | **Firewall** | Disable all profiles (dev/test environment) |
317+
| 7 | **Hibernate** | `powercfg /h off` |
318+
| 8-10 | **SAC / EMS** | `bcdedit /emssettings emsport:1 emsbaudrate:115200`, `/ems on`, `/bootems on` |
319+
| 11 | **TermService** | Set to auto-start |
320+
| 12 | **EMS-SAC Tools** | `Add-WindowsCapability Windows.Desktop.EMS-SAC.Tools~~~~0.0.1.0` — wrapped in `Start-Job` + `Wait-Job -Timeout 1200` so a hung FoD download from Windows Update cannot block the rest of the sequence indefinitely |
321+
| 13 | **Network profile** | Set to Private (required before WinRM AllowUnencrypted) |
322+
| 14-17 | **WinRM** | Enable PS Remoting, AllowUnencrypted, Basic auth, firewall on 5985 |
323+
| 18 | **Hostname** | Force `Rename-Computer` to `COCOON-VM` (specialize ComputerName unreliable on 25H2) |
324+
| 19 | **virtio-win guest tools** | Silent install `virtio-win-guest-tools.exe /S` from CD-ROM — drivers + QEMU Guest Agent + spice agent in one shot |
325+
| 20-22 | **ACPI power button = Shut down** | `PBUTTONACTION=3` for AC + DC power schemes |
326+
| 23-24 | **Shutdown optimization** | `WaitToKillServiceTimeout=5000`, `DisableShutdownNamedPipeCheck=1` |
327+
| 25 | **Shutdown without logon** | Allow remote `shutdown /s /t 0` with no user logged in |
328+
| 26 | **Install marker** | `cmd /c "echo %date% %time% > C:\install.success"` |
45329

46-
## Why GHCR instead of LFS?
330+
## Post-clone networking
47331

48-
- GitHub LFS has a hard 2 GiB per-file limit; a compressed Windows 11 image is ~14 GiB.
49-
- GHCR supports larger blobs, is free for public repos, and speaks a standard protocol (OCI) that maps naturally to a multi-layer "split archive".
50-
- Consumers can pull without extra credentials on public repos.
332+
- **DHCP**: no action needed, Windows DHCP client auto-configures on the new NIC.
333+
- **Static IP**: configure via SAC serial console:
334+
```
335+
cmd
336+
ch -si 1
337+
netsh interface ip set address "Ethernet" static <IP> <MASK> <GW>
338+
```
339+
See the [Cloud Hypervisor Windows documentation](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/windows.md) for details.
51340

52341
## Licensing
53342
