Commit 338ae62

Improve health framework for interop with CSSTools (#568)
# Description

This pull request enhances the SDN Health diagnostics module by improving the clarity and usefulness of health test outputs, introducing a new "UNKNOWN" result state, and refining how properties and remediation steps are reported. It also updates several test descriptions and result severities for better guidance.

**Health Test Output Improvements:**

* Added `FriendlyName`, `Impact`, `Description`, and `PublicDocs` fields to health test objects for more informative reporting (`New-SdnHealthTest`, `SdnDiag.Health.psm1`).
* Enhanced `Write-HealthValidationInfo` to display detailed properties and improved remediation formatting. (F4614ca0L293R336)

**Result State Enhancements:**

* Introduced a new `UNKNOWN` result state throughout the reporting pipeline for cases where test results are indeterminate (e.g., unhandled exceptions). Updated all relevant functions and output color coding to support this new state. [[1]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60L245-R251) [[2]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60L262-R268) [[3]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60R322-R325) [[4]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60R636) [[5]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60L637-R665)

**Test Logic and Severity Adjustments:**

* Changed the severity of certain test failures to `WARNING` instead of `FAIL` for less critical issues (e.g., non-self-signed certificates in Trusted Root, disabled diagnostics cleanup task). (F951R973, F1033R1060)
* Updated exception handling in test functions to set the result to `UNKNOWN` rather than `FAIL` for clearer diagnostics. (F966R985, F995R1014, F1045R1072)

**Test Metadata and Documentation:**

* Updated test entries in `SdnDiag.Health.Config.psd1` to include `FriendlyName` and improved `Description` fields for clearer and more actionable health check documentation.

**Remediation and Property Reporting:**

* Standardized remediation output formatting and added structured property reporting for failed health tests. [[1]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60L326-R353) [[2]](diffhunk://#diff-15898640fc68e07afa836ad8d93af4f22a4442978d9c233f39d48d44d85cfb60L637-R665)

# Change type

- [ ] Bug fix (non-breaking change)
- [ ] Code style update (formatting, local variables)
- [x] New Feature (non-breaking change that adds new functionality without impacting existing)
- [ ] Breaking change (fix or feature that may cause functionality impact)
- [ ] Other

# Checklist:

- [x] My code follows the style and contribution guidelines of this project.
- [x] I have tested and validated my code changes.
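
The result-state changes described above (`WARNING` for less critical findings, `UNKNOWN` when a test throws) might look roughly like the following. This is a minimal sketch, not the module's code: `Invoke-ExampleHealthTest`, the `Critical` flag, and the shape of the result object are all hypothetical names chosen for illustration, and the real test functions in `SdnDiag.Health.psm1` may be structured differently.

```powershell
# Hypothetical sketch; Invoke-ExampleHealthTest and the object shape are
# assumptions for illustration, not part of the SdnDiagnostics module.
function Invoke-ExampleHealthTest {
    param(
        [Parameter(Mandatory)]
        [scriptblock]$Check
    )

    # Start from a passing result; the check may downgrade it.
    $test = [pscustomobject]@{
        Result      = 'PASS'
        Remediation = @()
    }

    try {
        # The check returns $null when healthy, or a hashtable describing the issue.
        $issue = & $Check
        if ($null -ne $issue) {
            # Less critical findings are reported as WARNING instead of FAIL.
            $test.Result = if ($issue.Critical) { 'FAIL' } else { 'WARNING' }
            $test.Remediation += $issue.Remediation
        }
    }
    catch {
        # An unhandled exception leaves the outcome indeterminate, so report
        # UNKNOWN rather than FAIL.
        $test.Result = 'UNKNOWN'
    }

    return $test
}

$ok      = Invoke-ExampleHealthTest -Check { $null }
$warn    = Invoke-ExampleHealthTest -Check { @{ Critical = $false; Remediation = 'Enable the cleanup task.' } }
$unknown = Invoke-ExampleHealthTest -Check { throw 'Unable to query node.' }
```

Separating `UNKNOWN` from `FAIL` lets a consumer tell "the check ran and found a problem" apart from "the check could not run", which is the distinction the description draws.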
1 parent 48ab448 commit 338ae62

2 files changed

Lines changed: 161 additions & 88 deletions

src/modules/SdnDiag.Health.Config.psd1

Lines changed: 48 additions & 24 deletions
@@ -7,134 +7,158 @@
     # COMMON TESTS
 
     'Test-SdnDiagnosticsCleanupTaskEnabled' = @{
-        Description = "Scheduled task is not enabled on the SDN infrastructure node(s)."
+        FriendlyName = "SDN Diagnostics Task Enabled"
+        Description = "Ensure that scheduled task is enabled to automatically clean up and prune diagnostic log files."
         Impact = "Unconstrained log files may grow and consume disk space."
         PublicDocUrl = ""
     }
     'Test-SdnNetworkControllerApiNameResolution' = @{
-        Description = "Network Controller URL is not resolvable."
+        FriendlyName = "Network Controller API Name Resolution"
+        Description = "Ensure that the Network Controller URL is resolvable."
         Impact = "Calls to Network Controller API will fail resulting in policy configuration failures and unable to manage SDN resources."
         PublicDocUrl = ""
     }
     'Test-SdnNonSelfSignedCertificateInTrustedRootStore' = @{
-        Description = "Non Root Cert exist in Host Trusted Root CA Store"
+        FriendlyName = "Non-Self-Signed Certificate in Trusted Root Store"
+        Description = "Ensure that only trusted root certificates exist in the host Trusted Root CA store."
         Impact = "Network Controller will have issues communicating to hosts resulting in policy configuration failures."
         PublicDocUrl = "https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/iis/site-behavior-performance/http-403-forbidden-access-website#cause-2-non-self-signed-certificates-are-in-trusted-root-certification-authorities-certificate-store"
     }
     'Test-SdnServiceState' = @{
-        Description = "Identified service(s) are not running on the SDN infrastructure node(s)."
+        FriendlyName = "SDN Service State"
+        Description = "Ensure that required services are running on the SDN infrastructure node(s)."
         Impact = "SDN services and functionality will be impacted without the service running."
         PublicDocUrl = ""
     }
     'Test-SdnCertificateExpired' = @{
-        Description = "SDN infrastructure node certificate is expired."
+        FriendlyName = "SDN Certificate Expired"
+        Description = "Ensure that SDN infrastructure node certificates are valid and not expired."
         Impact = "Network Controller may have issues communicating and programming policies to SDN infrastructure nodes resulting in impact to workloads and services."
         PublicDocUrl = "https://learn.microsoft.com/en-us/azure/azure-local/manage/update-sdn-infrastructure-certificates"
     }
     'Test-SdnCertificateMultiple' = @{
-        Description = "Multiple certificates with the same subject name and NetworkController OID exist in the SDN infrastructure node's certificate store."
+        FriendlyName = "SDN Certificate Multiple"
+        Description = "Ensure that only one certificate with the same subject name and NetworkController OID exists in the SDN infrastructure node certificate store."
         Impact = "Network Controller may have issues communicating and programming policies to SDN infrastructure nodes resulting in impact to workloads and services."
         PublicDocUrl = "https://learn.microsoft.com/en-us/azure/azure-local/manage/update-sdn-infrastructure-certificates"
     }
 
     # GATEWAY TESTS
 
     'Test-SdnAdapterPerformanceSetting' = @{
-        Description = "Network Adapter performance settings are not configured as recommended on the SDN node(s)."
+        FriendlyName = "SDN Adapter Performance Setting"
+        Description = "Ensure that network adapter performance settings are configured as recommended on the SDN node(s)."
         Impact = "You may not achieve optimal performance for network traffic flowing through the SDN Node(s)."
         PublicDocUrl = ""
     }
 
     # LOAD BALANCER MUX TESTS
 
     'Test-SdnMuxConnectionStateToRouter' = @{
-        Description = "One or more Load Balancer Muxes do not have an active BGP connection via TCP port 179 to the switch."
+        FriendlyName = "SDN Mux Connection State to Router"
+        Description = "Ensure that each Load Balancer Mux has an active BGP connection to the switch over TCP port 179."
         Impact = "Public IP addresses may not be routable as Load Balancer Muxes are not advertising the public IP addresses to the switch."
         PublicDocUrl = "https://learn.microsoft.com/en-us/azure-stack/hci/manage/troubleshoot-software-load-balancer"
     }
     'Test-SdnMuxConnectionStateToSlbManager' = @{
-        Description = "SLB Manager does not have connectivity established to Mux(es) via TCP 8560."
+        FriendlyName = "SDN Mux Connection State to SLB Manager"
+        Description = "Ensure that SLB Manager connectivity to Mux(es) over TCP port 8560 is established."
         Impact = "SLB Manager will not be able to program VIP:DIP mappings to the Load Balancer Mux(es) which will impact routing of Virtual IPs."
         PublicDocUrl = "https://learn.microsoft.com/en-us/azure-stack/hci/manage/troubleshoot-software-load-balancer"
     }
 
     # NETWORK CONTROLLER TESTS
 
     'Test-SdnNetworkControllerNodeRestInterface' = @{
-        Description = "Network Controller node(s) are missing the Network Adapter that is required for the REST interface."
+        FriendlyName = "Network Controller Node REST Interface"
+        Description = "Ensure that Network Controller node(s) have the required network adapter for the REST interface."
         Impact = "Failover of the NB API will not occur if the Network Controller node(s) are missing the Network Adapter that is required for the REST interface."
         PublicDocUrl = "https://learn.microsoft.com/en-us/powershell/module/networkcontroller/set-networkcontrollernode"
     }
     'Test-SdnServiceFabricApplicationHealth' = @{
-        Description = "Network Controller application with Service Fabric is not healthy."
+        FriendlyName = "Network Controller Service Fabric Application Health"
+        Description = "Ensure that the Network Controller Service Fabric application is healthy."
         Impact = "Network Controller services and functionality may be impacted."
         PublicDocUrl = ""
     }
     'Test-SdnServiceFabricClusterHealth' = @{
-        Description = "Service Fabric cluster for Network Controller is not healthy."
+        FriendlyName = "Network Controller Service Fabric Cluster Health"
+        Description = "Ensure that the Service Fabric cluster for Network Controller is healthy."
         Impact = "Network Controller services and functionality may be impacted."
         PublicDocUrl = ""
     }
     'Test-SdnServiceFabricNodeStatus' = @{
-        Description = "Service Fabric node(s) are offline and not participating in the cluster."
+        FriendlyName = "Network Controller Service Fabric Node Status"
+        Description = "Ensure that Service Fabric node(s) are online and participating in the cluster."
         Impact = "Minimum amount of nodes are required to maintain quorum and cluster availability. Services will be in read-only state if quorum is lost and may result in data loss."
         PublicDocUrl = "https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-disaster-recovery"
     }
     'Test-SdnResourceConfigurationState' = @{
-        Description = "Infrastructure resource configuration is not Success."
+        FriendlyName = "SDN Resource Configuration State"
+        Description = "Ensure that infrastructure resources report configuration state as Success."
         Impact = "SDN services and functionality may be impacted."
         PublicDocUrl = "https://learn.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack#hoster-validate-system-health"
     }
     'Test-SdnResourceProvisioningState' = @{
-        Description = "Infrastructure resource provisioning is not Succeeded."
+        FriendlyName = "SDN Resource Provisioning State"
+        Description = "Ensure that infrastructure resources report provisioning state as Succeeded."
         Impact = "SDN services and functionality may be impacted."
         PublicDocUrl = "https://learn.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack#hoster-validate-system-health"
     }
     'Test-NetworkInterfaceAPIDuplicateMacAddress' = @{
-        Description = "Duplicate MAC address detected within the API."
+        FriendlyName = "Network Interface API Duplicate MAC Address"
+        Description = "Ensure that MAC addresses are unique for network interfaces in the API."
         Impact = "Policy configuration failures may be reported by Network Controller when applying policies to the Hyper-v host. Network Interfaces reporting configurationState failure will not be routable."
         PublicDocUrl = ""
     }
 
     # SERVER TESTS
 
     'Test-SdnEncapOverhead' = @{
-        Description = "EncapOverhead/JumboPacket is not configured properly on the Hyper-V Hosts"
+        FriendlyName = "SDN Encap Overhead"
+        Description = "Ensure that EncapOverhead and JumboPacket values are configured correctly on Hyper-V hosts."
         Impact = "Intermittent packet loss may occur under certain conditions when routing traffic within the logical network."
         PublicDocUrl = "https://learn.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack#check-mtu-and-jumbo-frame-support-on-hnv-provider-logical-network"
     }
     'Test-SdnHostAgentConnectionStateToApiService' = @{
-        Description = "Network Controller Host Agent is not connected to the Network Controller API Service."
+        FriendlyName = "SDN Host Agent Connection State to API Service"
+        Description = "Ensure that the Network Controller Host Agent is connected to the Network Controller API service."
         Impact = "Policy configuration may not be pushed to the Hyper-V host(s) if no southbound connectivity is available."
         PublicDocUrl = ""
     }
     'Test-SdnProviderNetwork' = @{
-        Description = "Logical network does not support VXLAN or NVGRE encapsulated traffic"
+        FriendlyName = "SDN Provider Network"
+        Description = "Ensure that the logical network supports VXLAN or NVGRE encapsulated traffic."
         Impact = "Intermittent packet loss may occur under certain conditions when routing traffic within the logical network."
         PublicDocUrl = "https://learn.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack#check-mtu-and-jumbo-frame-support-on-hnv-provider-logical-network"
     }
     'Test-VfpDuplicateMacAddress' = @{
-        Description = "Duplicate MAC address detected within Virtual Filtering Platform (VFP)."
+        FriendlyName = "VFP Duplicate MAC Address"
+        Description = "Ensure that MAC addresses are unique within Virtual Filtering Platform (VFP)."
         Impact = "Policy configuration failures may be reported by Network Controller when applying policies to the Hyper-v host. In addition, network traffic may be impacted."
         PublicDocUrl = ""
     }
     'Test-SdnVfpEnabledVMSwitch' = @{
-        Description = "No VMSwitches detected with VFP enabled on the Hyper-V host(s)."
+        FriendlyName = "SDN VFP Enabled VMSwitch"
+        Description = "Ensure that at least one VMSwitch with VFP enabled is present on Hyper-V host(s)."
         Impact = "Policy configuration failures may be reported by Network Controller when applying policies to the Hyper-v host."
         PublicDocUrl = ""
     }
     'Test-SdnVfpEnabledVMSwitchMultiple' = @{
-        Description = "Multiple VFP enabled virtual switches detected on the Hyper-V host(s)."
+        FriendlyName = "SDN VFP Enabled VMSwitch Multiple"
+        Description = "Ensure that only one VFP-enabled virtual switch is present on Hyper-V host(s)."
         Impact = "Policy configuration failures may be reported by Network Controller when applying policies to the Hyper-v host."
         PublicDocUrl = ""
     }
     'Test-VMNetAdapterDuplicateMacAddress' = @{
-        Description = "Duplicate MAC address detected with the data plane on the Hyper-V host(s)."
+        FriendlyName = "VM Network Adapter Duplicate MAC Address"
+        Description = "Ensure that data-plane VM network adapter MAC addresses are unique on Hyper-V host(s)."
         Impact = "Policy configuration failures may be reported by Network Controller when applying policies to the Hyper-v host. In addition, network traffic may be impacted for the interfaces that are duplicated."
         PublicDocUrl = ""
     }
     'Test-ServerHostId' = @{
-        Description = "HostID is not configured properly on the Hyper-V Hosts"
+        FriendlyName = "Server Host ID"
+        Description = "Ensure that HostID is configured correctly on Hyper-V hosts."
         Impact = "Mismatch of HostId between Hyper-V host(s) and Network Controller will result in policy configuration failures."
         PublicDocUrl = "https://learn.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack#check-for-corresponding-hostids-and-certificates-between-network-controller-and-each-hyper-v-host"
     }
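
For illustration, the per-test metadata in this file could be looked up by test name along the following lines. This is a hedged sketch only: the inline table copies one entry from the diff, `Get-ExampleTestMetadata` is a hypothetical helper, and falling back to the test name when no entry exists is an assumption rather than documented module behavior.

```powershell
# Hypothetical sketch; Get-ExampleTestMetadata is an illustrative helper and
# is not part of the SdnDiagnostics module.
$healthConfig = @{
    'Test-SdnServiceState' = @{
        FriendlyName = 'SDN Service State'
        Description  = 'Ensure that required services are running on the SDN infrastructure node(s).'
        Impact       = 'SDN services and functionality will be impacted without the service running.'
        PublicDocUrl = ''
    }
}

function Get-ExampleTestMetadata {
    param(
        [Parameter(Mandatory)][hashtable]$Config,
        [Parameter(Mandatory)][string]$Name
    )

    $entry = $Config[$Name]
    if ($null -eq $entry) {
        # Assumption: with no config entry, fall back to the raw test name.
        $entry = @{ FriendlyName = $Name; Description = ''; Impact = ''; PublicDocUrl = '' }
    }

    # Flatten the entry into one object that a report writer can format.
    return [pscustomobject]@{
        Name         = $Name
        FriendlyName = $entry.FriendlyName
        Description  = $entry.Description
        Impact       = $entry.Impact
        PublicDocUrl = $entry.PublicDocUrl
    }
}

$known   = Get-ExampleTestMetadata -Config $healthConfig -Name 'Test-SdnServiceState'
$missing = Get-ExampleTestMetadata -Config $healthConfig -Name 'Test-NotInConfig'
```

Keeping the display text in the data file rather than in each test function means the wording can be revised, as this commit does, without touching test logic.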
