## Summary
I propose integrating Cross-Layer Equalization (CLE) into TICO as a preprocessing step to improve quantization performance, especially for activation-aware quantization.
CLE is a technique that rescales the weights of consecutive layers (e.g., Linear/Conv pairs) to balance per-channel weight ranges, mitigating quantization error without requiring additional data or retraining.
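To make the rescaling concrete, here is a minimal sketch for a `Linear -> ReLU -> Linear` pair, following the cross-layer equalization rule of Nagel et al. (2019). The function name `equalize_linear_pair` is illustrative, not an existing TICO API; a production version would also need to handle Conv layers and other pair patterns:

```python
import torch

@torch.no_grad()
def equalize_linear_pair(fc1: torch.nn.Linear, fc2: torch.nn.Linear,
                         eps: float = 1e-8) -> None:
    """Equalize per-channel weight ranges of two consecutive Linear layers
    separated by a ReLU. The transform is exact because ReLU is positively
    homogeneous: relu(s * x) == s * relu(x) for any scale s > 0."""
    # Per-output-channel range of fc1 (rows) and per-input-channel range
    # of fc2 (columns); channel i of fc1's output feeds column i of fc2.
    r1 = fc1.weight.abs().amax(dim=1)   # shape: [out_features of fc1]
    r2 = fc2.weight.abs().amax(dim=0)   # shape: [in_features of fc2]

    # s_i = sqrt(r1_i / r2_i) makes both equalized ranges equal sqrt(r1_i * r2_i).
    s = torch.sqrt(r1 / (r2 + eps)).clamp_min(eps)

    fc1.weight.div_(s.unsqueeze(1))     # W1 <- S^-1 W1 (scale rows down)
    if fc1.bias is not None:
        fc1.bias.div_(s)                # b1 <- S^-1 b1
    fc2.weight.mul_(s.unsqueeze(0))     # W2 <- W2 S   (scale columns up)
```

Applied over every eligible consecutive pair before calibration, this leaves the FP32 network's output numerically unchanged (up to floating-point error) while shrinking the worst-case per-channel step size that per-tensor quantization must cover.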
## Motivation
The current PTQ workflow relies on calibration to collect activation statistics, but:
- Large inter-channel variance in weights can degrade quantization quality
- Calibration alone may not sufficiently compensate for such imbalance
- CLE can improve quantization robustness with minimal overhead
By introducing CLE, we can:
- Reduce quantization error before calibration
- Improve accuracy for low-bit (e.g., INT8, INT4) quantization
- Provide a data-free optimization step