Commit e617209
fix: Increase HNSW neighbor diversity factor from 0.1 to 0.7 for high-dimensional spaces
The neighbor selection heuristic was using DIVERSITY_FACTOR = 0.1, which is
too lenient for high-dimensional embeddings (768D). This caused dense local
clustering but poor long-range connectivity, resulting in low recall.
Root Cause:
- A factor of 0.1 accepts a candidate even when it sits as close as 10% of its query distance to an already-selected neighbor
- In 768D space (curse of dimensionality), pairwise distances concentrate, so points look nearly equidistant
- Dense local clusters form, but inter-cluster bridges are weak
- Result: Poor recall despite high ef_construction
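The curse-of-dimensionality point above can be illustrated with a small experiment. This is a minimal sketch on synthetic uniform random points, not on the Wikidata embeddings; `relative_spread` is a hypothetical helper introduced only for this illustration. It measures how much pairwise distances vary relative to their mean, which shrinks sharply as the dimension grows.

```python
import math
import random


def relative_spread(dim, n_points=40, seed=0):
    """Return (max - min) / mean over all pairwise distances of random points."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = sum(dists) / len(dists)
    return (max(dists) - min(dists)) / mean


low = relative_spread(2)     # low-dimensional baseline
high = relative_spread(768)  # same dimension as the embeddings in this commit
print(f"relative spread  2D: {low:.2f}  768D: {high:.2f}")
```

When nearly all distances are similar, a lenient diversity threshold rejects almost nothing, which is consistent with the clustering described above.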
Fix:
- Increase DIVERSITY_FACTOR from 0.1 to 0.7 (7x stricter)
- Neighbors must now be ≥70% of query distance apart from each other
- Ensures angular diversity and long-range graph connectivity
- Standard HNSW papers recommend 0.5-1.0 for high-dimensional spaces
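The rule described in the Fix list can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `select_neighbors` and its parameters are hypothetical names, Euclidean distance is assumed, and only `DIVERSITY_FACTOR` and its values 0.1 and 0.7 come from the commit message.

```python
import math

DIVERSITY_FACTOR = 0.7  # value from this commit; the previous value was 0.1


def select_neighbors(query, candidates, m, diversity_factor=DIVERSITY_FACTOR):
    """Pick up to m neighbors for query, rejecting candidates that crowd
    neighbors already selected (hypothetical helper, Euclidean distance)."""
    # Visit candidates from nearest to farthest from the query.
    ordered = sorted(candidates, key=lambda c: math.dist(query, c))
    selected = []
    for cand in ordered:
        if len(selected) >= m:
            break
        d_query = math.dist(query, cand)
        # Keep cand only if it is at least diversity_factor * d_query away
        # from every neighbor kept so far.
        if all(math.dist(cand, s) >= diversity_factor * d_query for s in selected):
            selected.append(cand)
    return selected


query = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.0, 0.3), (-1.0, 0.0), (0.0, 2.0)]
print(select_neighbors(query, candidates, 3, 0.7))
print(select_neighbors(query, candidates, 3, 0.1))
```

In this toy 2D input, the 0.7 setting rejects `(1.0, 0.3)` because it sits too close to `(1.0, 0.0)` and keeps the more distant `(0.0, 2.0)` instead, while the 0.1 setting accepts the near-duplicate and fills the neighbor list before reaching the distant point.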
Expected Impact:
- Better recall on Wikidata's 50M × 768D embeddings
- More balanced graph structure (less local clustering)
- Slight increase in index build time (more candidates rejected)
This is the PRIMARY fix for Wikidata recall issues - ef_construction alone
cannot compensate for poor neighbor selection heuristics.
1 parent c6984b7 commit e617209
3 files changed
Lines changed: 21 additions & 15 deletions
| File (name not captured) | Original lines | New lines | Additions | Deletions |
|---|---|---|---|---|
| 1 | 8-18 | 8-18 | 4 | 4 |
| 2 | 1342-1351 | 1342-1352 | 5 | 4 |
| 3 | 852-863 | 852-868 | 7 | 2 |
| 3 | 885-895 | 890-900 | 5 | 5 |