Commit b92a511

yuejiaointel, Copilot, claude, and ethanglaser authored
WIP: Enable array api support in neighbor (#2700)
* refactor: move/delete some methods in neighbors.py * fix: try it again * fix: try it again * fix: try it again * fix: first round of refactor move preprocssing function to sklearnex * fix: fix shape * rebase: rebase to main * fix: add fit emthod logic in onedla * fix: fix test * fix: fix tupleerror * fix: fix tuple issue * print: print fit_x * fix: fixed tuple * fix: fix tuple * print: print in save attributes * fix: tuple handling * print: add print * print: test print * test: test fix for typle * fix: more print * fix: test fix for tuyple issue * fix: test fix for tuyple issue * fix: try add validation * fix: try restore neighbors funcitons * fix: test restore * fix: restore again * fix: restpore * fix: restore ad and add print * fix: restore ad and add print * fix: fix test as well * fix: fix test * fix: comment out validate data * fix: refactoredclassifier prepressing to sklearnex * fix: add vlaidate data and see if it fix attributeerror * fix: fix onedal test * fix: dpm * fix: refacto validate n classes * fix: refacor kneighbors validation * fix: add vlaidation data to rest of the functions * fix: fix check n neighbors validation before check is fitted * fix: fix when predict(none) is called by adding x is not none check * fix: fix lof * fix: add validation in kneihbors for lof * fix: remove count valitation in onedal * fix: refactor shape * refactor: neighbors processing logic to skleranex * fix: validationeighbors < samples after +1 * fix: fix assertion error * fix: fix asswertion error by dispatch gpu/skl in sklearnex * refacor: onedal prediciton entirely to sklearnex * feature: array api in common.py * fix: assertion error * feature: add array api support to knn skleranex files * fix: compatiibilty for array api * fix: remove validate data tests from deseleted tests * fix: format * fix: remove ensure finite and reformat * fix: format * fix: fix patching type error * fix: update doc * fix: fix patching error * fix: attribute error * fix: patchnig 
AttributeError * fix: remove print and commented code * fix: format * fix: fix conformance test * fix: format * fix: clean up unneeded var * fix: attributeerror * fix: spmd also use skelarnex neighbors * test: test without classes_check in onedal neighbor * fix: spmd issue * fix: format * fix: make sure y is numeric in regrresor * fix: fix spmd test * fix: common tests * fix: spmd issues * fix: format * fix: fix metric value * fix: stability test * fix: test * fix: fix patching error * fix: spmd preduct * fix: validate y for regressor * test: try regressor without ynumric but verify it ouside validate dat * fix: foloow ridge patten ensure y numberic requrie ksnearln >=1.5 * fix: test without manual convertion * fix: add violation back * fix: add rst back * fix: add the dunmmy check back * fix: fix the reshape * fix: add post processing to onedal * fix: kneighbors * fix: remove post prossing from skelarnex * fix: format * fix: format * fix: fix spmd * fix: fix spmd test * fix: format * fix: lof tests * fix: format * fix: fix example test * fix: fix spmd neighbor again with runtime lookup * fix: spmd synetic test * fix: lof case * fix: format * fix: address past comments again except refactoring * fix: dtype * fix: fix check onedal estimator * fix: remove unused function and refactoring * fix: change to hasattr * fix: bring some funcitons back * fix: format * fix: remove comments * fix: performance * fix: add validation * fix: validate data * fix: validate data * fix: validate data * fix: validate data * fix: validate data * fix: validate data * fix validate * fix: performnace * fix: validate data * fix: performance * fix: performance and add spmd array api dispatch flag * fix: gpu errors * fix: comments * fix: gpu * fix: fix spmd * fix: added onedal device offload changes in #2940 * fix: comments * fix: comments * fix: get array from namespace for gpu data * fix: comments * fix: device offload with standalone functions to hanlde converstion * revert: restore 
onedal/_device_offload.py to main (defer to #2940) * fix: add multi output true * fix: let non sycl arrays gpu input fall back to sklearn * fix: simpliy unique_inverse * fix: reverse as numpy < 2.0 does not have np.unqieu_inverse * Fix unique_inverse result unpacking for Python 3.13 compatibility numpy.unique_inverse returns a plain tuple on some numpy/Python version combinations (e.g., Python 3.13), not a namedtuple with .values/.inverse_indices attributes. Use tuple unpacking which works for both tuple and namedtuple return types. * fix: allow numpy arrays in _onedal_gpu_supported for dpctl/dpnp path The previous check rejected all non-SYCL data, but dispatch() transfers dpctl/dpnp data to host (numpy) before the second _get_backend call. This caused dpctl/dpnp GPU tests to incorrectly fall back to sklearn. Fix: skip numpy arrays in the check (they are always valid), only reject non-numpy arrays that lack __sycl_usm_array_interface__ (e.g. torch XPU tensors). * fix: ondal support * fix: gpu test * fix: format * fix: comments * fix: gpu test * fix: revert changes in device offload * fix: lof * fix: lof output type and gpu test * fix: convert predict/predict_proba output to input namespace for torch KNN regressor predict and classifier predict_proba returned numpy for torch input because wrap_output_data cannot handle torch tensors (torch lacks __array_namespace__). Added xp.asarray() conversion at the predict/predict_proba level, consistent with LOF score_samples. 
* fix: remove transfer to host in common and use xp.ones * fix: int type * fix: fix len * fix: validation * fix: len in common * fix: len * fix: patching test * Fix Array API: replace .ravel() with xp.reshape for array_api_strict compatibility * fix: format * fix: array API compliant predict_proba in _compute_class_probabilities * Update sklearnex/neighbors/knn_unsupervised.py delete comment Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: use method-form astype for numpy compat (numpy < 2.0) * fix: move get_namespace to onedal to avoid inverted dependency on sklearnex * Revert "fix: move get_namespace to onedal to avoid inverted dependency on sklearnex" This reverts commit 4a0defc. * fix: convert to numpy for cpu input before calling onedal, got rid of get_namespace in onedal with get_sycl_namespace * fix: also convert to numpy for y * fix: pathcing * fix: prob_k * fix: move all post porcessing from onedal back to sklearnx * fix: test * fix: tests * FIX: Skip available_if gated methods with default params in test_common gen_models_info (from PR #2955) now discovers available_if-gated methods via fn unwrapping. These get parametrized as test cases, but estimator_trace creates estimators with default parameters where those methods aren't available (e.g. SVC with probability=False has no predict_proba). Added check_is_dynamic_method skip guard in estimator_trace, matching the pattern in test_patching.py and test_run_to_run_stability.py. * FIX: Exclude 'self' from argnum in call_method for sklearn 1.0 compat In sklearn 1.0.x, _AvailableIfDescriptor.__get__ returns a lambda with update_wrapper, causing signature() to follow __wrapped__ to the unbound function (self, X). This results in argnum=2 instead of 1, leading to call_method passing both X and y to methods that only accept X (like score_samples, predict_proba, decision_function). 
The fix excludes 'self' from the argnum count since call_method always operates on bound methods where 'self' is already bound. * FIX: Add xfail entries for LOF validate_data tests newly discovered by gen_models_info LOF fit_predict, score_samples, predict, and decision_function are newly discovered by gen_models_info after PR #2955. These methods don't follow the dual validate_data pattern (sklearnex + sklearn) expected by call_validate_data, consistent with other neighbors validate_data xfails. * FIX: Skip available_if gated methods with default params in test_common gen_models_info (from PR #2955) now discovers available_if-gated methods via fn unwrapping. These get parametrized as test cases, but estimator_trace creates estimators with default parameters where those methods aren't available (e.g. SVC with probability=False has no predict_proba). Added check_is_dynamic_method skip guard in estimator_trace, matching the pattern in test_patching.py and test_run_to_run_stability.py. * FIX: Exclude 'self' from argnum in call_method for sklearn 1.0 compat In sklearn 1.0.x, _AvailableIfDescriptor.__get__ returns a lambda with update_wrapper, causing signature() to follow __wrapped__ to the unbound function (self, X). This results in argnum=2 instead of 1, leading to call_method passing both X and y to methods that only accept X (like score_samples, predict_proba, decision_function). The fix excludes 'self' from the argnum count since call_method always operates on bound methods where 'self' is already bound. * Move available_if filtering from estimator_trace to gen_models_info Address review: filter out unavailable dynamic methods (e.g., SVC.predict_proba when probability=False) at test collection time in gen_models_info rather than skipping at test runtime in estimator_trace. These conditions are deterministic at instantiation, not dependent on fit. 
* fixL input issue * fix: lof * fix: raise error for predict proba in spmd mode * Skip LOF score_samples in stability test (non-deterministic kd_tree tie-breaking) * fix: fix lof test * Deselect RidgeClassifier array_api_strict tests (sklearn bug: Array passed to xp.full fill_value) * Re-enable Ridge deselections with >=1.8,<1.9 constraint; add LOF fit_predict to stability skip * Simplify Ridge deselection constraint to >=1.8 * Add LOF stability skip to test_standard_estimator_stability * fix: remove extra type convertion do all the type convertion in warp_ouput+data * fix: patching and format * fix: pass n_neighbors as keyword to avoid _transfer_to_host USM/int mismatch * Fix wrap_output_data for torch XPU and add _onedal_dispatched guard - Replace __array_namespace__ check with get_namespace() to detect torch tensors - Add _onedal_dispatched check to skip conversion on sklearn fallback - Add __dlpack__ check in _asarray for torch XPU zero-copy conversion - Add object-dtype early return in _asarray for ragged radius_neighbors results - Update _kneighbors_postprocess sycl_queue handling for SPMD - Update docs for array API support * Fix doc review comments: use :meth: links, list methods explicitly, remove redundant kd_tree lines * fix: comments * address review: fix dpctl reshape, format * fix: comments * fix: comment * Simplify _asarray check: use __dlpack__ instead of __array_namespace__ __dlpack__ is mandated by the array API standard and is supported by all array types we handle (numpy, dpnp, torch cpu/xpu), making the __array_namespace__ check redundant. * fix: replace .reshape(-1) with [:, 0] in SPMD neighbors predict dpctl tensors don't have .reshape() method. Use [:, 0] indexing to flatten 2D responses to 1D, matching the pattern used in DBSCAN and KMeans for array API compatibility. 
* fix: address David's review feedback on neighbors - Fix test_pickle: use dataframes/queues properly, remove broad GPU skip - Remove test_knn_dtype_preservation (covered by PR #2971) - Remove PR reference from comment, trim redundant comments - Use xp.concat directly (numpy >=2 supports it, dpctl no longer supported) * Remove unnecessary comments in common.py * fix: use concat with concatenate fallback for numpy < 2 * Remove unnecessary comment in _asarray * Remove test_pickle: dpnp/dpctl arrays cannot be pickled * Restore useful comments in common.py * Fix dpctl concat AttributeError and add torch XPU pickle test * Add docstring to SPMD predict_proba * fix: comment * Simplify predict_proba docstring * fix: use csr_array and _align_api_if_sparse in kneighbors_graph Use csr_array instead of deprecated csr_matrix and apply _align_api_if_sparse for sklearn 1.9+ sparse interface config support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fallback to csr_matrix for older scipy without csr_array Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add missing _set_effective_metric call in NearestNeighbors._onedal_fit Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove duplicate _set_effective_metric calls * Remove resolved output_dtype skips for KNN, add dtype guard in wrap_output_data * Add LOF fit_predict to output_dtype skip: returns int labels * Add LOF to pickle exclusion list for dpnp/dpctl * Restore predict_proba skip, guard wrap_output_data with _array_api_offload * Fix LOF _predict: get namespace from result not input * Fix LOF negative_outlier_factor_ attr type, remove resolved attr skips * Add comments for remaining KNN/LOF skip entries * Use from_table(like=X) in oneDAL KNN, add tuple guard in wrap_output_data * Fix LOF _predict queue mismatch: create is_inlier on same sycl_queue * Replace __array_namespace__ with get_namespace in wrap_output_data * Remove predict_proba output_dtype skip: dtype now preserved * Handle 
int dtype in tuple results, remove KNN/LOF output_dtype skips * Fix comment: remove multi-output predict_proba reference * Update sklearnex/neighbors/knn_classification.py Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com> * Fix array-api-strict read-only array errors in kd_tree sorting and LOF predict Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Call _set_effective_metric in _onedal_fit for SPMD use_raw_input path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Simplify _process_classification_targets: remove redundant getattr guard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove unreachable n_classes int64 overflow check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>
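The `argnum` note in the message above (a descriptor returning a lambda decorated with `update_wrapper`, so that `signature()` follows `__wrapped__` to the unbound function) can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not the project's test code: `GatedMethod`, `Estimator`, and `count_args` are hypothetical stand-ins for the sklearn 1.0.x descriptor and the `call_method` helper.

```python
import inspect
from functools import update_wrapper


class GatedMethod:
    """Hypothetical descriptor that binds a method by wrapping it in a lambda."""

    def __init__(self, fn):
        self.fn = fn

    def __get__(self, obj, owner=None):
        # Bind manually, then copy metadata from the *unbound* function.
        # update_wrapper also sets __wrapped__, which inspect.signature follows.
        out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
        update_wrapper(out, self.fn)
        return out


class Estimator:
    @GatedMethod
    def score_samples(self, X):
        return [0.0 for _ in X]


def count_args(method):
    """Count parameters, ignoring 'self' because the caller holds a bound method."""
    params = inspect.signature(method).parameters
    return len([name for name in params if name != "self"])


est = Estimator()
# Naive count sees the unbound signature (self, X).
naive = len(inspect.signature(est.score_samples).parameters)
# Excluding 'self' gives the number of arguments the caller must pass.
fixed = count_args(est.score_samples)
```

Here the naive count includes `self`, which is what would lead a caller to pass both `X` and `y` to a method that only accepts `X`; filtering by name avoids that without depending on how the descriptor binds the method.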
1 parent 501426b commit b92a511

18 files changed

Lines changed: 1707 additions & 1090 deletions

doc/sources/algorithms.rst

Lines changed: 24 additions & 6 deletions
@@ -404,6 +404,11 @@ Classification
 - ``metric`` not in [``'euclidean'``, ``'manhattan'``, ``'minkowski'``, ``'chebyshev'``, ``'cosine'``]
 - Only dense data is supported.
 - Number of classes must be at least 2.
+The following methods are not accelerated by |sklearnex| and will
+fall back to |sklearn| on CPU, returning NumPy arrays when using
+array API inputs:
+:meth:`~sklearn.neighbors.KNeighborsClassifier.radius_neighbors`,
+:meth:`~sklearn.neighbors.KNeighborsClassifier.radius_neighbors_graph`.
 * - :obj:`sklearn.linear_model.LogisticRegression`
 - All parameters are supported except:
 
@@ -457,8 +462,13 @@ Regression
 
 - ``algorithm`` != ``'brute'``
 - ``weights`` = ``'callable'``
-- ``metric`` != ``'euclidean'`` or ``'minkowski'`` with ``p`` != ``2``
-- Only dense data is supported
+- ``metric`` not in [``'euclidean'``, ``'manhattan'``, ``'minkowski'``, ``'chebyshev'``, ``'cosine'``]
+- Only dense data is supported.
+The following methods are not accelerated by |sklearnex| and will
+fall back to |sklearn| on CPU, returning NumPy arrays when using
+array API inputs:
+:meth:`~sklearn.neighbors.KNeighborsRegressor.radius_neighbors`,
+:meth:`~sklearn.neighbors.KNeighborsRegressor.radius_neighbors_graph`.
 * - :obj:`sklearn.linear_model.Ridge`
 - All parameters are supported except:
 
@@ -546,7 +556,6 @@ Anomaly Detection
 - All parameters are supported except:
 
 - ``algorithm`` != ``'brute'``
-- ``weights`` = ``'callable'``
 - ``metric`` not in [``'euclidean'``, ``'manhattan'``, ``'minkowski'``, ``'chebyshev'``, ``'cosine'``]
 - Only dense data is supported
 - If using :doc:`target_offload <config-contexts>`, some computations outside of neighbor calculations (related to thresholds for outlierness) might happen on CPU.
@@ -566,9 +575,19 @@ Nearest Neighbors
 - All parameters are supported except:
 
 - ``algorithm`` != ``'brute'``
-- ``weights`` = ``'callable'``
 - ``metric`` not in [``'euclidean'``, ``'manhattan'``, ``'minkowski'``, ``'chebyshev'``, ``'cosine'``]
-- Only dense data is supported
+- Only dense data is supported.
+The following methods are not accelerated by |sklearnex| and will
+fall back to |sklearn| on CPU, returning NumPy arrays when using
+array API inputs:
+:meth:`~sklearn.neighbors.NearestNeighbors.radius_neighbors`,
+:meth:`~sklearn.neighbors.NearestNeighbors.radius_neighbors_graph`.
+* - :obj:`sklearn.neighbors.LocalOutlierFactor`
+- All parameters are supported except:
+
+- ``algorithm`` != ``'brute'``
+- ``metric`` not in [``'euclidean'``, ``'manhattan'``, ``'minkowski'``, ``'chebyshev'``, ``'cosine'``]
+- Only dense data is supported.
 
 Other Tasks
 ***********
@@ -780,7 +799,6 @@ Nearest Neighbors
 - All parameters are supported except:
 
 - ``algorithm`` != `'brute'`
-- ``weights`` = `'callable'`
 - ``metric`` not in [`'euclidean'`, `'manhattan'`, `'minkowski'`, `'chebyshev'`, `'cosine'`]
 - Only dense data is supported
 
doc/sources/array_api.rst

Lines changed: 12 additions & 0 deletions
@@ -105,6 +105,10 @@ The following patched classes have support for array API inputs:
 - :obj:`sklearn.linear_model.Ridge`
 - :obj:`sklearnex.linear_model.IncrementalLinearRegression`
 - :obj:`sklearnex.linear_model.IncrementalRidge`
+- :obj:`sklearn.neighbors.KNeighborsClassifier`
+- :obj:`sklearn.neighbors.KNeighborsRegressor`
+- :obj:`sklearn.neighbors.NearestNeighbors`
+- :obj:`sklearn.neighbors.LocalOutlierFactor`
 - :obj:`sklearn.svm.NuSVC`
 - :obj:`sklearn.svm.NuSVR`
 - :obj:`sklearn.svm.SVC`
@@ -149,6 +153,14 @@ that was fitted to array API inputs will work, but it will do so by transferring
 to host if not already there, passing the intermediate object to |sklearn|, and outputting
 a host NumPy array, with some exceptions where |dpnp_array| classes might be returned.
 
+Similarly, :obj:`sklearn.neighbors.KNeighborsClassifier` also offers methods such as
+:meth:`~sklearn.neighbors.KNeighborsClassifier.radius_neighbors` and
+:meth:`~sklearn.neighbors.KNeighborsClassifier.kneighbors_graph`, which do not have
+accelerated analogs in the |sklearnex| and thus rely on |sklearn| for the computations.
+Calling such methods from a KNN estimator from the |sklearnex| that was fitted to array
+API inputs will work, but it will do so by transferring the data to host if not already
+there, passing the intermediate object to |sklearn|, and outputting a host NumPy array.
+
 Note that some cases of estimator-specific methods are still fully array API compatible -
 for example, :meth:`sklearn.neighbors.NearestNeighbors.kneighbors` will produce outputs
 of array API classes when fitted to them.
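
The host fallback described in the added paragraph (move the data to host, hand it to a host-only routine, return a NumPy array) can be pictured with a small sketch. This is an illustration only and not the |sklearnex| implementation: `to_host_numpy` and `host_fallback` are hypothetical names, and the DLPack branch assumes the input memory is host-accessible, since `np.from_dlpack` does not copy from a device.

```python
import numpy as np


def to_host_numpy(x):
    """Hypothetical helper: return a NumPy array for an array-like input."""
    if isinstance(x, np.ndarray):
        return x  # already a host NumPy array
    if hasattr(x, "__dlpack__"):
        # DLPack exchange; assumed to be valid only for host-accessible memory.
        return np.from_dlpack(x)
    return np.asarray(x)


def host_fallback(func, x, *args, **kwargs):
    """Run a host-only function on x; the result stays a NumPy array."""
    return func(to_host_numpy(x), *args, **kwargs)


# Usage: the output is NumPy regardless of what kind of array-like came in.
out = host_fallback(np.sum, [[1, 2], [3, 4]], axis=0)
```

The point of the sketch is the asymmetry the documentation describes: the input may come from any array namespace, but a method without an accelerated analog returns host NumPy data rather than converting back to the input type.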
