Simplify `DaskScikitLearnBase.predict` #11411
base: master
Conversation
```diff
@@ -1,4 +1,4 @@
 ## Update the following line to test changes to CI images
 ## See https://xgboost.readthedocs.io/en/latest/contrib/ci.html#making-changes-to-ci-containers

-IMAGE_TAG=main
+IMAGE_TAG=PR-14
```
Using CI images from dmlc/xgboost-devops#14 to test the PR with the latest RAPIDS and Dask.
Linting issues should be fixed now.
I forgot to mention, there's probably a bunch of code that can be cleaned up if we go this route. I haven't attempted to do that yet: just trying to get the tests passing.
```python
    iteration_range=iteration_range,
)

is_regression = isinstance(self, XGBRegressorBase)

meta: numpy.typing.NDArray[numpy.float32]
```
`meta` here could be a `cupy.ndarray` too. Haven't looked into how to handle that yet.
The return type of `Booster.predict` is `np.ndarray`, so using `np.ndarray` here matches that. However, that's just the static type annotation; at runtime, IIUC, we could get a cupy array here. I think we'll need to inspect the type of `X._meta` and figure out the return type based on that.
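A minimal sketch of what that inspection could look like; the helper name and details are illustrative, not from the actual patch. The returned empty array would be what gets passed as `meta=`:

```python
import numpy as np


def _infer_prediction_meta(X):
    """Pick an empty array of the right backend for ``meta=``.

    Dask keeps a zero-length stand-in for every collection in
    ``X._meta``; its module tells us whether the partitions are CPU
    (numpy/pandas) or GPU (cupy/cudf) backed.
    """
    backend = type(getattr(X, "_meta", None)).__module__.partition(".")[0]
    if backend in ("cupy", "cudf"):
        import cupy  # deferred import: only needed for GPU-backed input

        return cupy.empty((0,), dtype=np.float32)
    return np.empty((0,), dtype=np.float32)
```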
Getting a Dask error: https://github.com/dmlc/xgboost/actions/runs/14502244978/job/40686317395?pr=11411
I'll get a GPU env set up to reproduce that locally. @hcho3 do you know if the package versions (specifically cudf and dask-cudf) are printed out anywhere for that job? @rjzamora, I know we saw that "Cannot fuse tasks with multiple outputs" error in rapidsai/dask-upstream-testing#37. Do you remember whether rapidsai/cudf#18382 fixed that?
@TomAugspurger The environment can be found at https://github.com/dmlc/xgboost-devops/actions/runs/14502381551/job/40684836721?pr=14. From the build log of the CI container:
Yes, hopefully cudf-25.06 includes the necessary fix for that error.
I'm having some trouble getting a GPU environment locally :/ Is there an easy way to test with cudf-25.06 in CI? I suspect that everything will work with that. I did try building locally and hit a compile error.
I fixed (maybe? I'm a C++ novice) that locally with this diff:

```diff
diff --git a/src/data/ellpack_page.cu b/src/data/ellpack_page.cu
index c3926cff1..6299f9f86 100644
--- a/src/data/ellpack_page.cu
+++ b/src/data/ellpack_page.cu
@@ -31,7 +31,7 @@ EllpackPage::EllpackPage() : impl_{new EllpackPageImpl{}} {}
 EllpackPage::EllpackPage(Context const* ctx, DMatrix* dmat, const BatchParam& param)
     : impl_{new EllpackPageImpl{ctx, dmat, param}} {}
-EllpackPage::~EllpackPage() noexcept(false) = default;
+EllpackPage::~EllpackPage() noexcept(false) {}
 EllpackPage::EllpackPage(EllpackPage&& that) { std::swap(impl_, that.impl_); }
```

Then the build worked, but it unexpectedly(?) downloaded NCCL. And the test fails with:
I remember an error like the one you posted and concluded it was a compiler bug, but I couldn't remember which compiler it was. For now, please keep the workaround local.
For `xgboost/python-package/xgboost/dask/__init__.py` line 1493 (at 9dfc93b): the Dask interface should be using the implementation at `xgboost/python-package/xgboost/sklearn.py` line 1264 (at 9dfc93b).
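If the `sklearn.py` reference above points at the single-node predict path (an assumption on my part; the embedded snippet isn't visible here), the per-partition function could simply delegate to it, so that option handling lives in one place:

```python
import pandas as pd
import xgboost as xgb


def _predict_partition(model: xgb.XGBRegressor, part: pd.DataFrame) -> pd.Series:
    # Hypothetical helper: reuse the single-node sklearn ``predict`` on each
    # partition so iteration-range and output handling stay in one code path.
    return pd.Series(model.predict(part), index=part.index)
```

Delegating like this would also keep the Dask and single-node predict behavior from drifting apart.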
@trivialfis just confirming: is that what you would expect? And I haven't been able to get a CUDA build working, still hitting that issue with NCCL. If anyone is able to, I think that failing test will pass with any recent cudf / libcudf / pylibcudf from https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ and this branch.
Mmm, this is complicated by…
I probably don't have enough context to suggest a good path forward here, but I think that ideally the…

But I'm probably missing some important reasons for why things are set up the way they currently are.
Apologies for the slow response, will look into it tomorrow. |
This PR simplifies the `DaskScikitLearnBase.predict` method. The main change is to use `map_partitions` on the input rather than the current code, which is hitting dask/distributed#8998.

Opening this up early for feedback and to get it tested in CI. I've tested locally with dask==2024.9.1, 2024.12.1, and 2025.2.0, and everything in `tests/test_distributed/test_with_dask/test_with_dask.py` passes. Unfortunately, there are failures on `dask@main`, but those are different from the current set of failures.

Closes #10994
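For readers without the diff handy, here is a condensed sketch of the `map_partitions` shape of the change; it is a simplification with illustrative names, not the exact patch:

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
import xgboost as xgb


def dask_predict(model: xgb.XGBRegressor, X: dd.DataFrame) -> dd.Series:
    """Predict partition-wise, with one Dask task per partition.

    Letting ``map_partitions`` build the graph avoids the hand-rolled
    future plumbing that runs into dask/distributed#8998.
    """
    raw = bytes(model.get_booster().save_raw())  # picklable model payload

    def _predict(part: pd.DataFrame) -> pd.Series:
        # Rebuild the booster on the worker and predict this partition.
        booster = xgb.Booster()
        booster.load_model(bytearray(raw))
        return pd.Series(booster.inplace_predict(part), index=part.index)

    # ``meta`` declares the output type/dtype so Dask needn't run any tasks.
    return X.map_partitions(_predict, meta=pd.Series(dtype=np.float32))
```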