
Factor out Substrait consumers into separate files #15794


Open. gabotechs wants to merge 1 commit into main from factor-out-substrait-consumer-into-files.

gabotechs (Contributor) commented Apr 21, 2025

Which issue does this PR close?

Rationale for this change

The consumer.rs file has grown too large (~3400 LOC). Fortunately, it splits cleanly into separate files, each responsible for converting one Substrait node into one DataFusion logical plan node. With this change, readers can go straight to the file they care about, which greatly reduces the amount of code they need to keep in mind at once.

What changes are included in this PR?

A refactor of the Substrait consumer.rs file into multiple files, following these rules:

  • Every from_*-prefixed function responsible for converting one Substrait node into one DataFusion logical plan node is factored out into its own file, named after the original Substrait node (e.g. cast.rs, literal.rs, aggregate_rel.rs).
  • Every helper function that was used by only one node conversion is moved to the same file as that node conversion function.
  • Every helper function that is used by two or more node conversion functions is moved to a utils.rs file.
  • The visibility of the functions to the outside is left intact (with one small exception, see below), using pub(super) for functions that now need to be shared across files.
  • All function names and function bodies are copied exactly as they were.
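The rules above can be sketched in a few lines. This is only an illustration of the layout, not code from the PR: inline modules stand in for separate files, and `quote_ident`, `normalize_type`, `from_field_reference`, and the string-based signatures are hypothetical simplifications rather than DataFusion's real API.

```rust
// Inline modules stand in for files under a `consumer/` directory.
// Every function name and signature here is a hypothetical simplification.
mod consumer {
    // consumer/utils.rs: helpers used by two or more node conversions.
    mod utils {
        // pub(super) makes this visible to sibling modules inside `consumer`
        // while keeping it out of the crate's public API.
        pub(super) fn quote_ident(name: &str) -> String {
            format!("\"{}\"", name)
        }
    }

    // consumer/cast.rs: one conversion function plus the helper only it uses.
    mod cast {
        use super::utils::quote_ident;

        pub fn from_cast(column: &str, target: &str) -> String {
            format!("CAST({} AS {})", quote_ident(column), normalize_type(target))
        }

        // Used only by from_cast, so it stays private to this file.
        fn normalize_type(target: &str) -> String {
            target.to_uppercase()
        }
    }

    // consumer/field_reference.rs: another conversion sharing the same helper.
    mod field_reference {
        use super::utils::quote_ident;

        pub fn from_field_reference(column: &str) -> String {
            quote_ident(column)
        }
    }

    // Re-exports keep the outward-facing paths the same as before the split.
    pub use self::cast::from_cast;
    pub use self::field_reference::from_field_reference;
}
```

With this shape, callers still write `consumer::from_cast(...)` and never see the submodules, while `quote_ident` cannot be reached from outside `consumer`.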

There's one subtle public API change that might be nice to keep:

$ cargo public-api diff

Removed items from the public API
=================================
(none)

Changed items in the public API
===============================
(none)

Added items to the public API
=============================
+pub fn datafusion_substrait::logical_plan::consumer::from_substrait_type

That function should probably have been public from the beginning. Not exposing it now gets a bit messy: it is used outside the consumer module in the producer.rs tests, so hiding it from the outside would require adding some #[cfg(test)] attributes to the module definitions. I can hide it, though, if people want a perfect no-API-change refactor.
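A minimal sketch of the trade-off described here, under stated assumptions: the string-based `from_substrait_type` below is a hypothetical stand-in (the real function has a different signature), the `types` module name is invented, and the commented-out alternative is only one way the #[cfg(test)] gating could look.

```rust
mod logical_plan {
    pub mod consumer {
        // Hypothetical file holding the type conversion after the split.
        mod types {
            // Stand-in signature; the real function does not take or return strings.
            pub fn from_substrait_type(kind: &str) -> Option<&'static str> {
                match kind {
                    "i32" => Some("Int32"),
                    "string" => Some("Utf8"),
                    _ => None,
                }
            }
        }

        // Option A (the one taken in this PR): a plain public re-export.
        pub use self::types::from_substrait_type;

        // Option B (assumed shape, left as a comment): keep it out of the
        // public API and expose it crate-wide only in test builds.
        //
        //     #[cfg(test)]
        //     pub(crate) use self::types::from_substrait_type;
    }

    pub mod producer {
        // Mirrors the situation above: producer tests call into the consumer.
        #[cfg(test)]
        mod tests {
            use super::super::consumer::from_substrait_type;

            #[test]
            fn uses_consumer_type_conversion() {
                assert_eq!(from_substrait_type("i32"), Some("Int32"));
            }
        }
    }
}
```

Option A adds one item to the public API but needs no conditional compilation in the module definitions.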

Are these changes tested?

Yes, by the existing CI pipelines.

Are there any user-facing changes?

The from_substrait_type function is now public.

github-actions bot added the substrait (Changes to the substrait crate) label Apr 21, 2025
gabotechs force-pushed the factor-out-substrait-consumer-into-files branch from 6d1ae75 to 4bfa3a8 Apr 21, 2025 18:26
gabotechs force-pushed the factor-out-substrait-consumer-into-files branch from 4bfa3a8 to d974ee1 Apr 22, 2025 05:45
gabotechs marked this pull request as ready for review Apr 22, 2025 05:53
Development

Successfully merging this pull request may close these issues.

[substrait] refactor consumer.rs