Swap out `patsy` for `formulae` #463

ksolarski · 2025-04-21T14:52:09Z

Solving issue #386

Starting with DiD, will continue with other methods if you agree with the general design @drbenvincent

Seems like the key practical difference between formulae and patsy is the lack of a build_design_matrices method in formulae. The user then has to provide the formula again.

📚 Documentation preview 📚: https://causalpy--463.org.readthedocs.build/en/463/

drbenvincent · 2025-04-21T15:27:49Z

Cool. Thanks @ksolarski, just a quick reply from my phone...

Don't do this for the synthetic control because I have an in progress PR that will change it. It won't have a formula input.

But can I just get some clarification... does this change the API? Can we get the exact same functionality? If not, let's think again.

Will try to look at the code properly when I can 👍🏻

drbenvincent · 2025-04-21T15:41:21Z

I can't find where I saw it in the patsy docs at this point. But I think one of the things that build_design_matrices did was to ensure that predictions on new/out of sample data are correct. For example, you could get a situation where the out of sample data doesn't contain all levels of a categorical predictor. So I think if you do it naively, you can get silent errors.

I'm not 100% sure that this is a problem, and apologies I can't find the relevant part in the docs. But does my concern make sense?
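To make the concern concrete, here is a minimal pure-Python sketch of the failure mode. It is not patsy or formulae code: `naive_one_hot` and `encode_with_levels` are hypothetical helpers written only for illustration, and the column naming just mimics treatment coding.

```python
def naive_one_hot(values):
    """Treatment-code a categorical column using only the levels seen in `values`."""
    levels = sorted(set(values))
    reference, *others = levels  # first level becomes the reference
    columns = [f"g[T.{lvl}]" for lvl in others]
    rows = [[1.0 if v == lvl else 0.0 for lvl in others] for v in values]
    return columns, rows


def encode_with_levels(values, levels):
    """Treatment-code `values` against a fixed list of levels learned at fit time."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unseen levels: {sorted(unknown)}")
    others = levels[1:]
    return [[1.0 if v == lvl else 0.0 for lvl in others] for v in values]


# Fit-time data has levels a, b, c; the out of sample data is missing "a".
train_cols, _ = naive_one_hot(["a", "b", "c", "a"])
new_cols, new_rows = naive_one_hot(["b", "c"])

# Re-encoding from scratch silently changes the reference level and drops a column.
print(train_cols)  # ['g[T.b]', 'g[T.c]']
print(new_cols)    # ['g[T.c]']

# Reusing the fit-time levels keeps the columns aligned with the fitted coefficients.
fixed_rows = encode_with_levels(["b", "c"], ["a", "b", "c"])
print(fixed_rows)  # [[1.0, 0.0], [0.0, 1.0]]
```

In the naive case no exception is raised; the new matrix simply has a different meaning per column, which is the kind of silent error described above.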

ksolarski · 2025-04-22T07:38:37Z

You're right, Patsy has the power of preserving the transformation / encoding of variables through build_design_matrices method. There's no equivalent way in formulae so it's certainly not straightforward to copy paste the current behaviour with formulae.
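For reference, a minimal sketch of the patsy behaviour being discussed, assuming a toy dataset (the `train` and `new` frames and the `x + C(g)` formula are made up for illustration and are not taken from CausalPy):

```python
import pandas as pd
from patsy import build_design_matrices, dmatrix

# Toy data: the out of sample frame is missing level "a" of the categorical g.
train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "g": ["a", "b", "c", "a"]})
new = pd.DataFrame({"x": [5.0, 6.0], "g": ["b", "c"]})

# Build the training design matrix; patsy records the encoding in design_info.
X_train = dmatrix("x + C(g)", train)
design_info = X_train.design_info

# Reuse the stored encoding on new data instead of re-parsing the formula.
(X_new,) = build_design_matrices([design_info], new)

print(design_info.column_names)
print(X_new.design_info.column_names)
```

Both matrices share the same columns, so coefficients fitted on `X_train` line up with `X_new` even though level "a" never appears in the new data.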

However, the Patsy repo suggests migrating to https://github.com/matthewwardrop/formulaic instead, which is capable of "reusing the encoding choices made during conversion of one data-set on other datasets." (see https://matthewwardrop.github.io/formulaic/latest/). There's also a migration guide from Patsy to Formulaic, so the switch would be easy. It also supports many operators: https://matthewwardrop.github.io/formulaic/latest/guides/grammar/

Did you check out this library before? What do you think about using this instead of formulae?

DID switch

41a0236
