
Better handling of treated input in RegressionDiscontinuity #450


Open
wants to merge 3 commits into base: main


Karthikeya2026 commented Apr 8, 2025

This PR addresses issue #440 by improving the handling of the treated column in RegressionDiscontinuity.

Added a data validation step to ensure the treated column is of boolean type.

Included a test (test_treated_column_valid.py) to check that an exception is raised when the treated column contains integers instead of booleans.

This ensures clearer error messages and prevents mysterious errors when using integer-coded treatments.
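For anyone hitting the new error with integer-coded treatment, one possible conversion is sketched below. The DataFrame and the column `x` are made-up illustration data, and the 0/1 check is an extra precaution, not something this PR adds.

```python
import pandas as pd

# Hypothetical data: `x` is a running variable, `treated` is coded 0/1.
df = pd.DataFrame({"x": [0.2, 0.4, 0.6, 0.8], "treated": [0, 0, 1, 1]})

# Guard against codes other than 0/1 before casting, because astype(bool)
# would silently map any non-zero value to True.
unexpected = set(df["treated"].unique()) - {0, 1}
if unexpected:
    raise ValueError(f"Unexpected treatment codes: {sorted(unexpected)}")

# Cast the integer codes to a proper boolean column.
df["treated"] = df["treated"].astype(bool)
```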


📚 Documentation preview 📚: https://causalpy--450.org.readthedocs.build/en/450/

@Karthikeya2026 (Author)

Hi @drbenvincent, I’ve opened this PR to address #440. Please review when you have a moment. Thanks!

@drbenvincent added the bug label (Something isn't working) on Apr 16, 2025
@drbenvincent requested a review from Copilot on April 18, 2025, 16:48
@Copilot Copilot AI left a comment

Pull Request Overview

This PR improves the input validation for the treated column in RegressionDiscontinuity to ensure it is of boolean type. It adds a validation function and corresponding tests to provide clearer error messages and prevent misuse of integer-coded treatments.

  • Added a data type check using pandas for the treated column.
  • Introduced tests to verify that a ValueError is raised when non-boolean data is provided.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| causalpy/experiments/test_treated_column_valid.py | Added tests to validate that the treated column is boolean. |
| causalpy/experiments/regression_discontinuity.py | Updated input validation to enforce that the treated column is of boolean type. |

Comments suppressed due to low confidence (2)

causalpy/experiments/regression_discontinuity.py:193

  • Consider using a more robust check (e.g., using pd.api.types.is_bool_dtype) for the treated column instead of comparing dtypes to the string 'bool' to ensure consistency with the test validation function.
if not self.data['treated'].dtype == 'bool':

causalpy/experiments/regression_discontinuity.py:194

  • There is a minor typographical issue in the error message: add a space after 'bool.' to improve readability.
raise ValueError("The 'treated' column must be of type bool.Please convert your data accordingly.")
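The two suppressed suggestions could be combined roughly as follows. This is a minimal sketch, not the code in the PR: `validate_treated` is a hypothetical free function standing in for the check inside `input_validation`, and the sample DataFrames are invented. `pd.api.types.is_bool_dtype` accepts both NumPy `bool` columns and pandas' nullable `"boolean"` extension dtype.

```python
import pandas as pd


def validate_treated(data: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError unless data['treated'] is boolean."""
    if not pd.api.types.is_bool_dtype(data["treated"]):
        # Note the space after "bool." that the review comment asks for.
        raise ValueError(
            "The 'treated' column must be of type bool. "
            "Please convert your data accordingly."
        )


# Invented sample data for the three cases of interest.
bool_df = pd.DataFrame({"treated": [True, False, True]})
int_df = pd.DataFrame({"treated": [1, 0, 1]})
nullable_df = pd.DataFrame(
    {"treated": pd.array([True, False, True], dtype="boolean")}
)

validate_treated(bool_df)      # passes silently
validate_treated(nullable_df)  # nullable boolean dtype also passes

message = None
try:
    validate_treated(int_df)   # integer-coded treatment is rejected
except ValueError as err:
    message = str(err)
```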

@drbenvincent self-requested a review on April 18, 2025, 16:48

codecov bot commented Apr 18, 2025

Codecov Report

Attention: Patch coverage is 5.55556% with 17 lines in your changes missing coverage. Please review.

Project coverage is 93.60%. Comparing base (711bb92) to head (d9adb5c).
Report is 47 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| causalpy/experiments/test_treated_column_valid.py | 0.00% | 16 Missing ⚠️ |
| causalpy/experiments/regression_discontinuity.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #450      +/-   ##
==========================================
- Coverage   94.40%   93.60%   -0.80%     
==========================================
  Files          31       32       +1     
  Lines        1985     2003      +18     
==========================================
+ Hits         1874     1875       +1     
- Misses        111      128      +17     
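The headline percentages follow from the hits and line counts in the diff above. The check below is plain arithmetic on those numbers; truncating to two decimals is an inference from the displayed figures (94.40% and 93.60%), not behavior this thread confirms.

```python
import math


def pct_truncated(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals.

    Truncation is an assumption chosen to match the figures in the report.
    """
    return math.floor(hits / lines * 10_000) / 100


base = pct_truncated(1874, 1985)  # coverage on main
head = pct_truncated(1875, 2003)  # coverage with this PR applied

# Patch lines: 16 uncovered lines in the new test file, plus 2 changed lines
# in regression_discontinuity.py (50% covered with 1 missing implies 2 lines).
patch_lines = 16 + 2
patch_hits = 1
patch = round(patch_hits / patch_lines * 100, 5)

print(f"base={base:.2f}% head={head:.2f}% patch={patch}%")
```

Most of the drop comes from the new test file being counted as 16 uncovered lines, which is consistent with its 0.00% patch coverage in the report.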

@drbenvincent (Collaborator) left a comment

After reviewing the PR, I noticed that the new tests in test_treated_column_valid.py don’t directly test the input_validation method in the RegressionDiscontinuity class. Instead, they test a separate helper function that mimics some of the validation logic. While this is helpful, it would be more beneficial to test the actual input_validation method to ensure the validation works as expected within the context of the class.

Would you mind updating the tests to directly call input_validation? This way, we can ensure the validation logic is fully covered in the real implementation.

Let me know if you have any questions or need guidance on how to proceed. Thanks for your hard work—I’m looking forward to seeing the updated tests!
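One way the requested change might look is sketched below. Because the real `RegressionDiscontinuity` constructor arguments are not shown in this thread, the sketch uses a hypothetical stand-in class with the same `input_validation` method name; in the actual test suite the stand-in would be replaced by the real class and its real arguments. The sketch uses the standard-library `unittest` so that it runs on its own.

```python
import unittest

import pandas as pd


class RegressionDiscontinuityStandIn:
    """Hypothetical stand-in; the real causalpy class takes more arguments."""

    def __init__(self, data: pd.DataFrame) -> None:
        self.data = data
        self.input_validation()  # validation runs as part of construction

    def input_validation(self) -> None:
        if not pd.api.types.is_bool_dtype(self.data["treated"]):
            raise ValueError(
                "The 'treated' column must be of type bool. "
                "Please convert your data accordingly."
            )


class TestTreatedColumn(unittest.TestCase):
    """Exercise the class's own validation path, not a copy of its logic."""

    def test_integer_treated_raises(self):
        df = pd.DataFrame({"x": [0.1, 0.6], "treated": [0, 1]})
        with self.assertRaises(ValueError):
            RegressionDiscontinuityStandIn(df)

    def test_boolean_treated_passes(self):
        df = pd.DataFrame({"x": [0.1, 0.6], "treated": [False, True]})
        RegressionDiscontinuityStandIn(df)  # should not raise


# Run the two tests programmatically so the sketch is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTreatedColumn)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling the class directly means the test fails if the check is ever removed from or changed in `input_validation`, which a test against a separate helper would not detect.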

Labels
bug Something isn't working
2 participants