BenchmarkToolsExt #861


Closed
wants to merge 3 commits into from


penelopeysm
Member

@penelopeysm penelopeysm commented Mar 25, 2025

This moves the (reusable bits of the) code from the DynamicPPL benchmarks into a BenchmarkTools extension, so that other people can use it.

It also adds docs and changelog entries. Docs preview at https://turinglang.org/DynamicPPL.jl/previews/PR861/api/#Benchmarking-Utilities

I didn't add tests: they seemed a bit extreme, and I don't see a meaningful way to test this. The CI benchmarking job does effectively test that it works.

Closes TuringLang/TuringBenchmarking.jl#43

Member Author

@penelopeysm penelopeysm left a comment


Most of the code has just been moved around without any real modifications. The only new pieces are those needed to make it work, e.g. declaring the function itself in src/DynamicPPL.jl so that the extension can add methods to it. Just a couple of clarifying comments below.
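For readers unfamiliar with Julia package extensions, here is a minimal single-file sketch of the pattern described above; it assumes nothing about DynamicPPL's actual code. The parent module declares a function with no methods, and a separate module, standing in for the extension, adds a method to it. In a real package the extension lives under ext/ and is loaded automatically (Julia 1.9+) once its trigger package is loaded, as declared in the [weakdeps] and [extensions] sections of Project.toml. The module names ParentPkg and ParentPkgBenchmarkToolsExt and the method body are hypothetical.

```julia
# Hypothetical stand-in for the parent package (the role src/DynamicPPL.jl plays).
module ParentPkg
# Declare the function with zero methods so that an extension has something to extend.
function make_benchmark_suite end
end

# Hypothetical stand-in for the extension (the role of ext/...BenchmarkToolsExt.jl).
# A real extension would be a top-level module doing `using ParentPkg, BenchmarkTools`;
# here we use a relative import so the sketch runs with Base Julia only.
module ParentPkgBenchmarkToolsExt
import ..ParentPkg

# Add a method to the parent's function via its qualified name,
# rather than defining a new, unrelated function.
ParentPkg.make_benchmark_suite(name::String) = "suite for $name"
end

println(ParentPkg.make_benchmark_suite("demo"))  # prints: suite for demo
```

Without the extension module, calling ParentPkg.make_benchmark_suite throws a MethodError, because the function exists but has no methods.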

Comment on lines +9 to +14
    make_benchmark_suite(
        [rng::Random.AbstractRNG,]
        model::Model,
        varinfo_choice::Symbol,
        adtype::ADTypes.AbstractADType,
        islinked::Bool
    )
Member Author

@penelopeysm penelopeysm Mar 25, 2025


This function previously took a symbol as the adtype argument (e.g. :forwarddiff) and used an internal function to convert it to an ADType. I think passing the ADType itself is better: the ADTypes package is widely understood and more flexible, and users don't need to extend the symbol -> adtype function to use a custom ADType, or work out which symbol they need.
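A minimal sketch of this trade-off, assuming only the ADTypes.jl package. symbol_to_adtype below is a hypothetical stand-in for the old internal conversion function (its real name and contents are not shown in this PR), and describe_backend is a hypothetical function that, like the new signature, accepts any ADTypes.AbstractADType.

```julia
using ADTypes

# Hypothetical old-style API: a fixed Symbol -> ADType lookup. Any backend or
# option not listed here requires editing (or extending) this function.
function symbol_to_adtype(s::Symbol)
    s === :forwarddiff && return AutoForwardDiff()
    s === :reversediff && return AutoReverseDiff()
    s === :mooncake && return AutoMooncake(; config=nothing)
    error("unknown AD backend symbol: $s")
end

# Hypothetical new-style API: accept any ADType directly.
describe_backend(adtype::ADTypes.AbstractADType) = string(nameof(typeof(adtype)))

from_symbol = describe_backend(symbol_to_adtype(:forwarddiff))
# Backend options (here, ReverseDiff tape compilation) pass straight through,
# with no symbol needed to encode them.
from_adtype = describe_backend(AutoReverseDiff(; compile=true))
println(from_symbol, " ", from_adtype)  # prints: AutoForwardDiff AutoReverseDiff
```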

Comment on lines -15 to +41
-    data_1k = randn(rng, 1_000)
+    data_1k = randn(StableRNG(23), 1_000)
     loop = Models.loop_univariate(length(data_1k)) | (; o=data_1k)
     multi = Models.multivariate(length(data_1k)) | (; o=data_1k)
     loop, multi
 end
 loop_univariate10k, multivariate10k = begin
-    data_10k = randn(rng, 10_000)
+    data_10k = randn(StableRNG(23), 10_000)
Member Author


There are a bunch of changes to rng here. The point of using a fresh RNG object for every sampling call is to make sure that the values sampled don't change when we add or remove models.

We discussed this previously for Turing's test suite; cf. TuringLang/Turing.jl#2433.
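A minimal sketch of why a fresh, seeded RNG per sampling call keeps the data stable, using only the Random standard library. Xoshiro stands in here for the StableRNG(23) used in the diff; StableRNGs.jl additionally guarantees that its stream does not change across Julia versions, which Xoshiro does not. The make_data_* functions and "model A/B" are hypothetical.

```julia
using Random

# Shared RNG: model B's data depends on how many draws happened before it.
function make_data_shared(include_model_a::Bool)
    rng = Xoshiro(23)
    if include_model_a
        randn(rng, 1_000)      # data for model A (which might later be removed)
    end
    return randn(rng, 10)      # data for model B
end

# Fresh RNG per call: model B's data no longer depends on model A.
function make_data_fresh(include_model_a::Bool)
    if include_model_a
        randn(Xoshiro(23), 1_000)
    end
    return randn(Xoshiro(23), 10)
end

println(make_data_shared(true) == make_data_shared(false))  # false
println(make_data_fresh(true) == make_data_fresh(false))    # true
```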

Contributor

github-actions bot commented Mar 25, 2025

Benchmark Report for Commit e9966ff

Computer Information

Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | ForwardDiff |             typed |  false |                  9.2 |                 1.7 |
|           Smorgasbord |       201 | ForwardDiff |             typed |  false |                611.7 |                41.9 |
|           Smorgasbord |       201 | ForwardDiff | simple_namedtuple |   true |                434.6 |                44.8 |
|           Smorgasbord |       201 | ForwardDiff |           untyped |   true |               1251.9 |                26.6 |
|           Smorgasbord |       201 | ForwardDiff |       simple_dict |   true |               4095.4 |                19.1 |
|           Smorgasbord |       201 | ReverseDiff |             typed |   true |               1477.7 |                29.3 |
|           Smorgasbord |       201 |    Mooncake |             typed |   true |                948.9 |                 5.3 |
|    Loop univariate 1k |      1000 |    Mooncake |             typed |   true |               5554.6 |                 4.0 |
|       Multivariate 1k |      1000 |    Mooncake |             typed |   true |               1095.6 |                 8.4 |
|   Loop univariate 10k |     10000 |    Mooncake |             typed |   true |              61614.5 |                 3.7 |
|      Multivariate 10k |     10000 |    Mooncake |             typed |   true |               9058.8 |                 9.6 |
|               Dynamic |        10 |    Mooncake |             typed |   true |                136.3 |                12.2 |
|              Submodel |         1 |    Mooncake |             typed |   true |                 25.8 |                 7.2 |
|                   LDA |        12 | ReverseDiff |             typed |   true |                459.6 |                 4.9 |


codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.

Project coverage is 84.30%. Comparing base (bb59885) to head (e9966ff).

| Files with missing lines           | Patch % | Lines        |
|------------------------------------|---------|--------------|
| ext/DynamicPPLBenchmarkToolsExt.jl | 0.00%   | 26 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking     TuringLang/DynamicPPL.jl#861      +/-   ##
============================================
- Coverage     84.87%   84.30%   -0.58%     
============================================
  Files            34       35       +1     
  Lines          3815     3841      +26     
============================================
  Hits           3238     3238              
- Misses          577      603      +26     


Member

@yebai yebai left a comment


Thanks, @penelopeysm. Having looked at this, I now feel that make_benchmark_suite doesn't add much beyond what we already have. In particular, if we make the improvements to LogDensityFunction suggested in #863 and #862, we will no longer need the helper function make_benchmark_suite.

@penelopeysm
Member Author

Yeah, that's a fair comment. I kind of wrote the same thing in the docs: it's easy enough to just benchmark logdensity_and_gradient directly.
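A sketch of what benchmarking the gradient directly might look like. It assumes a DynamicPPL version (0.35 or later) in which LogDensityFunction accepts an adtype keyword, plus BenchmarkTools.jl, Distributions.jl, ForwardDiff.jl, ADTypes.jl and LogDensityProblems.jl; the model demo is hypothetical and is not one of the benchmark models above.

```julia
using DynamicPPL, Distributions, ForwardDiff, LogDensityProblems, BenchmarkTools
using ADTypes: AutoForwardDiff

# Hypothetical toy model: unknown mean μ, observed vector y.
@model function demo(y)
    μ ~ Normal(0, 1)
    for i in eachindex(y)
        y[i] ~ Normal(μ, 1)
    end
end

model = demo(randn(100))

# Wrap the model as a LogDensityProblems-compatible object with an AD backend attached.
ldf = DynamicPPL.LogDensityFunction(model; adtype=AutoForwardDiff())
x = randn(LogDensityProblems.dimension(ldf))

# Time the log density alone, then log density plus gradient.
# `$` interpolates the globals so that global-variable access is not timed.
@btime LogDensityProblems.logdensity($ldf, $x)
@btime LogDensityProblems.logdensity_and_gradient($ldf, $x)
```

The ratio of the two timings corresponds roughly to what the "AD Time / Eval Time" column reports, although the exact methodology of the CI job is not shown here.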
