BenchmarkToolsExt #861

penelopeysm · 2025-03-25T19:31:01Z

This moves the reusable parts of the DynamicPPL benchmark code into a BenchmarkTools extension so that other people can use it.

It also adds docs and changelog entries. Docs preview at https://turinglang.org/DynamicPPL.jl/previews/PR861/api/#Benchmarking-Utilities

I haven't added tests: they seemed a bit excessive, and I don't see a good way to test this. The CI benchmarking job effectively tests that it works.

Closes TuringLang/TuringBenchmarking.jl#43

penelopeysm

Most of the code has simply been moved, without any real modifications. The only new parts are those needed to make it work, e.g. declaring the function itself in src/DynamicPPL.jl so that the extension can add methods to it. Below are a couple of clarifying comments.
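Declaring a function with no methods in the parent package and adding methods to it from the extension is the usual Julia package-extension pattern. The following is a minimal single-file sketch of that pattern; the modules `MiniPPL` and `MiniPPLBenchmarkExt` are hypothetical stand-ins for DynamicPPL and DynamicPPLBenchmarkToolsExt. In a real package the extension would live in `ext/` and be registered under `[weakdeps]` and `[extensions]` in Project.toml instead of being defined by hand.

```julia
# Single-file simulation of the package-extension pattern.
# `MiniPPL` and `MiniPPLBenchmarkExt` are hypothetical stand-ins for
# DynamicPPL and DynamicPPLBenchmarkToolsExt.

module MiniPPL
"""
    make_benchmark_suite(n::Int)

Hypothetical stub: it has no methods until the extension module adds one.
"""
function make_benchmark_suite end
end # module MiniPPL

module MiniPPLBenchmarkExt
# A real extension would also load its trigger package here (e.g. BenchmarkTools);
# Julia loads the extension once the parent and that package are both loaded.
import ..MiniPPL

# Qualifying the name adds a method to the parent's function instead of
# creating a new, unrelated function inside this module.
MiniPPL.make_benchmark_suite(n::Int) = "suite with $n benchmarks"
end # module MiniPPLBenchmarkExt

println(MiniPPL.make_benchmark_suite(3))  # prints: suite with 3 benchmarks
```

Because the stub lives in the parent package, its docstring can appear in the main documentation even though the method itself only exists once the extension is loaded.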

penelopeysm · 2025-03-25T19:33:08Z

ext/DynamicPPLBenchmarkToolsExt.jl

```diff
+    make_benchmark_suite(
+        [rng::Random.AbstractRNG,]
+        model::Model,
+        varinfo_choice::Symbol,
+        adtype::ADTypes.AbstractADType,
+        islinked::Bool
```


This function previously took a symbol as the adtype argument (e.g. :forwarddiff) and used an internal function to convert it to an ADType. I think passing the ADType itself is better: the ADTypes package is widely understood and more flexible, and users who want a custom ADType don't have to extend the symbol-to-ADType function or work out which symbol they need.
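To illustrate the extensibility argument, here is a minimal sketch using only Base Julia. The types `AbstractAD`, `ForwardAD`, `ReverseAD` and `MyCustomAD`, and the functions `symbol_to_ad`, `describe`, `run_old` and `run_new`, are all hypothetical stand-ins; in ADTypes the corresponding objects would be subtypes of `ADTypes.AbstractADType` such as `AutoForwardDiff()` or `AutoReverseDiff(; compile=true)`.

```julia
# Hypothetical stand-ins for ADTypes.AbstractADType and two concrete backends.
abstract type AbstractAD end
struct ForwardAD <: AbstractAD end
struct ReverseAD <: AbstractAD
    compile::Bool  # backend-specific option carried by the type itself
end

# Old style: a fixed symbol -> AD-type lookup. Any backend or option not listed
# here is unreachable unless someone edits (or adds a method to) this function.
function symbol_to_ad(s::Symbol)
    s === :forwarddiff && return ForwardAD()
    s === :reversediff && return ReverseAD(false)
    throw(ArgumentError("unknown AD backend symbol: $s"))
end

# Behaviour is selected by dispatching on the AD type.
describe(::ForwardAD) = "forward mode"
describe(ad::ReverseAD) = ad.compile ? "reverse mode, compiled tape" : "reverse mode"

run_old(s::Symbol) = describe(symbol_to_ad(s))
run_new(ad::AbstractAD) = describe(ad)

# A user-defined backend works with `run_new` as soon as it defines `describe`;
# `symbol_to_ad` never needs to know about it.
struct MyCustomAD <: AbstractAD end
describe(::MyCustomAD) = "custom backend"

println(run_old(:reversediff))     # reverse mode
println(run_new(ReverseAD(true)))  # reverse mode, compiled tape
println(run_new(MyCustomAD()))     # custom backend
```

Note that the compiled-tape option is reachable through `run_new` but not through `run_old`, since the symbol table has no entry for it.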

penelopeysm · 2025-03-25T19:36:05Z

benchmarks/benchmarks.jl

```diff
-    data_1k = randn(rng, 1_000)
+    data_1k = randn(StableRNG(23), 1_000)
    loop = Models.loop_univariate(length(data_1k)) | (; o=data_1k)
    multi = Models.multivariate(length(data_1k)) | (; o=data_1k)
    loop, multi
 end
 loop_univariate10k, multivariate10k = begin
-    data_10k = randn(rng, 10_000)
+    data_10k = randn(StableRNG(23), 10_000)
```


There are several changes to the RNG handling here. Using a fresh RNG object for every sampling call ensures that the sampled values don't change when models are added or removed: with a single shared `rng`, the output of each call depends on how many draws were made before it.
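The difference between a shared RNG and a fresh RNG per sampling call can be shown with a small stdlib-only sketch. It uses `Random.Xoshiro` in place of `StableRNG` from StableRNGs.jl to avoid a package dependency (StableRNGs additionally keeps the random stream stable across Julia versions, which `Xoshiro` does not promise). The functions `data_shared` and `data_fresh` are hypothetical.

```julia
using Random

# Shared RNG: the 1k dataset depends on every draw made before it.
function data_shared(; extra_model::Bool)
    rng = Xoshiro(23)
    extra_model && randn(rng, 5)   # draws for a newly added model
    return randn(rng, 1_000)       # data for the 1k models
end

# Fresh RNG per sampling call: the 1k dataset depends only on its own seed.
function data_fresh(; extra_model::Bool)
    extra_model && randn(Xoshiro(23), 5)
    return randn(Xoshiro(23), 1_000)
end

@show data_shared(extra_model=false) == data_shared(extra_model=true)  # false
@show data_fresh(extra_model=false) == data_fresh(extra_model=true)    # true
```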

We discussed this previously for Turing's test suite; see TuringLang/Turing.jl#2433 (comment).

github-actions · 2025-03-25T19:40:50Z

Benchmark Report for Commit `e9966ff`

Computer Information

```
Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
```

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | ForwardDiff |             typed |  false |                  9.2 |                 1.7 |
|           Smorgasbord |       201 | ForwardDiff |             typed |  false |                611.7 |                41.9 |
|           Smorgasbord |       201 | ForwardDiff | simple_namedtuple |   true |                434.6 |                44.8 |
|           Smorgasbord |       201 | ForwardDiff |           untyped |   true |               1251.9 |                26.6 |
|           Smorgasbord |       201 | ForwardDiff |       simple_dict |   true |               4095.4 |                19.1 |
|           Smorgasbord |       201 | ReverseDiff |             typed |   true |               1477.7 |                29.3 |
|           Smorgasbord |       201 |    Mooncake |             typed |   true |                948.9 |                 5.3 |
|    Loop univariate 1k |      1000 |    Mooncake |             typed |   true |               5554.6 |                 4.0 |
|       Multivariate 1k |      1000 |    Mooncake |             typed |   true |               1095.6 |                 8.4 |
|   Loop univariate 10k |     10000 |    Mooncake |             typed |   true |              61614.5 |                 3.7 |
|      Multivariate 10k |     10000 |    Mooncake |             typed |   true |               9058.8 |                 9.6 |
|               Dynamic |        10 |    Mooncake |             typed |   true |                136.3 |                12.2 |
|              Submodel |         1 |    Mooncake |             typed |   true |                 25.8 |                 7.2 |
|                   LDA |        12 | ReverseDiff |             typed |   true |                459.6 |                 4.9 |

codecov · 2025-03-25T19:45:05Z

Codecov Report

Attention: Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.

Project coverage is 84.30%. Comparing base (bb59885) to head (e9966ff).

| Files with missing lines           | Patch % | Lines         |
|------------------------------------|---------|---------------|
| ext/DynamicPPLBenchmarkToolsExt.jl | 0.00%   | 26 Missing ⚠️ |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           breaking     TuringLang/DynamicPPL.jl#861      +/-   ##
============================================
- Coverage     84.87%   84.30%   -0.58%     
============================================
  Files            34       35       +1     
  Lines          3815     3841      +26     
============================================
  Hits           3238     3238              
- Misses          577      603      +26
```


yebai

Thanks, @penelopeysm. I looked at this and now feel that make_benchmark_suite doesn't add much beyond what we already have. In particular, if we make the improvements to LogDensityFunction suggested in #863 and #862, we no longer need the helper function make_benchmark_suite.

penelopeysm · 2025-03-26T11:24:50Z

Yeah, that's a fair comment. I said much the same in the docs: it's easy enough to just benchmark logdensity_and_gradient.
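For reference, benchmarking `LogDensityProblems.logdensity_and_gradient` directly might look like the sketch below. It assumes the DynamicPPL 0.35-era constructor `LogDensityFunction(model; adtype=...)`, whose signature may differ in other releases, and the toy `demo` model is hypothetical.

```julia
using DynamicPPL, Distributions, LogDensityProblems, BenchmarkTools
using ADTypes: AutoForwardDiff
import ForwardDiff  # loading the backend enables ForwardDiff-based gradients

# Hypothetical toy model: one latent parameter and one observation.
@model function demo(y)
    μ ~ Normal()
    y ~ Normal(μ, 1)
end

model = demo(1.5)
# Assumes the DynamicPPL 0.35-era constructor; varinfo and context use defaults.
ldf = DynamicPPL.LogDensityFunction(model; adtype=AutoForwardDiff())
x = randn(LogDensityProblems.dimension(ldf))

# Interpolate with `$` so BenchmarkTools does not time global-variable access.
trial = @benchmark LogDensityProblems.logdensity_and_gradient($ldf, $x)
display(trial)
```

Swapping `AutoForwardDiff()` for another ADTypes backend (with that backend's package loaded) is all that is needed to compare AD backends on the same model.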

penelopeysm added 3 commits March 25, 2025 19:31

- Move code to BenchmarkToolsExt (bbc82ef)
- Document BenchmarkToolsExt (c88e6f1)
- Update CI benchmark script (e9966ff)

penelopeysm force-pushed the py/benchmarks branch from 827217b to e9966ff on March 25, 2025 19:31

penelopeysm commented Mar 25, 2025

penelopeysm mentioned this pull request Mar 25, 2025, in the open issue "Deprecate the TuringBenchmarking package in favour of LogDensityFunction" (TuringLang/TuringBenchmarking.jl#43)

yebai reviewed Mar 26, 2025

penelopeysm closed this Mar 26, 2025
