* Add possibility to ask for libbenchmark version number (#1004)
Add a header which holds the current major, minor, and
patch number of the library. The header is auto generated
by CMake.
* Do not generate unused functions (#1004)
* Add support for version number in bazel (#1004)
* Fix clang format #1004
* Fix more clang format problems (#1004)
* Use git version feature of cmake to determine current lib version
* Rename version_config header to version
* Bake git version into bazel build
* Use same input config header as in cmake for version.h
* Adapt the releasing.md to include versioning in bazel
Report all time numbers > 10 digits in scientific notation with
4 decimal places. This is necessary since only 10 digits
are currently reserved for the time columns (Time and CPU).
If exceeding 10 digits the output isnt properly aligned anymore.
* Introduce warmup phase to BenchmarkRunner (#1130)
In order to account for caching effects in user
benchmarks introduce a new command line option
"--benchmark_min_warmup_time"
which allows to specify an amount of time for
which the benchmark should be run before results
are meaningful.
* Adapt review suggestions regarding introduction of warmup phase (#1130)
* Fix BM_CHECK call in MinWarmUpTime (#1130)
* Fix comment on requirements of MinWarmUpTime (#1130)
* Add basic description of warmup phase mechanism to user guide (#1130)
* Add option to get the verbosity provided by commandline flag -v (#1330)
* replace assert with test failure
asserts are stripped out in non debug builds, and we run tests in non-debug CI bots.
* clang-format my own tweak
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
* Filter out benchmarks that start with "DISABLED_"
This could be slightly more elegant, in that the registration and the
benchmark definition names have to change. Ideally, we'd still register
without the DISABLED_ prefix and it would all "just work".
Fixes#1365
* add some documentation
* Add SetBenchmarkFilter() to set --benchmark_filter flag value in user code.
Use case: Provide an API to set this flag indepedence of the flag's implementation (ie., absl flag vs benchmark's flag facility)
* add test
* added notes on Initialize()
* Add option to set the default time unit globally
This commit introduces the `--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only affects benchmarks where the time unit is not set explicitly.
* Update AUTHORS and CONTRIBUTORS
* Test `SetDefaultTimeUnit`
* clang format
* Use `GetDefaultTimeUnit()` for initializing `TimeUnit` variables
* Review fixes
* Export functions
* Add comment
* introduce the possibility to customize the help printer function
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* fixed naming convertion, and introduce the option function in the init method
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* remove the macros to inject the helper function
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* remove the default implementation, and introduce the nullprt
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* Expose default display reporter creation in public API
this is useful when a custom reporter wants to fall back on the default
display reporter, but doesn't necessarily have access to the benchmark
library flag configuration.
* Make use of unique_ptr in the random interleaving test.
* clang-format
This patch fixes#1306, by reducing the pinned instances of
PerfCounters.
The issue is caused by creating multiple pinned events in the
same thread, doing so results in the Snapshot(PerfCounterValues* values)
failing, and that's now discoverable.
Creating multile pinned events is an unsupported behavior currently.
The error would be detected at read() time, not
perf_event_open() / iotcl() time.
The unsupported benavior above is confirmed by Stephane Eranian @seranian,
and he also pointed the dectection method.
Finished this patch under the guidance of Mircea Trofin @mtrofin.
* Address MSVC C4722 warning in tests
Some test paths deliberately exit, and it appears that the appropriate declspec
does not stop the compiler generating the C4722 warning as one might expect.
Per https://github.com/google/benchmark/issues/826#issuecomment-851995549
this commit ignores the warning for the affected call site.
* Fix up Formatting
* Fix up formatting issue on pragmas
* Fix up formatting issue on pragmas take 2
Co-authored-by: Staffan Tjernstrom <staffantj@users.noreply.github.com>
This applies a fix that used to exist in LLVM's downstream copy of
this library, from
948ce4e6ed.
I presume this warning isn't present if built with MSVC or Clang-cl,
but it's printed in MinGW mode. As the benchmark library adds
-Werror, this is a fatal error when builtin MinGW mode.
* Add Setup/Teardown option on Benchmark.
Motivations:
- feature parity with our internal library. (which has ~718 callers)
- more flexible than cordinating setup/teardown inside the benchmark routine.
* change Setup/Teardown callback type to raw function pointers
* add test file to cmake file
* move b.Teardown() up
* add const to param of Setup/Teardown callbacks
* fix comment and add doc to user_guide
* fix typo
* fix doc, fix test and add bindings to python/benchmark.cc
* fix binding again
* remove explicit C cast - that was wrong
* change policy to reference_internal
* try removing the bindinds ...
* clean up
* add more tests with repetitions and fixtures
* more comments
* init setup/teardown callbacks to NULL
* s/nullptr/NULL
* removed unused var
* change assertion on fixture_interaction::fixture_setup
* move NULL init to .cc file
* Fix un-initted error in test.
Found by -Werror,-Wsometimes-uninitialized
* Update spec_arg_test.cc
* additional change:
- Change the API on GetBenchmarkFilter and the `spec` to std::string because google C++ styleguide internally kind of discouraged using raw const char*
* [RFC] Adding API for setting/getting benchmark_filter flag?
This PR is more of a Request-for-comment - open to other ideas/suggestions as well.
Details:
This flag has different implementations(absl vs benchmark) and since the proposal to add absl as a dependency was rejected, it would be nice to have a reliable (and less hacky) way to access this flag internally.
(Actually, reading it isn't much a problem but setting it is).
Internally, we have a sizeable number users to use absl::SetFlags to set this flag. This will not work with benchmark-flags.
Another motivation is that not all users use the command line flag. Some prefer to programmatically set this value.
* fixed build errors
* fix lints again
* per discussion: add additional RunSpecifiedBenchmarks instead.
* add tests
* fix up tests
* clarify comment
* fix stray : in test
* more assertion in test
* add test file to test/CMakeLists.txt
* more test
* make test ISO C++ compliant
* fix up BUILD file to pass the flag
In #1238, one of MemoryManager's Stop methods was marked as deprecated
and this method is used in the same header. This change generated
-Wdeprecated-declarations warning on every file that includes
"benchmark.h". Use gcc's diagnostics to fix this warning.
WebRTC uses Google Benchmarks as a dependency and uses Chromium's build
infrastructure. Chromium is compiled using clang-cl on Windows, and the
-Wdeprecated-declarations warning is triggered. Because clang-cl accepts
gcc's diagnostic prama and defines the __clang__ macro,
using it can solve this issue.
Bug: webrtc:13280
* Introduce Coefficient of variation aggregate
I believe, it is much more useful / use to understand,
because it is already normalized by the mean,
so it is not affected by the duration of the benchmark,
unlike the standard deviation.
Example of real-world output:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ ~/rawspeed/build-old/src/utilities/rsbench/rsbench GOPR9172.GPR --benchmark_repetitions=27 --benchmark_display_aggregates_only=true --benchmark_counters_tabular=true
2021-09-03T18:05:56+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3596.16 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
Load Average: 7.00, 2.99, 1.85
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:32/process_time/real_time_mean 11.1 ms 353 ms 27 0.353122 31.9473 12M 33.9879M 1085.84M 2.83232 90.4864 0.0110535
GOPR9172.GPR/threads:32/process_time/real_time_median 11.0 ms 352 ms 27 0.351696 31.9599 12M 34.1203M 1090.11M 2.84336 90.8425 0.0110081
GOPR9172.GPR/threads:32/process_time/real_time_stddev 0.159 ms 4.60 ms 27 4.59539m 0.0462064 0 426.371k 14.9631M 0.0355309 1.24692 158.944u
GOPR9172.GPR/threads:32/process_time/real_time_cv 1.44 % 1.30 % 27 0.0130136 1.44633m 0 0.0125448 0.0137802 0.0125448 0.0137802 0.0143795
```
Fixes https://github.com/google/benchmark/issues/1146
* Be consistent, it's CV, not 'rel std dev'
* Statistics: add support for percentage unit in addition to time
I think, `stddev` statistic is useful, but confusing.
What does it mean if `stddev` of `1ms` is reported?
Is that good or bad? If the `median` is `1s`,
then that means that the measurements are pretty noise-less.
And what about `stddev` of `100ms` is reported?
If the `median` is `1s` - awful, if the `median` is `10s` - good.
And hurray, there is just the statistic that we need:
https://en.wikipedia.org/wiki/Coefficient_of_variation
But, naturally, that produces a value in percents,
but the statistics are currently hardcoded to produce time.
So this refactors thinkgs a bit, and allows a percentage unit for statistics.
I'm not sure whether or not `benchmark` would be okay
with adding this `RSD` statistic by default,
but regales, that is a separate patch.
Refs. https://github.com/google/benchmark/issues/1146
* Address review notes
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate the direct access to these fields.
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate the direct access to these fields.
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
Both `.` and `<empty>` already means "run all benchmarks" here, as commented on this flag's declaration (and below around line 448-449).
So this is a NFC.
On the other hand, this help internally because internally, if the flag is empty (or if it's not a specified by a binary), we don't call the RunSpecifiedBenchmarks.
There is still a difference in what <empty> means internally (runs no benchmarks) and externally (runs all benchmarks).
But we can work around this.
* Remove `min` with dead path.
When `isSignificant` is false, the smallest value `multiplier` can
approach is 14. Thus, min(10, multiplier) will always return 10.
Addresses part one of #1205.
* Remove always false condition.
1. `multiplier <= 1.0` implies `i.seconds >= min_time * 1.4`
2. By (1), `isSignficant` is true because `i.seconds > min_time * 1.4` implies `i.seconds > min_time` implies that `i.seconds / minTime > 1 > 0.1`. Thus, the ternary maintains the same multiplier value.
3. `ShouldReportResults` is always called before `PredictNumItersNeeded`, if `i.seconds >= min_time` then the loop is broken and `PredictNumItersNeeded` is never called.
4. 1 and 3 together imply that `multiplier <= 1.0` is never true.
Addresses part 2 of #1205.
This can be used together with ArgsProduct() to allow multiple ranges
with different multipliers and mixing dense and sparse ranges.
Example:
BENCHMARK(MyTest)->ArgsProduct({
CreateRange(0, 1024, /*multi=*/32),
CreateRange(0, 100, /*multi=*/4),
CreateDenseRange(0, 4, /*step=*/1)
});
Co-authored-by: Jen-yee Hong <pcmantw@google.com>
Inspired by the original implementation by Hai Huang @haih-g
from https://github.com/google/benchmark/pull/1105.
The original implementation had design deficiencies that
weren't really addressable without redesign, so it was reverted.
In essence, the original implementation consisted of two separateable parts:
* reducing the amount time each repetition is run for, and symmetrically increasing repetition count
* running the repetitions in random order
While it worked fine for the usual case, it broke down when user would specify repetitions
(it would completely ignore that request), or specified per-repetition min time (while it would
still adjust the repetition count, it would not adjust the per-repetition time,
leading to much greater run times)
Here, like i was originally suggesting in the original review, i'm separating the features,
and only dealing with a single one - running repetitions in random order.
Now that the runs/repetitions are no longer in-order, the tooling may wish to sort the output,
and indeed `compare.py` has been updated to do that: #1168.
Currently the lifetime of a single BenchmarkRunner is constrained
to a RunBenchmark(), but that will have to change for interleaved
benchmark execution, because we'll need to keep it around to not
forget how much repetitions of an instance we've done.
While the current variant works, it assumes that all the instances of
a single family will be run together, with nothing inbetween them.
Naturally, that won't work once the runs may be interleaved.
Much like it makes sense to enumerate all the families,
it makes sense to enumerate stuff within families.
Alternatively, we could have a global instance index,
but i'm not sure why that would be better.
This will be useful when the benchmarks are run not in order,
for the tools to sort the results properly.
It may be useful for those wishing to further post-process JSON results,
but it is mainly geared towards better support for run interleaving,
where results from the same family may not be close-by in the JSON.
While we won't be able to do much about that for outputs,
the tools can and perhaps should reorder the results to that
at least in their output they are in proper order, not run order.
Note that this only counts the families that were filtered-in,
so if e.g. there were three families, and we filtered-out
the second one, the two families (which were first and third)
will have family indexes 0 and 1.
* Implementation of random interleaving. See
http://github.com/google/benchmark/issues/1051 for the feature requests.
Committer: Hai Huang (http://github.com/haih-g)
On branch fr-1051
Changes to be committed:
modified: include/benchmark/benchmark.h
modified: src/benchmark.cc
new file: src/benchmark_adjust_repetitions.cc
new file: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
modified: test/CMakeLists.txt
new file: test/benchmark_random_interleaving_gtest.cc
* Fix benchmark_random_interleaving_gtest.cc for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_runner.cc
modified: test/benchmark_random_interleaving_gtest.cc
* Fix macos build for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Fix macos and windows build for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
* Fix benchmark_random_interleaving_test.cc for macos and windows in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Fix int type benchmark_random_interleaving_gtest for macos in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comments 03/29 for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comment on default min_time / repetitions for fr-1051.
Also change sentinel of random_interleaving_repetitions to -1. Hopefully it
fixes the failures on Windows.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* Fix windows test failures for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Add license blurb for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
* Switch to std::shuffle() for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Change to 1e-9 in fr-1105
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
* Fix broken build caused by bad merge for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Fix build breakage for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
* Print out reports as they come in if random interleaving is disabled (fr-1051)
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* size_t, int64_t --> int in benchmark_runner for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
* Address comments from dominichamon for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* benchmar_indices --> size_t to make CI pass: fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Fix min_time not initialized issue for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* min_time --> MinTime in fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Add doc for random interleaving for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: README.md
new file: docs/random_interleaving.md
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
This also fixes#1135. Because StrSplit was returning a vector with an
empty string, it was treated by PerfCounters::Create as a legitimate ask
for setting up a counter with that name. The empty vector is understood
by PerfCounters as "just return NoCounters()".
* Add API to benchmark allowing for custom context to be added
Fixes#525
* add docs
* Add context flag output to JSON reporter
* Plumb everything into the global context.
* Add googletests for custom context
* update docs with duplicate key behaviour
* Add `benchmark_context` flag that allows per-run custom context.
Add support for key-value flags in general.
Added test for key-value flags.
Added `benchmark_context` flag.
Output content of `benchmark_context` to base reporter.
Solves the first part of #525.
* Docs and better help
* Support optional, user-directed collection of performance counters
The patch allows an engineer wishing to drill into the root causes
of a regression, for example. Currently, only single threaded runs
are supported. The feature is a build-time opt in, and then a runtime
opt in.
The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.
This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.
Currently, this is only supported on platforms where libpfm is
supported.
libpfm: http://perfmon2.sourceforge.net/
* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS
This is to avoid unused parameter warning-as-error
* Added missing include for <vector> in perf_counters.cc
* Moved doc to docs
* Added license blurbs
Currently, i get:
```
Run on (32 X 7326.56 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
which seems mostly right, except that the frequency is rather bogus.
Yes, i guess the CPU could theoretically achieve that,
but i have 3.6GHz configured, and scaling disabled.
So we clearly read the wrong thing.
With this fix, i now get the expected
```
Run on (32 X 3598.53 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
Use the benchmark's reported iteration count when estimating
iterations for the next repetition, rather than the requested
iteration count. When the benchmark uses KeepRunningBatch the actual
iteration count can be larger than the one the runner requested.
Prior to this fix the runner was underestimating the next iteration
count, sometimes significantly so. Consider the case of a benchmark
using a batch size of 1024. Prior to this change, the benchmark
runner would attempt iteration counts 1, 10, 100 and 1000, yet the
benchmark itself would do the same amount of work each time: a single
batch of 1024 iterations. The discrepancy could also contribute to
estimation errors once the benchmark time reached 10% of the target.
For example, if the very first batch of 1024 iterations reached 10% of
benchmark_min_min time, the runner would attempt to scale that to 100%
from a basis of one iteration rather than 1024.
This bug was particularly noticeable in benchmarks with large batch
sizes, especially when the benchmark also had slow set up or tear down
phases.
With this fix in place it is possible to use KeepRunningBatch to
achieve a kind of "minimum iteration count" feature by using a larger
fixed batch size. For example, a benchmark may build a map of 500K
elements and test a "find" operation. There is no point in running
"find" just 1, 10, 100, etc., times. The benchmark can now pick a
batch size of something like 10K, and the runner will arrive at the
final max iteration count with in noticeably fewer repetitions.
When building with gcc TSan on, and in Debug mode, we see a warning
like:
benchmark/src/timers.cc: In function ‘std::string benchmark::LocalDateTimeString()’:
src/timers.cc:241:15: warning: ‘char* strncat(char*, const char*, size_t)’ output may be truncated copying 108 bytes from a string of length 127 [-Wstringop-truncation]
241 | std::strncat(storage, tz_offset, sizeof(storage) - timestamp_len - 1);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While this is essentially a false positive (we never expect
the number of bytes in tz_offset to be too large), the compiler can't
actually tell that. Shrink the size of tz_offset to a smaller, but still safe
size to eliminate this warning.
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
* Implement custom benchmark name
The benchmark's name can be changed using the Name() function
which internally uses SetName().
* Update AUTHORS and CONTRIBUTORS
* Describe new feature in README
* Move new name function up
Fixes#1106
The existing behavior results in the `0` value being added twice. Since
`lo` is always added to `dst`, we never want to explicitly add `0` if
`lo` is equal to `0`.
On s390 architecture, z/OS XL compiler uses HLASM inline assembly, which has different syntax and needs to be distinguished to avoid compilation error.
Noticed missing header when was building llvm with gcc-11:
```
llvm-project/llvm/utils/benchmark/src/benchmark_register.h:17:30:
error: 'numeric_limits' is not a member of 'std'
17 | static const T kmax = std::numeric_limits<T>::max();
| ^~~~~~~~~~~~~~
```
Without this commit, compilation fails on DragonFly with the following message:
```
/home/mneumann/Dev/benchmark.old/src/sysinfo.cc:446:2: error: #warning "HOST_NAME_MAX not defined. using 64" [-Werror=cpp]
^~~~~~~
```
Also note that the sysctl is actually `hw.tsc_frequency` on DragonFly:
```
$ sysctl hw.tsc_frequency
hw.tsc_frequency: 3498984022
```
Tested on:
```
$ uname -a
DragonFly box.localnet 5.9-DEVELOPMENT DragonFly v5.9.0.742.g4b29dd-DEVELOPMENT #5: Tue Aug 18 00:21:31 CEST 2020
```
As per discussions in here [1], LLVM is going to get backend support on
Motorola 68000 series CPUs (a.k.a M68K or M680x0). So it's necessary to
add CycleTimer implementation here, which is simply using `gettimeofday`
same as MIPS. This fixes#1049
[1] https://reviews.llvm.org/D88389
* Add CartesianProduct with associated test
* Use CartesianProduct in Ranges to avoid code duplication
* Add new cartesian_product_test to CMakeLists.txt
* Update AUTHORS & CONTRIBUTORS
* Rename CartesianProduct to ArgsProduct
* Rename test & fixture accordingly
* Add example for ArgsProduct to README
As noted in #995, this causes issues when the command line flag already
starts with "benchmark_", which they all do.
Not caught by tests as the test flags didn't start with "benchmark".
Fixes#995
* JSONReporter: don't report on scaling if we didn't get it (#1005)
* JSONReporter: fix due to review (std::pair<bool, bool> -> enum)
* JSONReporter: scaling: fix the algo (due to review discussion)
* benchmark.h: revert to old-fashioned enum's (C++03 compatibility); rreporter_output_test: let's skip scaling
* timestamp: use rfc3339-formatted timestamps in output
Replace localized timestamps with machine-readable IETF RFC 3339 format
timestamps. This is an attempt to make the output timestamps easily
machine-readable. ISO8601 specifies standards for time interchange
formats. IETF RFC 3339: https://tools.ietf.org/html/rfc3339 defines a
subset of these for use in the internet. The general form for these
timestamps is:
YYYY-MM-DDTHH:mm:SS[+-]hhmm
This replaces the localized time formats that are currently being used
in the benchmark output to prioritize interchangeability and
machine-readability.
This might break existing programs that rely on the particular date-time
format. This might also may make times less human readable. RFC3339 was
intended to balance human readability and simplicity for machine
readability, but it is primarily intended as an internal representation.
* timers: remove utc string formatting
We only ever need local time printing. Remove the UTC printing
and cosnolidate the logic slightly.
* timers: manually create rfc3339 string
The C++ standard library does not output the time offset in RFC3339
format, it is missing the : between hours and minutes. VS does not
appear to support timezone information by default. To avoid adding too
much complexity to benchmark around timezone handling e.g. a full
date library like https://github.com/HowardHinnant/date, we fall back
to outputting GMT time with a -00:00 offset for those cases.
* timers: use reentrant form for localtime_r & tmtime_r
For non-windows, use the reentrant form for the time conversion
functions.
* timers: cleanup
Use strtol instead of brittle moving characters around.
* timers: only call strftime twice.
Also size buffers to known maximum necessary size and name constants
more appropriately.
* timers: fix unused variable warning
* Add missing <cerrno> header
This commit fixes a current build error on Android where 'errno' is an unidentified
symbol due to a missing header
* Update string_util.cc
Conditionally adds <cerrno> if BENCHMARK_STL_ANDROID_GNUSTL is defined
In a previous commit[1], diagnostic pragmas were used to avoid this
warning. However, the incorrect warning flag was indicated, leaving the
warning in place. -Wdeprecated is for deprecated features while
-Wdeprecated-declarations for deprecated functions, variables, and
types[2].
[1] c408461983
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Fixes the following issues with the implementation of `cycleclock::Now`:
- The RISC-V implementation wouldn't compile due to a typo;
- Both the PPC and RISC-V implementation's asm statements lacked the
volatile keyword. This resulted in the repeated read of the counter's
high part being optimized away, so overflow wasn't handled at all.
Multiple counter reads could also be misoptimized, especially in LTO
scenarios.
- Relied on the zero/sign-extension of inline asm operands, which isn't
guaranteed to occur and differs between compilers, namely GCC and Clang.
The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.
The RISC-V implementation was improved to do the overflow handing in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.
* Fix type conversion warnings.
Fixes#949
Tested locally (Linux/clang), but warnings are on MSVC so may differ.
* Drop the ULP so the double test passes
* Add State::error_occurred()
* Relax CHECK condition in benchmark_runner.cc
If the benchmark state contains an error, do not expect any iterations has been run.
This allows using SkipWithError() and return early from the benchmark function.
* README.md: document new possible usage of SkipWithError()
This fixes the Visual Studio 2019 warning:
`C4244: '=': conversion from 'int' to 'char', possible loss of data`
When implicitly casting the return value of tolower() (int) to char.
Fixes: #932
* add Jordan Williams to both CONTRIBUTORS and AUTHORS
* alias benchmark libraries
Provide aliased CMake targets for the benchmark and benchmark_main targets.
The alias targets are namespaced under benchmark::, which is the namespace when they are exported.
I chose not to use either the PROJECT_NAME or the namespace variable but to hard-code the namespace.
This is because the benchmark and benchmark_main targets are hard-coded by name themselves.
Hard-coding the namespace is also much cleaner and easier to read.
* link to aliased benchmark targets
It is safer to link against namespaced targets because of how CMake interprets the double colon.
Typo's will be caught by CMake at configuration-time instead of during compile / link time.
* document the provided alias targets
* add "Usage with CMake" section in documentation
This section covers linking against the alias/import CMake targets and including them using either find_package or add_subdirectory.
* format the "Usage with CMake" README section
Added a newline after the "Usage with CMake" section header.
Dropped the header level of the section by one to make it a direct subsection of the "Usage" section.
Wrapped lines to be no longer than 80 characters in length.
* Add DEBUG_POSTFIX to libraries.
Makes it possible to install Debug and Release versions on the
same system. Without this, there were only linker errors when using
the wrong configuration.
* Update CONTRIBUTORS and AUTHORS according to guide
Initialize option flags from environment variables values if they are defined, eg. `BENCHMARK_OUT=<filename>` for `--benchmark_out=<filename>`. Command line flag value always prevails.
Fixes https://github.com/google/benchmark/issues/881.
These OS's don't always have HOST_NAME_MAX defined, resulting in
build errors.
A few related changes as well:
* Only define HOST_NAME_MAX if it's not already defined. There are
some cases where this is already defined, e.g. with NaCl if
__USE_POSIX is set. To avoid all of these, only define it if it's
not already defined.
* Default HOST_NAME_MAX to 64 and issue a #warning. Having the wrong
max length is pretty harmless. The name just ends up getting
truncated and this is only for printing debug info. Because we're
constructing a std::string from a char[] (so defined length), we
don't need to worry about gethostname's undefined behavior for
whether the truncation is null-terminated when the hostname
doesn't fit in HOST_NAME_MAX. Of course, this doesn't help people
who have -Werror set, since they'll still get a warning.
- Adresses : #856
- The unused `doc` argument was removed from the `DEFINE_` macros in
`commandlineflags.h`
- Converted all the previous `doc` strings passed to the `DEFINE_`
macros to multiline comments.
While current counters can e.g. answer the question
"how many items is processed per second", it is impossible to get
it to tell "how many seconds it takes to process a single item".
The solution is to add a yet another modifier `kInvert`,
that is *always* considered last, which simply inverts the answer.
Fixes#781, #830, #848.
The CSVReporter is deprecated, but we still need to reference it in
a few places. To avoid breaking the build when warnings are errors,
we need to disable the warning when we do so.
The filenames are consistently inconsistent in windows world, might
have something to do with default file system being case-insensitive.
While the native MinGW buils were fixed in 5261307982
that only addressed the headers, but not libraries.
The problem remains when one tries to do a MinGW cross-build from
case-sensitive filesystem.
The RISC-V implementation of `cycleclock::Now` uses the user-space
`rdcycle` instruction to query how many cycles have happened since the
core started.
The only complexity here is on 32-bit RISC-V, where `rdcycle` can only
read the lower 32 bits of the 64-bit hardware counter. In this case,
`rdcycleh` reads the higher 32 bits of the counter. We match the powerpc
implementation to detect and correct for overflow in the high bits.
This is a shameless rip-off of https://github.com/google/benchmark/pull/646
I did promise to look into why that proposed PR was producing
so much worse assembly, and so i finally did.
The reason is - that diff changes `size_t` (unsigned) to `int64_t` (signed).
There is this nice little `assert`:
7a1c370283/include/benchmark/benchmark.h (L744)
It ensures that we didn't magically decide to advance our iterator
when we should have finished benchmarking.
When `cached_` was unsigned, the `assert` was `cached_ UGT 0`.
But we only ever get to that `assert` if `cached_ NE 0`,
and naturally if `cached_` is not `0`, then it is bigger than `0`,
so the `assert` is tautological, and gets folded away.
But now that `cached_` became signed, the assert became `cached_ SGT 0`.
And we still only know that `cached_ NE 0`, so the assert can't be
optimized out, or at least it doesn't currently.
Regardless of whether or not that is a bug in itself,
that particular diff would have regressed the normal 64-bit systems,
by halving the maximal iteration space (since we go from unsigned counter
to signed one, of the same bit-width), which seems like a bug.
And just so it happens, fixing *this* bug, fixes the other bug.
This produces fully (bit-by-bit) identical state_assembly_test.s
The filecheck change is actually needed regardless of this patch,
else this test does not pass for me even without this diff.
* Add support for GNU Install Dirs from GNU Coding Standards
* src/CMakeLists.txt: Added support for setting the standard variables,
such as CMAKE_INSTALL_BINDIR.
* Replace install destinations by the ones from GNU Coding Standards.
* Set the default .cmake and .pc default path.
* escape special chars in csv and json output.
- escape \b,\f,\n,\r,\t,\," from strings before dumping
them to json or csv.
- also faithfully reproduce the sign of nan in json.
this fixes github issue #745.
* functionalize.
* split string escape functions between csv and json
* Update src/csv_reporter.cc
Co-Authored-By: tesch1 <tesch1@gmail.com>
* Update src/json_reporter.cc
Co-Authored-By: tesch1 <tesch1@gmail.com>
* Add FIXME in multiple_ranges_test.cc
* Improve handling of large bounds in AddRange.
Due to breaking the loop too early, AddRange
would miss a final multplier of 'mult' that
was within the numeric range of T.
* Enable negative values for Range argument
Fixes#762.
* Try to fix build of benchmark_gtest
* Try some more to fix build
* Attempt to fix format macros
* Attempt to resolve format errors for mingw32
* Review feedback
Put unit tests in benchmark::internal namespace
Fix error reporting in multiple_ranges_test.cc
* [JSON] add threads and repetitions to the json output, for better ide…
[Tests] explicitly check for thread == 1
[Tests] specifically mark all repetition checks
[JSON] add repetition_index reporting, but only for non-aggregates (i…
* [Formatting] Be very, very explicit about pointer alignment so clang-format can not put pointers/references on the wrong side of arguments.
[Benchmark::Run] Make sure to use explanatory sentinel variable rather than a magic number.
* Do not pass redundant information