This patch fixes#1306, by reducing the pinned instances of
PerfCounters.
The issue is caused by creating multiple pinned events in the
same thread, doing so results in the Snapshot(PerfCounterValues* values)
failing, and that's now discoverable.
Creating multile pinned events is an unsupported behavior currently.
The error would be detected at read() time, not
perf_event_open() / iotcl() time.
The unsupported benavior above is confirmed by Stephane Eranian @seranian,
and he also pointed the dectection method.
Finished this patch under the guidance of Mircea Trofin @mtrofin.
* Fix un-initted error in test.
Found by -Werror,-Wsometimes-uninitialized
* Update spec_arg_test.cc
* additional change:
- Change the API on GetBenchmarkFilter and the `spec` to std::string because google C++ styleguide internally kind of discouraged using raw const char*
* [RFC] Adding API for setting/getting benchmark_filter flag?
This PR is more of a Request-for-comment - open to other ideas/suggestions as well.
Details:
This flag has different implementations(absl vs benchmark) and since the proposal to add absl as a dependency was rejected, it would be nice to have a reliable (and less hacky) way to access this flag internally.
(Actually, reading it isn't much a problem but setting it is).
Internally, we have a sizeable number users to use absl::SetFlags to set this flag. This will not work with benchmark-flags.
Another motivation is that not all users use the command line flag. Some prefer to programmatically set this value.
* fixed build errors
* fix lints again
* per discussion: add additional RunSpecifiedBenchmarks instead.
* add tests
* fix up tests
* clarify comment
* fix stray : in test
* more assertion in test
* add test file to test/CMakeLists.txt
* more test
* make test ISO C++ compliant
* fix up BUILD file to pass the flag
In #1238, one of MemoryManager's Stop methods was marked as deprecated
and this method is used in the same header. This change generated
-Wdeprecated-declarations warning on every file that includes
"benchmark.h". Use gcc's diagnostics to fix this warning.
WebRTC uses Google Benchmarks as a dependency and uses Chromium's build
infrastructure. Chromium is compiled using clang-cl on Windows, and the
-Wdeprecated-declarations warning is triggered. Because clang-cl accepts
gcc's diagnostic prama and defines the __clang__ macro,
using it can solve this issue.
Bug: webrtc:13280
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate the direct access to these fields.
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate the direct access to these fields.
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make threads a private field
Motivations:
Our internal library provides accessors for those fields because the styleguide disalows accessing classes' data members directly (even if they're const).
There has been a discussion to simply move internal library to make its fields public similarly to the OSS version here, however, the concern is that these kinds of direct access would prevent many types of future design changes (eg how/whether the values would be stored in the data member)
I think the concensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
Both `.` and `<empty>` already means "run all benchmarks" here, as commented on this flag's declaration (and below around line 448-449).
So this is a NFC.
On the other hand, this help internally because internally, if the flag is empty (or if it's not a specified by a binary), we don't call the RunSpecifiedBenchmarks.
There is still a difference in what <empty> means internally (runs no benchmarks) and externally (runs all benchmarks).
But we can work around this.
Inspired by the original implementation by Hai Huang @haih-g
from https://github.com/google/benchmark/pull/1105.
The original implementation had design deficiencies that
weren't really addressable without redesign, so it was reverted.
In essence, the original implementation consisted of two separateable parts:
* reducing the amount time each repetition is run for, and symmetrically increasing repetition count
* running the repetitions in random order
While it worked fine for the usual case, it broke down when user would specify repetitions
(it would completely ignore that request), or specified per-repetition min time (while it would
still adjust the repetition count, it would not adjust the per-repetition time,
leading to much greater run times)
Here, like i was originally suggesting in the original review, i'm separating the features,
and only dealing with a single one - running repetitions in random order.
Now that the runs/repetitions are no longer in-order, the tooling may wish to sort the output,
and indeed `compare.py` has been updated to do that: #1168.
While the current variant works, it assumes that all the instances of
a single family will be run together, with nothing inbetween them.
Naturally, that won't work once the runs may be interleaved.
* Implementation of random interleaving. See
http://github.com/google/benchmark/issues/1051 for the feature requests.
Committer: Hai Huang (http://github.com/haih-g)
On branch fr-1051
Changes to be committed:
modified: include/benchmark/benchmark.h
modified: src/benchmark.cc
new file: src/benchmark_adjust_repetitions.cc
new file: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
modified: test/CMakeLists.txt
new file: test/benchmark_random_interleaving_gtest.cc
* Fix benchmark_random_interleaving_gtest.cc for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_runner.cc
modified: test/benchmark_random_interleaving_gtest.cc
* Fix macos build for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Fix macos and windows build for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
* Fix benchmark_random_interleaving_test.cc for macos and windows in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Fix int type benchmark_random_interleaving_gtest for macos in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comments 03/29 for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comment on default min_time / repetitions for fr-1051.
Also change sentinel of random_interleaving_repetitions to -1. Hopefully it
fixes the failures on Windows.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* Fix windows test failures for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Add license blurb for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
* Switch to std::shuffle() for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Change to 1e-9 in fr-1105
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
* Fix broken build caused by bad merge for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Fix build breakage for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
* Print out reports as they come in if random interleaving is disabled (fr-1051)
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* size_t, int64_t --> int in benchmark_runner for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
* Address comments from dominichamon for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* benchmar_indices --> size_t to make CI pass: fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Fix min_time not initialized issue for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* min_time --> MinTime in fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Add doc for random interleaving for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: README.md
new file: docs/random_interleaving.md
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
* Add API to benchmark allowing for custom context to be added
Fixes#525
* add docs
* Add context flag output to JSON reporter
* Plumb everything into the global context.
* Add googletests for custom context
* update docs with duplicate key behaviour
* Add `benchmark_context` flag that allows per-run custom context.
Add support for key-value flags in general.
Added test for key-value flags.
Added `benchmark_context` flag.
Output content of `benchmark_context` to base reporter.
Solves the first part of #525.
* Docs and better help
* Support optional, user-directed collection of performance counters
The patch allows an engineer wishing to drill into the root causes
of a regression, for example. Currently, only single threaded runs
are supported. The feature is a build-time opt in, and then a runtime
opt in.
The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.
This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.
Currently, this is only supported on platforms where libpfm is
supported.
libpfm: http://perfmon2.sourceforge.net/
* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS
This is to avoid unused parameter warning-as-error
* Added missing include for <vector> in perf_counters.cc
* Moved doc to docs
* Added license blurbs
In a previous commit[1], diagnostic pragmas were used to avoid this
warning. However, the incorrect warning flag was indicated, leaving the
warning in place. -Wdeprecated is for deprecated features while
-Wdeprecated-declarations for deprecated functions, variables, and
types[2].
[1] c408461983
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Initialize option flags from environment variables values if they are defined, eg. `BENCHMARK_OUT=<filename>` for `--benchmark_out=<filename>`. Command line flag value always prevails.
Fixes https://github.com/google/benchmark/issues/881.
- Adresses : #856
- The unused `doc` argument was removed from the `DEFINE_` macros in
`commandlineflags.h`
- Converted all the previous `doc` strings passed to the `DEFINE_`
macros to multiline comments.
The CSVReporter is deprecated, but we still need to reference it in
a few places. To avoid breaking the build when warnings are errors,
we need to disable the warning when we do so.
This is a shameless rip-off of https://github.com/google/benchmark/pull/646
I did promise to look into why that proposed PR was producing
so much worse assembly, and so i finally did.
The reason is - that diff changes `size_t` (unsigned) to `int64_t` (signed).
There is this nice little `assert`:
7a1c370283/include/benchmark/benchmark.h (L744)
It ensures that we didn't magically decide to advance our iterator
when we should have finished benchmarking.
When `cached_` was unsigned, the `assert` was `cached_ UGT 0`.
But we only ever get to that `assert` if `cached_ NE 0`,
and naturally if `cached_` is not `0`, then it is bigger than `0`,
so the `assert` is tautological, and gets folded away.
But now that `cached_` became signed, the assert became `cached_ SGT 0`.
And we still only know that `cached_ NE 0`, so the assert can't be
optimized out, or at least it doesn't currently.
Regardless of whether or not that is a bug in itself,
that particular diff would have regressed the normal 64-bit systems,
by halving the maximal iteration space (since we go from unsigned counter
to signed one, of the same bit-width), which seems like a bug.
And just so it happens, fixing *this* bug, fixes the other bug.
This produces fully (bit-by-bit) identical state_assembly_test.s
The filecheck change is actually needed regardless of this patch,
else this test does not pass for me even without this diff.
Created BenchmarkName class which holds the full benchmark
name and allows specifying and retrieving different components
of the name (e.g. ARGS, THREADS etc.)
Fixes#730.
It is better to let the RunBenchmarks(), report() decide
whether to actually *only* output aggregates or not,
depending on whether there are actually aggregates.
It's subtle indeed.
Previously, `BenchmarkRunner()` always said that "if there are no repetitions,
then you should never output only the repetitions". And the `report()` simply assumed
that the `report_aggregates_only` bool it received makes sense, and simply used it.
Now, the logic is the same, but the blame has shifted.
`BenchmarkRunner()` always propagates what those benchmarks would have wanted
to happen wrt the aggregates. And the `report()` lambda has to actually consider
both the `report_aggregates_only` bool, and it's meaningfulness.
To put it in the context of the patch series - if the repetition count was `1`,
but `*_report_aggregates_only` was set to `true`, and we capture each iteration separately,
then we will compute the aggregates, but then output everything, both the iteration,
and aggregates, despite `*_report_aggregates_only` being set to `true`.
As prevously written, "--benchmark_color=auto" was treated as true,
because IsTruthyFlagValue("auto") returned true. The fix is to
rely on IsColorTerminal test only if the flag value is "auto",
and fall back to IsTruthyFlagValue otherwise. I also integrated
force_no_color check into the same block.
Ok, so, i'm still trying to get to the state when it will be a trivial change to report all the separate iterations.
The old code (LHS of the diff) was rather convoluted i'd say.
I have tried to refactor it a bit into *small* logical chunks, with proper comments.
As far as i can tell, i preserved the intent of the code, what it was doing before.
The road forward still isn't clear, but i'm quite sure it's not with the old code :)
The State constructor should not be part of the public API. Adding a
utility method to BenchmarkInstance allows us to avoid leaking the
RunInThread method into the public API.
As discussed with @dominichamon and @dbabokin, sugar is nice.
Well, maybe not for the health, but it's sweet.
Alright, enough puns.
A special care needs to be applied not to break csv reporter. UGH.
We end up shedding some code over this.
We no longer specially pretty-print them, they are printed just like the rest of custom counters.
Fixes#627.
This is related to @BaaMeow's work in https://github.com/google/benchmark/pull/616 but is not based on it.
Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored.
I.e. not some name of the benchmark function, but the actual
name, but without the 'aggregate name' suffix.
This way one can group/filter all the runs,
and filter by the particular aggregate type.
I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
There are two destinations:
* display (console, terminal) and
* file.
And each of the destinations can be poplulated with one of the reporters:
* console - human-friendly table-like display
* json
* csv (deprecated)
So using the name console_reporter is confusing.
Is it talking about the console reporter in the sense of
table-like reporter, or in the sense of display destination?
This only specifically represents handling of reporting of aggregates.
Not of anything else. Making it more specific makes the name less generic.
This is an issue because i want to add "iteration report mode",
so the naming would be conflicting.
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) in my code due to the lack of those i have found fixed in my code:
* `kIsIterationInvariant` - `* state.iterations()`
The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
The is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions
* format all tests as well, and mark blocks which change too much