* Statistics: add support for percentage unit in addition to time
I think the `stddev` statistic is useful, but confusing.
What does it mean if a `stddev` of `1ms` is reported?
Is that good or bad? If the `median` is `1s`,
then the measurements are pretty noise-free.
And what if a `stddev` of `100ms` is reported?
If the `median` is `1s` - awful; if the `median` is `10s` - good.
And hurray, there is exactly the statistic that we need:
https://en.wikipedia.org/wiki/Coefficient_of_variation
But, naturally, that produces a value in percent,
while the statistics are currently hardcoded to produce time.
So this refactors things a bit, and allows a percentage unit for statistics.
I'm not sure whether or not `benchmark` would be okay
with adding this `RSD` statistic by default,
but regardless, that is a separate patch.
Refs. https://github.com/google/benchmark/issues/1146
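For illustration, here is a rough sketch of how a relative-standard-deviation statistic could be registered per benchmark once a percentage unit exists; the `benchmark::StatisticUnit::kPercentage` spelling and the trailing `ComputeStatistics` parameter are my reading of this change, not settled API:
```
#include <benchmark/benchmark.h>

#include <cmath>
#include <numeric>
#include <vector>

// Coefficient of variation over the per-repetition measurements,
// i.e. stddev divided by the mean.
static double RSD(const std::vector<double>& v) {
  const double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
  double sq_sum = 0.0;
  for (double x : v) sq_sum += (x - mean) * (x - mean);
  return std::sqrt(sq_sum / v.size()) / mean;
}

static void BM_Something(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    benchmark::DoNotOptimize(x);
  }
}
// With the percentage unit, the reporters print e.g. "2.34 %"
// instead of formatting the statistic as a time.
BENCHMARK(BM_Something)
    ->Repetitions(16)
    ->ComputeStatistics("RSD", RSD, benchmark::StatisticUnit::kPercentage);
```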
* Address review notes
* [benchmark] Introduce accessors for currently public data members `threads` and `thread_index`
Also deprecate direct access to `.thread_index` and make `threads` a private field.
Motivations:
Our internal library provides accessors for those fields because the style guide disallows accessing classes' data members directly (even if they're const).
There has been a discussion about simply changing the internal library to make its fields public, similarly to the OSS version here; however, the concern is that this kind of direct access would prevent many types of future design changes (e.g. how/whether the values are stored in the data members).
I think the consensus in the end is that we'd change the external library for this case.
AFAIK, there are three important third_party users that we'd need to migrate: tcmalloc, abseil and tensorflow.
Please let me know if I'm missing anyone else.
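As a rough migration sketch (assuming the accessors simply mirror the old field names), per-thread code changes from reading the public members to calling the new accessors:
```
static void BM_MultiThreaded(benchmark::State& state) {
  // Previously: `state.thread_index == 0` and `state.threads`.
  if (state.thread_index() == 0) {
    // One-time setup done by the first thread only.
  }
  for (auto _ : state) {
    // Work that can be split across state.threads() threads.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(4);
```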
This can be used together with ArgsProduct() to allow multiple ranges
with different multipliers and mixing dense and sparse ranges.
Example:
BENCHMARK(MyTest)->ArgsProduct({
  CreateRange(0, 1024, /*multi=*/32),
  CreateRange(0, 100, /*multi=*/4),
  CreateDenseRange(0, 4, /*step=*/1)
});
Co-authored-by: Jen-yee Hong <pcmantw@google.com>
Inspired by the original implementation by Hai Huang @haih-g
from https://github.com/google/benchmark/pull/1105.
The original implementation had design deficiencies that
weren't really addressable without redesign, so it was reverted.
In essence, the original implementation consisted of two separable parts:
* reducing the amount of time each repetition is run for, and symmetrically increasing the repetition count
* running the repetitions in random order
While it worked fine for the usual case, it broke down when the user specified repetitions
(it would completely ignore that request), or specified a per-repetition min time (while it would
still adjust the repetition count, it would not adjust the per-repetition time,
leading to much greater run times).
Here, as I originally suggested in the original review, I'm separating the features,
and only dealing with a single one - running repetitions in random order.
Now that the runs/repetitions are no longer in-order, the tooling may wish to sort the output,
and indeed `compare.py` has been updated to do that: #1168.
Much like it makes sense to enumerate all the families,
it makes sense to enumerate the instances within each family.
Alternatively, we could have a global instance index,
but I'm not sure why that would be better.
This will be useful when the benchmarks are run not in order,
for the tools to sort the results properly.
It may be useful for those wishing to further post-process JSON results,
but it is mainly geared towards better support for run interleaving,
where results from the same family may not be close-by in the JSON.
While we won't be able to do much about that for outputs,
the tools can and perhaps should reorder the results so that
at least in their output they are in proper order, not run order.
Note that this only counts the families that were filtered-in,
so if e.g. there were three families, and we filtered-out
the second one, the two families (which were first and third)
will have family indexes 0 and 1.
* Support -Wsuggest-override
google/benchmark is C++11 compatible but doesn't use the `override` keyword.
Projects using google/benchmark with enabled `-Wsuggest-override` and `-Werror` will fail to compile.
* Add -Wsuggest-override cxx flag
* Revert unrelated formatting
* Revert unrelated formatting, take 2
* Revert unrelated formatting, take 3
* Disable -Wsuggest-override when compiling tests, gtest does not handle it yet
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
* Add API to benchmark allowing for custom context to be added (see the usage sketch after this list)
Fixes #525
* add docs
* Add context flag output to JSON reporter
* Plumb everything into the global context.
* Add googletests for custom context
* update docs with duplicate key behaviour
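A small usage sketch for the custom-context API added here; the entry point is assumed to be the free function `benchmark::AddCustomContext(key, value)`, with the pairs ending up in the reporters' context block:
```
#include <benchmark/benchmark.h>

int main(int argc, char** argv) {
  benchmark::Initialize(&argc, argv);
  // Emitted once alongside the built-in context (CPU info, load average, ...),
  // e.g. inside the JSON "context" object.
  benchmark::AddCustomContext("binary_version", "1.2.3");
  benchmark::AddCustomContext("dataset", "synthetic-small");
  benchmark::RunSpecifiedBenchmarks();
  return 0;
}
```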
* Support optional, user-directed collection of performance counters
The patch helps an engineer who wishes to drill into the root causes
of a regression, for example. Currently, only single-threaded runs
are supported. The feature is a build-time opt-in, and then a runtime
opt-in.
The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.
This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.
Currently, this is only supported on platforms where libpfm is
supported.
libpfm: http://perfmon2.sourceforge.net/
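A hedged sketch of the opt-in flow: build the library with libpfm support and name the counters at run time. The `BENCHMARK_ENABLE_LIBPFM` option and `--benchmark_perf_counters` flag names below are my reading of this change; treat them as assumptions.
```
#include <benchmark/benchmark.h>

#include <cstdint>

static void BM_Hash(benchmark::State& state) {
  uint64_t h = 0;
  for (auto _ : state) {
    h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    benchmark::DoNotOptimize(h);
  }
}
BENCHMARK(BM_Hash);
BENCHMARK_MAIN();

// Build-time opt-in:  cmake -DBENCHMARK_ENABLE_LIBPFM=ON ...
// Run-time opt-in:    ./bm_hash --benchmark_perf_counters=CYCLES,INSTRUCTIONS
// The requested counters are then reported as UserCounters for each run.
```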
* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS
This is to avoid unused parameter warning-as-error
* Added missing include for <vector> in perf_counters.cc
* Moved doc to docs
* Added license blurbs
Currently, I get:
```
Run on (32 X 7326.56 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
which seems mostly right, except that the frequency is rather bogus.
Yes, I guess the CPU could theoretically achieve that,
but I have 3.6GHz configured, and scaling disabled.
So we clearly read the wrong thing.
With this fix, I now get the expected
```
Run on (32 X 3598.53 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
* Implement custom benchmark name
The benchmark's name can be changed using the Name() function
which internally uses SetName().
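A short sketch of the new per-benchmark naming (the reported name below is illustrative):
```
static void BM_SortVector(benchmark::State& state) {
  for (auto _ : state) {
    // ... sort a vector of state.range(0) elements ...
  }
}
// Reported as "Sort/vector/1024" instead of "BM_SortVector/1024".
BENCHMARK(BM_SortVector)->Name("Sort/vector")->Arg(1024);
```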
* Update AUTHORS and CONTRIBUTORS
* Describe new feature in README
* Move new name function up
Fixes #1106
* Add CartesianProduct with associated test
* Use CartesianProduct in Ranges to avoid code duplication
* Add new cartesian_product_test to CMakeLists.txt
* Update AUTHORS & CONTRIBUTORS
* Rename CartesianProduct to ArgsProduct
* Rename test & fixture accordingly
* Add example for ArgsProduct to README
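For reference, a minimal sketch of ArgsProduct with explicit argument lists (the benchmark and argument meanings are illustrative); every combination of the lists becomes one benchmark instance:
```
static void BM_Compress(benchmark::State& state) {
  // state.range(0): input size, state.range(1): compression level.
  for (auto _ : state) {
    // ... compress state.range(0) bytes at level state.range(1) ...
  }
}
// 3 sizes x 2 levels = 6 benchmark instances.
BENCHMARK(BM_Compress)->ArgsProduct({{1 << 10, 1 << 15, 1 << 20}, {1, 9}});
```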
* JSONReporter: don't report on scaling if we didn't get it (#1005)
* JSONReporter: fix due to review (std::pair<bool, bool> -> enum)
* JSONReporter: scaling: fix the algo (due to review discussion)
* benchmark.h: revert to old-fashioned enums (C++03 compatibility); reporter_output_test: let's skip scaling
* Add State::error_occurred()
* Relax CHECK condition in benchmark_runner.cc
If the benchmark state contains an error, do not expect that any iterations have been run.
This allows using SkipWithError() and returning early from the benchmark function; see the sketch after this list.
* README.md: document new possible usage of SkipWithError()
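A minimal sketch of the pattern this relaxation enables (file name illustrative): report the error and return before entering the measurement loop.
```
#include <benchmark/benchmark.h>

#include <fstream>

static void BM_ParseConfig(benchmark::State& state) {
  std::ifstream file("config.json");
  if (!file) {
    state.SkipWithError("config.json not found");
    return;  // Early return is fine now: no iterations are required.
  }
  for (auto _ : state) {
    // ... parse the file ...
  }
}
BENCHMARK(BM_ParseConfig);
```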
While current counters can e.g. answer the question
"how many items are processed per second", it is impossible to get
them to tell "how many seconds it takes to process a single item".
The solution is to add yet another modifier, `kInvert`,
that is *always* considered last, and which simply inverts the answer.
Fixes #781, #830, #848.
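A rough sketch of the modifier in use (counter names illustrative); since `kInvert` is applied last, the same rate counter can be reported either way:
```
static void BM_ProcessItems(benchmark::State& state) {
  int64_t total_items = 0;
  for (auto _ : state) {
    // ... process a batch of 64 items ...
    total_items += 64;
  }
  state.counters["items_per_second"] = benchmark::Counter(
      static_cast<double>(total_items), benchmark::Counter::kIsRate);
  // Same value, but inverted after the rate is computed: seconds per item.
  state.counters["seconds_per_item"] = benchmark::Counter(
      static_cast<double>(total_items),
      benchmark::Counter::kIsRate | benchmark::Counter::kInvert);
}
BENCHMARK(BM_ProcessItems);
```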
This is a shameless rip-off of https://github.com/google/benchmark/pull/646
I did promise to look into why that proposed PR was producing
so much worse assembly, and so I finally did.
The reason is - that diff changes `size_t` (unsigned) to `int64_t` (signed).
There is this nice little `assert`:
7a1c370283/include/benchmark/benchmark.h (L744)
It ensures that we didn't magically decide to advance our iterator
when we should have finished benchmarking.
When `cached_` was unsigned, the `assert` was `cached_ UGT 0`.
But we only ever get to that `assert` if `cached_ NE 0`,
and naturally if `cached_` is not `0`, then it is bigger than `0`,
so the `assert` is tautological, and gets folded away.
But now that `cached_` became signed, the assert became `cached_ SGT 0`.
And we still only know that `cached_ NE 0`, so the assert can't be
optimized out, or at least it doesn't currently.
Regardless of whether or not that is a bug in itself,
that particular diff would have regressed the normal 64-bit systems,
by halving the maximal iteration space (since we go from unsigned counter
to signed one, of the same bit-width), which seems like a bug.
And just so it happens, fixing *this* bug, fixes the other bug.
This produces fully (bit-by-bit) identical state_assembly_test.s
The filecheck change is actually needed regardless of this patch,
else this test does not pass for me even without this diff.
* [JSON] add threads and repetitions to the json output, for better ide…
[Tests] explicitly check for thread == 1
[Tests] specifically mark all repetition checks
[JSON] add repetition_index reporting, but only for non-aggregates (i…
* [Formatting] Be very, very explicit about pointer alignment so clang-format can not put pointers/references on the wrong side of arguments.
[Benchmark::Run] Make sure to use explanatory sentinel variable rather than a magic number.
* Do not pass redundant information
Created BenchmarkName class which holds the full benchmark
name and allows specifying and retrieving different components
of the name (e.g. ARGS, THREADS etc.)
Fixes #730.
* Adding Host Name and test
* Addressing Review Comments
* Adding Test for JSON Reporter
* Adding HOST_NAME_MAX for MacOS systems
* Adding Explanation for MacOS HOST_NAME_MAX Addition
* Addressing Peer Review Comments
* Adding codecvt in windows header guard
* Changing name SystemInfo and adding empty message in case host name fetch fails
* Adding Comment on Struct SystemInfo
The State constructor should not be part of the public API. Adding a
utility method to BenchmarkInstance allows us to avoid leaking the
RunInThread method into the public API.
As discussed with @dominichamon and @dbabokin, sugar is nice.
Well, maybe not for the health, but it's sweet.
Alright, enough puns.
Special care needs to be taken not to break the csv reporter. UGH.
We end up shedding some code over this.
We no longer specially pretty-print them; they are printed just like the rest of the custom counters.
Fixes #627.
This is related to @BaaMeow's work in https://github.com/google/benchmark/pull/616 but is not based on it.
Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored.
I.e. not the name of the benchmark function, but the actual
run name, just without the 'aggregate name' suffix.
This way one can group/filter all the runs,
and filter by the particular aggregate type.
I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
They are basically a proto-version of custom user counters.
It does not seem that they do anything that custom user counters
don't do. And having two similar entities is not good for generalization.
Migration plan:
* ```
  SetItemsProcessed(<val>)
  =>
  state.counters.insert({
    {"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate)},
    ...
  });
  ```
* ```
  SetBytesProcessed(<val>)
  =>
  state.counters.insert({
    {"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate,
                                  benchmark::Counter::OneK::kIs1024)},
    ...
  });
  ```
* ```
  <Name>_processed()
  =>
  state.counters["<Name>"]
  ```
One thing the custom user counters miss is better support
for units of measurement.
Refs. https://github.com/google/benchmark/issues/627
* Counter(): add 'one thousand' param.
Needed for https://github.com/google/benchmark/pull/654
Custom user counters are quite custom. It is not guaranteed
that the user *always* expects for these to have 1k == 1000.
If the counter represents bytes/memory/etc, 1k should be 1024.
Some bikeshedding points:
1. Is this sufficient, or do we really want to go full on
into custom types with names?
I think just the '1000' is sufficient for now.
2. Should there be a helper benchmark::Counter::Counter{1000,1024}()
static 'constructor' functions, since these two, by far,
will be the most used?
3. In the future, we should be somehow encoding this info into JSON.
* Counter(): use std::pair<> to represent 'one thousand'
* Counter(): just use a new enum with two values 1000 vs 1024.
Simpler is better. If someone comes up with a real reason
to need something more advanced, it can be added later on.
* Counter: just store the 1000 or 1024 in the One_K values directly
* Counter: s/One_K/OneK/
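With the final design above (a two-valued `OneK` enum carried by the counter), a byte-like counter can be tagged so that the reporters scale it with 1k == 1024; a small sketch, with illustrative names:
```
static void BM_MemcpyLike(benchmark::State& state) {
  const int64_t bytes_per_iteration = 4096;
  for (auto _ : state) {
    // ... copy bytes_per_iteration bytes ...
  }
  // Byte-like counters make more sense with 1k == 1024.
  state.counters["bytes_per_second"] = benchmark::Counter(
      static_cast<double>(bytes_per_iteration * state.iterations()),
      benchmark::Counter::kIsRate, benchmark::Counter::OneK::kIs1024);
}
BENCHMARK(BM_MemcpyLike);
```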
There are two destinations:
* display (console, terminal) and
* file.
And each of the destinations can be populated with one of the reporters:
* console - human-friendly table-like display
* json
* csv (deprecated)
So using the name console_reporter is confusing.
Is it talking about the console reporter in the sense of
table-like reporter, or in the sense of display destination?
This is *only* exposed in the JSON. Not in CSV, which is deprecated.
This *only* supposed to track these two states.
An additional field could later track which aggregate this is,
specifically (statistic name, rms, bigo, ...)
The motivation is that we already have ReportAggregatesOnly,
but it affects the entire reports, both the display,
and the reporters (json files), which isn't ideal.
It would be very useful to have a 'display aggregates only' option,
both in the library's console reporter, and in the python tooling.
This will be especially needed for the 'store separate iterations' feature.
This only specifically represents handling of reporting of aggregates.
Not of anything else. Making it more specific makes the name less generic.
This is an issue because i want to add "iteration report mode",
so the naming would be conflicting.
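A small sketch of what the display-only mode looks like from the benchmark author's side, assuming a per-benchmark `DisplayAggregatesOnly()` toggle next to the existing `ReportAggregatesOnly()`:
```
static void BM_Noisy(benchmark::State& state) {
  for (auto _ : state) {
    // ... workload with run-to-run variance ...
  }
}
// The console shows only the mean/median/stddev of the 10 repetitions,
// while the file reporters (e.g. JSON) still receive every repetition.
BENCHMARK(BM_Noisy)->Repetitions(10)->DisplayAggregatesOnly(true);
```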
* Remove redundant default which causes failures
* Fix old GCC warnings caused by poor analysis
* Use __builtin_unreachable
* Use BENCHMARK_UNREACHABLE()
* Pull __has_builtin to benchmark.h too
* Also move compiler identification macro to main header
* Move custom compiler identification macro back
High system load can skew benchmark results. By including system load averages
in the library's output, we help users identify a potential issue in the
quality of their measurements, and thus assist them in producing better (more
reproducible) results.
I got the idea for this from Brendan Gregg's checklist for benchmark accuracy
(http://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html).
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) I have found and fixed in my code, caused by the lack of these flags:
* `kIsIterationInvariant` - `* state.iterations()`
  The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
  The value is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
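A rough sketch of the two new flags in use (workload and counter names illustrative):
```
static void BM_Transform(benchmark::State& state) {
  const int64_t elements_per_iteration = 512;
  int64_t total_bytes_touched = 0;
  for (auto _ : state) {
    // ... transform elements_per_iteration elements ...
    total_bytes_touched += elements_per_iteration * 8;
  }
  // Constant per iteration: multiplied by the iteration count and, because of
  // the combined rate flag, divided by the elapsed time -> elements/second.
  state.counters["elements_per_second"] = benchmark::Counter(
      static_cast<double>(elements_per_iteration),
      benchmark::Counter::kIsIterationInvariantRate);
  // Accumulated over the whole run: divided by the iteration count
  // -> average bytes touched per iteration.
  state.counters["avg_bytes_per_iteration"] =
      benchmark::Counter(static_cast<double>(total_bytes_touched),
                         benchmark::Counter::kAvgIterations);
}
BENCHMARK(BM_Transform);
```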
As @dominichamon and I have discussed, the current reporter interface
is poor at best. And something should be done to fix it.
I strongly suspect such a fix will require an entire reimagining
of the API, and therefore breaking backwards compatibility fully.
For that reason we should start deprecating and removing parts
that we don't intend to replace. One of these parts, I argue,
is the CSVReporter. I propose that the new reporter interface
should choose a single output format (JSON) and traffic entirely
in that. If somebody really wanted to replace the functionality
of the CSVReporter they would do so as an external tool which
transforms the JSON.
For these reasons I propose deprecating the CSVReporter.