This can be used together with ArgsProduct() to allow multiple ranges
with different multipliers and mixing dense and sparse ranges.
Example:
BENCHMARK(MyTest)->ArgsProduct({
CreateRange(0, 1024, /*multi=*/32),
CreateRange(0, 100, /*multi=*/4),
CreateDenseRange(0, 4, /*step=*/1)
});
Co-authored-by: Jen-yee Hong <pcmantw@google.com>
Inspired by the original implementation by Hai Huang @haih-g
from https://github.com/google/benchmark/pull/1105.
The original implementation had design deficiencies that
weren't really addressable without redesign, so it was reverted.
In essence, the original implementation consisted of two separateable parts:
* reducing the amount time each repetition is run for, and symmetrically increasing repetition count
* running the repetitions in random order
While it worked fine for the usual case, it broke down when user would specify repetitions
(it would completely ignore that request), or specified per-repetition min time (while it would
still adjust the repetition count, it would not adjust the per-repetition time,
leading to much greater run times)
Here, like i was originally suggesting in the original review, i'm separating the features,
and only dealing with a single one - running repetitions in random order.
Now that the runs/repetitions are no longer in-order, the tooling may wish to sort the output,
and indeed `compare.py` has been updated to do that: #1168.
Currently the lifetime of a single BenchmarkRunner is constrained
to a RunBenchmark(), but that will have to change for interleaved
benchmark execution, because we'll need to keep it around to not
forget how much repetitions of an instance we've done.
While the current variant works, it assumes that all the instances of
a single family will be run together, with nothing inbetween them.
Naturally, that won't work once the runs may be interleaved.
Much like it makes sense to enumerate all the families,
it makes sense to enumerate stuff within families.
Alternatively, we could have a global instance index,
but i'm not sure why that would be better.
This will be useful when the benchmarks are run not in order,
for the tools to sort the results properly.
It may be useful for those wishing to further post-process JSON results,
but it is mainly geared towards better support for run interleaving,
where results from the same family may not be close-by in the JSON.
While we won't be able to do much about that for outputs,
the tools can and perhaps should reorder the results to that
at least in their output they are in proper order, not run order.
Note that this only counts the families that were filtered-in,
so if e.g. there were three families, and we filtered-out
the second one, the two families (which were first and third)
will have family indexes 0 and 1.
* Implementation of random interleaving. See
http://github.com/google/benchmark/issues/1051 for the feature requests.
Committer: Hai Huang (http://github.com/haih-g)
On branch fr-1051
Changes to be committed:
modified: include/benchmark/benchmark.h
modified: src/benchmark.cc
new file: src/benchmark_adjust_repetitions.cc
new file: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
modified: test/CMakeLists.txt
new file: test/benchmark_random_interleaving_gtest.cc
* Fix benchmark_random_interleaving_gtest.cc for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_runner.cc
modified: test/benchmark_random_interleaving_gtest.cc
* Fix macos build for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Fix macos and windows build for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
* Fix benchmark_random_interleaving_test.cc for macos and windows in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Fix int type benchmark_random_interleaving_gtest for macos in fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comments 03/29 for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* Address dominichamon's comment on default min_time / repetitions for fr-1051.
Also change sentinel of random_interleaving_repetitions to -1. Hopefully it
fixes the failures on Windows.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* Fix windows test failures for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Add license blurb for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
* Switch to std::shuffle() for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Change to 1e-9 in fr-1105
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_adjust_repetitions.cc
* Fix broken build caused by bad merge for fr-1105.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_runner.cc
* Fix build breakage for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_register.cc
modified: src/benchmark_runner.cc
* Print out reports as they come in if random interleaving is disabled (fr-1051)
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* size_t, int64_t --> int in benchmark_runner for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_runner.cc
modified: src/benchmark_runner.h
* Address comments from dominichamon for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
modified: src/benchmark_adjust_repetitions.cc
modified: src/benchmark_adjust_repetitions.h
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: test/benchmark_random_interleaving_gtest.cc
* benchmar_indices --> size_t to make CI pass: fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark.cc
* Fix min_time not initialized issue for fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
* min_time --> MinTime in fr-1051.
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: src/benchmark_api_internal.cc
modified: src/benchmark_api_internal.h
modified: src/benchmark_runner.cc
* Add doc for random interleaving for fr-1051
Committer: Hai Huang <haih@google.com>
On branch fr-1051
Your branch is up to date with 'origin/fr-1051'.
Changes to be committed:
modified: README.md
new file: docs/random_interleaving.md
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
This also fixes#1135. Because StrSplit was returning a vector with an
empty string, it was treated by PerfCounters::Create as a legitimate ask
for setting up a counter with that name. The empty vector is understood
by PerfCounters as "just return NoCounters()".
* Add API to benchmark allowing for custom context to be added
Fixes#525
* add docs
* Add context flag output to JSON reporter
* Plumb everything into the global context.
* Add googletests for custom context
* update docs with duplicate key behaviour
* Add `benchmark_context` flag that allows per-run custom context.
Add support for key-value flags in general.
Added test for key-value flags.
Added `benchmark_context` flag.
Output content of `benchmark_context` to base reporter.
Solves the first part of #525.
* Docs and better help
* Support optional, user-directed collection of performance counters
The patch allows an engineer wishing to drill into the root causes
of a regression, for example. Currently, only single threaded runs
are supported. The feature is a build-time opt in, and then a runtime
opt in.
The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.
This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.
Currently, this is only supported on platforms where libpfm is
supported.
libpfm: http://perfmon2.sourceforge.net/
* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS
This is to avoid unused parameter warning-as-error
* Added missing include for <vector> in perf_counters.cc
* Moved doc to docs
* Added license blurbs
Currently, i get:
```
Run on (32 X 7326.56 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
which seems mostly right, except that the frequency is rather bogus.
Yes, i guess the CPU could theoretically achieve that,
but i have 3.6GHz configured, and scaling disabled.
So we clearly read the wrong thing.
With this fix, i now get the expected
```
Run on (32 X 3598.53 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
Use the benchmark's reported iteration count when estimating
iterations for the next repetition, rather than the requested
iteration count. When the benchmark uses KeepRunningBatch the actual
iteration count can be larger than the one the runner requested.
Prior to this fix the runner was underestimating the next iteration
count, sometimes significantly so. Consider the case of a benchmark
using a batch size of 1024. Prior to this change, the benchmark
runner would attempt iteration counts 1, 10, 100 and 1000, yet the
benchmark itself would do the same amount of work each time: a single
batch of 1024 iterations. The discrepancy could also contribute to
estimation errors once the benchmark time reached 10% of the target.
For example, if the very first batch of 1024 iterations reached 10% of
benchmark_min_min time, the runner would attempt to scale that to 100%
from a basis of one iteration rather than 1024.
This bug was particularly noticeable in benchmarks with large batch
sizes, especially when the benchmark also had slow set up or tear down
phases.
With this fix in place it is possible to use KeepRunningBatch to
achieve a kind of "minimum iteration count" feature by using a larger
fixed batch size. For example, a benchmark may build a map of 500K
elements and test a "find" operation. There is no point in running
"find" just 1, 10, 100, etc., times. The benchmark can now pick a
batch size of something like 10K, and the runner will arrive at the
final max iteration count with in noticeably fewer repetitions.
When building with gcc TSan on, and in Debug mode, we see a warning
like:
benchmark/src/timers.cc: In function ‘std::string benchmark::LocalDateTimeString()’:
src/timers.cc:241:15: warning: ‘char* strncat(char*, const char*, size_t)’ output may be truncated copying 108 bytes from a string of length 127 [-Wstringop-truncation]
241 | std::strncat(storage, tz_offset, sizeof(storage) - timestamp_len - 1);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While this is essentially a false positive (we never expect
the number of bytes in tz_offset to be too large), the compiler can't
actually tell that. Shrink the size of tz_offset to a smaller, but still safe
size to eliminate this warning.
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
* Implement custom benchmark name
The benchmark's name can be changed using the Name() function
which internally uses SetName().
* Update AUTHORS and CONTRIBUTORS
* Describe new feature in README
* Move new name function up
Fixes#1106
The existing behavior results in the `0` value being added twice. Since
`lo` is always added to `dst`, we never want to explicitly add `0` if
`lo` is equal to `0`.
On s390 architecture, z/OS XL compiler uses HLASM inline assembly, which has different syntax and needs to be distinguished to avoid compilation error.
Noticed missing header when was building llvm with gcc-11:
```
llvm-project/llvm/utils/benchmark/src/benchmark_register.h:17:30:
error: 'numeric_limits' is not a member of 'std'
17 | static const T kmax = std::numeric_limits<T>::max();
| ^~~~~~~~~~~~~~
```
Without this commit, compilation fails on DragonFly with the following message:
```
/home/mneumann/Dev/benchmark.old/src/sysinfo.cc:446:2: error: #warning "HOST_NAME_MAX not defined. using 64" [-Werror=cpp]
^~~~~~~
```
Also note that the sysctl is actually `hw.tsc_frequency` on DragonFly:
```
$ sysctl hw.tsc_frequency
hw.tsc_frequency: 3498984022
```
Tested on:
```
$ uname -a
DragonFly box.localnet 5.9-DEVELOPMENT DragonFly v5.9.0.742.g4b29dd-DEVELOPMENT #5: Tue Aug 18 00:21:31 CEST 2020
```
As per discussions in here [1], LLVM is going to get backend support on
Motorola 68000 series CPUs (a.k.a M68K or M680x0). So it's necessary to
add CycleTimer implementation here, which is simply using `gettimeofday`
same as MIPS. This fixes#1049
[1] https://reviews.llvm.org/D88389
* Add CartesianProduct with associated test
* Use CartesianProduct in Ranges to avoid code duplication
* Add new cartesian_product_test to CMakeLists.txt
* Update AUTHORS & CONTRIBUTORS
* Rename CartesianProduct to ArgsProduct
* Rename test & fixture accordingly
* Add example for ArgsProduct to README
As noted in #995, this causes issues when the command line flag already
starts with "benchmark_", which they all do.
Not caught by tests as the test flags didn't start with "benchmark".
Fixes#995
* JSONReporter: don't report on scaling if we didn't get it (#1005)
* JSONReporter: fix due to review (std::pair<bool, bool> -> enum)
* JSONReporter: scaling: fix the algo (due to review discussion)
* benchmark.h: revert to old-fashioned enum's (C++03 compatibility); rreporter_output_test: let's skip scaling
* timestamp: use rfc3339-formatted timestamps in output
Replace localized timestamps with machine-readable IETF RFC 3339 format
timestamps. This is an attempt to make the output timestamps easily
machine-readable. ISO8601 specifies standards for time interchange
formats. IETF RFC 3339: https://tools.ietf.org/html/rfc3339 defines a
subset of these for use in the internet. The general form for these
timestamps is:
YYYY-MM-DDTHH:mm:SS[+-]hhmm
This replaces the localized time formats that are currently being used
in the benchmark output to prioritize interchangeability and
machine-readability.
This might break existing programs that rely on the particular date-time
format. This might also may make times less human readable. RFC3339 was
intended to balance human readability and simplicity for machine
readability, but it is primarily intended as an internal representation.
* timers: remove utc string formatting
We only ever need local time printing. Remove the UTC printing
and cosnolidate the logic slightly.
* timers: manually create rfc3339 string
The C++ standard library does not output the time offset in RFC3339
format, it is missing the : between hours and minutes. VS does not
appear to support timezone information by default. To avoid adding too
much complexity to benchmark around timezone handling e.g. a full
date library like https://github.com/HowardHinnant/date, we fall back
to outputting GMT time with a -00:00 offset for those cases.
* timers: use reentrant form for localtime_r & tmtime_r
For non-windows, use the reentrant form for the time conversion
functions.
* timers: cleanup
Use strtol instead of brittle moving characters around.
* timers: only call strftime twice.
Also size buffers to known maximum necessary size and name constants
more appropriately.
* timers: fix unused variable warning
* Add missing <cerrno> header
This commit fixes a current build error on Android where 'errno' is an unidentified
symbol due to a missing header
* Update string_util.cc
Conditionally adds <cerrno> if BENCHMARK_STL_ANDROID_GNUSTL is defined
In a previous commit[1], diagnostic pragmas were used to avoid this
warning. However, the incorrect warning flag was indicated, leaving the
warning in place. -Wdeprecated is for deprecated features while
-Wdeprecated-declarations for deprecated functions, variables, and
types[2].
[1] c408461983
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Fixes the following issues with the implementation of `cycleclock::Now`:
- The RISC-V implementation wouldn't compile due to a typo;
- Both the PPC and RISC-V implementation's asm statements lacked the
volatile keyword. This resulted in the repeated read of the counter's
high part being optimized away, so overflow wasn't handled at all.
Multiple counter reads could also be misoptimized, especially in LTO
scenarios.
- Relied on the zero/sign-extension of inline asm operands, which isn't
guaranteed to occur and differs between compilers, namely GCC and Clang.
The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.
The RISC-V implementation was improved to do the overflow handing in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.
* Fix type conversion warnings.
Fixes#949
Tested locally (Linux/clang), but warnings are on MSVC so may differ.
* Drop the ULP so the double test passes
* Add State::error_occurred()
* Relax CHECK condition in benchmark_runner.cc
If the benchmark state contains an error, do not expect any iterations has been run.
This allows using SkipWithError() and return early from the benchmark function.
* README.md: document new possible usage of SkipWithError()
This fixes the Visual Studio 2019 warning:
`C4244: '=': conversion from 'int' to 'char', possible loss of data`
When implicitly casting the return value of tolower() (int) to char.
Fixes: #932
* add Jordan Williams to both CONTRIBUTORS and AUTHORS
* alias benchmark libraries
Provide aliased CMake targets for the benchmark and benchmark_main targets.
The alias targets are namespaced under benchmark::, which is the namespace when they are exported.
I chose not to use either the PROJECT_NAME or the namespace variable but to hard-code the namespace.
This is because the benchmark and benchmark_main targets are hard-coded by name themselves.
Hard-coding the namespace is also much cleaner and easier to read.
* link to aliased benchmark targets
It is safer to link against namespaced targets because of how CMake interprets the double colon.
Typo's will be caught by CMake at configuration-time instead of during compile / link time.
* document the provided alias targets
* add "Usage with CMake" section in documentation
This section covers linking against the alias/import CMake targets and including them using either find_package or add_subdirectory.
* format the "Usage with CMake" README section
Added a newline after the "Usage with CMake" section header.
Dropped the header level of the section by one to make it a direct subsection of the "Usage" section.
Wrapped lines to be no longer than 80 characters in length.
* Add DEBUG_POSTFIX to libraries.
Makes it possible to install Debug and Release versions on the
same system. Without this, there were only linker errors when using
the wrong configuration.
* Update CONTRIBUTORS and AUTHORS according to guide
Initialize option flags from environment variables values if they are defined, eg. `BENCHMARK_OUT=<filename>` for `--benchmark_out=<filename>`. Command line flag value always prevails.
Fixes https://github.com/google/benchmark/issues/881.
These OS's don't always have HOST_NAME_MAX defined, resulting in
build errors.
A few related changes as well:
* Only define HOST_NAME_MAX if it's not already defined. There are
some cases where this is already defined, e.g. with NaCl if
__USE_POSIX is set. To avoid all of these, only define it if it's
not already defined.
* Default HOST_NAME_MAX to 64 and issue a #warning. Having the wrong
max length is pretty harmless. The name just ends up getting
truncated and this is only for printing debug info. Because we're
constructing a std::string from a char[] (so defined length), we
don't need to worry about gethostname's undefined behavior for
whether the truncation is null-terminated when the hostname
doesn't fit in HOST_NAME_MAX. Of course, this doesn't help people
who have -Werror set, since they'll still get a warning.
- Adresses : #856
- The unused `doc` argument was removed from the `DEFINE_` macros in
`commandlineflags.h`
- Converted all the previous `doc` strings passed to the `DEFINE_`
macros to multiline comments.
While current counters can e.g. answer the question
"how many items is processed per second", it is impossible to get
it to tell "how many seconds it takes to process a single item".
The solution is to add a yet another modifier `kInvert`,
that is *always* considered last, which simply inverts the answer.
Fixes#781, #830, #848.
The CSVReporter is deprecated, but we still need to reference it in
a few places. To avoid breaking the build when warnings are errors,
we need to disable the warning when we do so.
The filenames are consistently inconsistent in windows world, might
have something to do with default file system being case-insensitive.
While the native MinGW buils were fixed in 5261307982
that only addressed the headers, but not libraries.
The problem remains when one tries to do a MinGW cross-build from
case-sensitive filesystem.
The RISC-V implementation of `cycleclock::Now` uses the user-space
`rdcycle` instruction to query how many cycles have happened since the
core started.
The only complexity here is on 32-bit RISC-V, where `rdcycle` can only
read the lower 32 bits of the 64-bit hardware counter. In this case,
`rdcycleh` reads the higher 32 bits of the counter. We match the powerpc
implementation to detect and correct for overflow in the high bits.
This is a shameless rip-off of https://github.com/google/benchmark/pull/646
I did promise to look into why that proposed PR was producing
so much worse assembly, and so i finally did.
The reason is - that diff changes `size_t` (unsigned) to `int64_t` (signed).
There is this nice little `assert`:
7a1c370283/include/benchmark/benchmark.h (L744)
It ensures that we didn't magically decide to advance our iterator
when we should have finished benchmarking.
When `cached_` was unsigned, the `assert` was `cached_ UGT 0`.
But we only ever get to that `assert` if `cached_ NE 0`,
and naturally if `cached_` is not `0`, then it is bigger than `0`,
so the `assert` is tautological, and gets folded away.
But now that `cached_` became signed, the assert became `cached_ SGT 0`.
And we still only know that `cached_ NE 0`, so the assert can't be
optimized out, or at least it doesn't currently.
Regardless of whether or not that is a bug in itself,
that particular diff would have regressed the normal 64-bit systems,
by halving the maximal iteration space (since we go from unsigned counter
to signed one, of the same bit-width), which seems like a bug.
And just so it happens, fixing *this* bug, fixes the other bug.
This produces fully (bit-by-bit) identical state_assembly_test.s
The filecheck change is actually needed regardless of this patch,
else this test does not pass for me even without this diff.
* Add support for GNU Install Dirs from GNU Coding Standards
* src/CMakeLists.txt: Added support for setting the standard variables,
such as CMAKE_INSTALL_BINDIR.
* Replace install destinations by the ones from GNU Coding Standards.
* Set the default .cmake and .pc default path.
* escape special chars in csv and json output.
- escape \b,\f,\n,\r,\t,\," from strings before dumping
them to json or csv.
- also faithfully reproduce the sign of nan in json.
this fixes github issue #745.
* functionalize.
* split string escape functions between csv and json
* Update src/csv_reporter.cc
Co-Authored-By: tesch1 <tesch1@gmail.com>
* Update src/json_reporter.cc
Co-Authored-By: tesch1 <tesch1@gmail.com>
* Add FIXME in multiple_ranges_test.cc
* Improve handling of large bounds in AddRange.
Due to breaking the loop too early, AddRange
would miss a final multplier of 'mult' that
was within the numeric range of T.
* Enable negative values for Range argument
Fixes#762.
* Try to fix build of benchmark_gtest
* Try some more to fix build
* Attempt to fix format macros
* Attempt to resolve format errors for mingw32
* Review feedback
Put unit tests in benchmark::internal namespace
Fix error reporting in multiple_ranges_test.cc
* [JSON] add threads and repetitions to the json output, for better ide…
[Tests] explicitly check for thread == 1
[Tests] specifically mark all repetition checks
[JSON] add repetition_index reporting, but only for non-aggregates (i…
* [Formatting] Be very, very explicit about pointer alignment so clang-format can not put pointers/references on the wrong side of arguments.
[Benchmark::Run] Make sure to use explanatory sentinel variable rather than a magic number.
* Do not pass redundant information
Created BenchmarkName class which holds the full benchmark
name and allows specifying and retrieving different components
of the name (e.g. ARGS, THREADS etc.)
Fixes#730.
- On qnx platform, cpu and cache info is stored in a syspage struct which
is different from other OS platform.
- The fix has been verified on an aarch64 target running qnx 7.0.
Fixes#774
Some benchmarks are particularly sensitive and they run in less than
a nanosecond. In order for the console reporter to provide meaningful
output for such benchmarks it needs to be able to display the times
using more resolution than a single nanosecond.
This patch changes the console reporter to print at least three
significant digits for all results.
Unlike the initial attempt, this patch does not align the decimal point.
* Adding Host Name and test
* Addressing Review Comments
* Adding Test for JSON Reporter
* Adding HOST_NAME_MAX for MacOS systems
* Adding Explaination for MacOS HOST_NAME_MAX Addition
* Addressing Peer Review Comments
* Adding codecvt in windows header guard
* Changing name SystemInfo and adding empty message incase host name fetch fails
* Adding Comment on Struct SystemInfo
It is incorrect to say that an aggregate is computed over
run's iterations, because those iterations already got averaged.
Similarly, if there are N repetitions with 1 iterations each,
an aggregate will be computed over N measurements, not 1.
Thus it is best to simply use the count of separate reports.
Fixes#586.
It is better to let the RunBenchmarks(), report() decide
whether to actually *only* output aggregates or not,
depending on whether there are actually aggregates.
It's subtle indeed.
Previously, `BenchmarkRunner()` always said that "if there are no repetitions,
then you should never output only the repetitions". And the `report()` simply assumed
that the `report_aggregates_only` bool it received makes sense, and simply used it.
Now, the logic is the same, but the blame has shifted.
`BenchmarkRunner()` always propagates what those benchmarks would have wanted
to happen wrt the aggregates. And the `report()` lambda has to actually consider
both the `report_aggregates_only` bool, and it's meaningfulness.
To put it in the context of the patch series - if the repetition count was `1`,
but `*_report_aggregates_only` was set to `true`, and we capture each iteration separately,
then we will compute the aggregates, but then output everything, both the iteration,
and aggregates, despite `*_report_aggregates_only` being set to `true`.
As prevously written, "--benchmark_color=auto" was treated as true,
because IsTruthyFlagValue("auto") returned true. The fix is to
rely on IsColorTerminal test only if the flag value is "auto",
and fall back to IsTruthyFlagValue otherwise. I also integrated
force_no_color check into the same block.
Ok, so, i'm still trying to get to the state when it will be a trivial change to report all the separate iterations.
The old code (LHS of the diff) was rather convoluted i'd say.
I have tried to refactor it a bit into *small* logical chunks, with proper comments.
As far as i can tell, i preserved the intent of the code, what it was doing before.
The road forward still isn't clear, but i'm quite sure it's not with the old code :)
The State constructor should not be part of the public API. Adding a
utility method to BenchmarkInstance allows us to avoid leaking the
RunInThread method into the public API.
When building for ARM, there is a fallback codepath that uses
gettimeofday, which requires sys/time.h.
The Windows SDK doesn't have this header, but MinGW does have it.
Thus, this fixes building for Windows on ARM with MinGW
headers/libraries, while Windows on ARM with the Windows SDK still
is broken.
The windows SDK headers don't have self-consistent casing anyway,
and many projects consistently use lowercase for them, in order
to fix crosscompilation with mingw headers.
As discussed with @dominichamon and @dbabokin, sugar is nice.
Well, maybe not for the health, but it's sweet.
Alright, enough puns.
A special care needs to be applied not to break csv reporter. UGH.
We end up shedding some code over this.
We no longer specially pretty-print them, they are printed just like the rest of custom counters.
Fixes#627.
This is related to @BaaMeow's work in https://github.com/google/benchmark/pull/616 but is not based on it.
Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored.
I.e. not some name of the benchmark function, but the actual
name, but without the 'aggregate name' suffix.
This way one can group/filter all the runs,
and filter by the particular aggregate type.
I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
I have absolutely no way to test this, but this looks obviously-good.
This was reported by Tim Northover @TNorthover in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180903/584223.html
> I think this breaks some 32-bit configurations (well, mine at least).
> I was using Clang (from Xcode 10 beta) on macOS and got a bunch of
> errors referencing sysinfo.cc:292 and onwards:
> /Users/tim/llvm/llvm-project/llvm/utils/benchmark/src/sysinfo.cc:292:47:
> error: non-constant-expression cannot be narrowed from type
> 'std::__1::array<unsigned long long, 4>::value_type' (aka 'unsigned
> long long') to 'size_t' (aka 'unsigned long') in initializer list
> [-Wc++11-narrowing]
> } Cases[] = {{"hw.l1dcachesize", "Data", 1, CacheCounts[1]},
> ^~~~~~~~~~~~~~
>
> The same happens when self-hosting ToT. Unfortunately I couldn't
> reproduce the issue on Debian (Clang 6.0.1) even with libc++; I'm not
> sure what the difference is.
* Counter(): add 'one thousand' param.
Needed for https://github.com/google/benchmark/pull/654
Custom user counters are quite custom. It is not guaranteed
that the user *always* expects for these to have 1k == 1000.
If the counter represents bytes/memory/etc, 1k should be 1024.
Some bikeshedding points:
1. Is this sufficient, or do we really want to go full on
into custom types with names?
I think just the '1000' is sufficient for now.
2. Should there be a helper benchmark::Counter::Counter{1000,1024}()
static 'constructor' functions, since these two, by far,
will be the most used?
3. In the future, we should be somehow encoding this info into JSON.
* Counter(): use std::pair<> to represent 'one thousand'
* Counter(): just use a new enum with two values 1000 vs 1024.
Simpler is better. If someone comes up with a real reason
to need something more advanced, it can be added later on.
* Counter: just store the 1000 or 1024 in the One_K values directly
* Counter: s/One_K/OneK/
There are two destinations:
* display (console, terminal) and
* file.
And each of the destinations can be poplulated with one of the reporters:
* console - human-friendly table-like display
* json
* csv (deprecated)
So using the name console_reporter is confusing.
Is it talking about the console reporter in the sense of
table-like reporter, or in the sense of display destination?
This is *only* exposed in the JSON. Not in CSV, which is deprecated.
This *only* supposed to track these two states.
An additional field could later track which aggregate this is,
specifically (statistic name, rms, bigo, ...)
The motivation is that we already have ReportAggregatesOnly,
but it affects the entire reports, both the display,
and the reporters (json files), which isn't ideal.
It would be very useful to have a 'display aggregates only' option,
both in the library's console reporter, and the python tooling,
This will be especially needed for the 'store separate iterations'.
This only specifically represents handling of reporting of aggregates.
Not of anything else. Making it more specific makes the name less generic.
This is an issue because i want to add "iteration report mode",
so the naming would be conflicting.
* Remove redundant default which causes failures
* Fix old GCC warnings caused by poor analysis
* Use __builtin_unreachable
* Use BENCHMARK_UNREACHABLE()
* Pull __has_builtin to benchmark.h too
* Also move compiler identification macro to main header
* Move custom compiler identification macro back
High system load can skew benchmark results. By including system load averages
in the library's output, we help users identify a potential issue in the
quality of their measurements, and thus assist them in producing better (more
reproducible) results.
I got the idea for this from Brendan Gregg's checklist for benchmark accuracy
(http://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html).
* Set -Wno-deprecated-declarations for Intel
Intel compiler silently ignores -Wno-deprecated-declarations
so warning no. 1786 must be explicitly suppressed.
* Make std::int64_t → double casts explicit
While std::int64_t → double is a perfectly conformant
implicit conversion, Intel compiler warns about it.
Make them explicit via static_cast<double>.
* Make std::int64_t → int casts explicit
Intel compiler warns about emplacing an std::int64_t
into an int container. Just make the conversion explicit
via static_cast<int>.
* Cleanup Intel -Wno-deprecated-declarations workaround logic
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) in my code due to the lack of those i have found fixed in my code:
* `kIsIterationInvariant` - `* state.iterations()`
The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
The is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
* Fix compilation on Android with GNU STL
GNU STL in Android NDK lacks string conversion functions from C++11, including std::stoul, std::stoi, and std::stod.
This patch reimplements these functions in benchmark:: namespace using C-style equivalents from C++03.
* Avoid use of log2 which doesn't exist in Android GNU STL
GNU STL in Android NDK lacks log2 function from C99/C++11.
This patch replaces their use in the code with double log(double) function.
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions
* format all tests as well, and mark blocks which change too much
* Add benchmark_main library with support for Bazel.
* fix newline at end of file
* Add CMake support for benchmark_main.
* Mention optionally using benchmark_main in README.
* Allow support for negative regex filtering
This patch allows one to apply a negation to the entire regex filter
by appending it with a '-' character, much in the same style as
GoogleTest uses.
* Address issues in PR
* Add unit tests for negative filtering
Before this change, we would report the number of requested iterations
passed to the state. After, we will report the actual number run. As a
side-effect, instead of multiplying the expected iterations by the
number of threads to get the total number, we can report the actual
number of iterations across all threads, which takes into account the
situation where some threads might run more iterations than others.
* Ensure 64-bit truncation doesn't happen for complexity results
* One more complexity_n 64-bit fix
* Missed another vector of int
* Piping through the int64_t
* Allow AddRange to work with int64_t.
Fixes#516
Also, tweak how we manage per-test build needs, and create a standard
_gtest suffix for googletest to differentiate from non-googletest tests.
I also ran clang-format on the files that I changed (but not the
benchmark include or main src as they have too many clang-format
issues).
* Add benchmark_gtest to cmake
* Set(Items|Bytes)Processed now take int64_t
Having the copts set on a per-target level can lead to ODR violations
in some cases. Avoid this by ensuring the regex engine is picked
through compiler intrinsics in the header directly.
This patch disables the -Winvalid-offsetof warning for GCC and Clang
when using it to check the cache lines of the State object.
Technically this usage of offsetof is undefined behavior until C++17.
However, all major compilers support this application as an extension,
as demonstrated by the passing static assert (If a compiler encounters UB
during evaluation of a constant expression, that UB must be diagnosed).
Unfortunately, Clang and GCC also produce a warning about it.
This patch temporarily suppresses the warning using #pragma's in the
source file (instead of globally suppressing the warning in the build systems).
This way the warning is ignored for both CMake and Bazel builds without
having to modify either build system.
* Rename StringXxx to StrXxx in string_util.h and its users
This makes the naming consistent within string_util and moves is the
Abseil convention.
* Style guide is 2 spaces before end of line "//" comments
* Rename StrPrintF/StringPrintF to StrFormat for absl compatibility.
On Windows the Shlwapi.h file has a macro:
#define StrCat lstrcatA
And benchmark/src/string_util.h defines StrCat and it is renamed to
lstrcatA if we don't undef the macro in Shlwapi.h. This is an innocuous
bug if string_util.h is included after Shlwapi.h, but it is a compile
error if string_util.h is included before Shlwapi.h.
This fixes issue #545.
* Add Solaris support
Define BENCHMARK_OS_SOLARIS for Solaris.
Platform specific implementations added:
* number of CPUs detection
* CPU cycles per second detection
* Thread CPU usage
* Process CPU usage
* Remove the special case for per process CPU time for Solaris, it's the same as the default.
* Print the executable name as part of the context.
A common use case of the library is to run two different
versions of a benchmark to compare them. In my experience
this often means compiling a benchmark twice, renaming
one of the executables, and then running the executables
back-to-back. In this case the name of the executable
is important contextually information. Unfortunately the
benchmark does not report this information.
This patch adds the executable name to the context reported
by the benchmark.
* attempt to fix tests on Windows
* attempt to fix tests on Windows
* Don't include <sys/resource.h> on Fuchsia.
It doesn't support POSIX resource measurement and timing APIs.
Change-Id: Ifab4bac4296575f042c699db1ce5a4f7c2d82893
* Add BENCHMARK_OS_FUCHSIA for Fuchsia
Change-Id: Ic536f9625e413270285fbfd08471dcb6753ddad1
* Improve State packing: put important members on first cache line.
This patch does a few different things to ensure commonly accessed
data is on the first cache line of the `State` object.
First, it moves the `error_occurred_` member to reside after
the `started_` and `finished_` bools, since there was internal
padding there that was unused.
Second, it moves `batch_leftover_` and `max_iterations` further up
in the struct declaration. These variables are used in the calculation
of `iterations()` which users might call within the loop. Therefore
it's more important they exist on the first cache line.
Finally, this patch turns the bool members into bitfields. Although
this shouldn't have much of an effect currently, because padding is
still needed between the last bool and the first size_t, it should
help in future changes that require more "bool like" members.
* Remove bitfield change for now
* Move bools (and their padding) to end of "first cache line" vars.
I think it makes the most sense to move the padding required
following the group of bools to the end of the variables we want
on the first cache line.
This also means that the `total_iterations_` variable, which is the
most accessed, has the same address as the State object.
* Fix static assertion after moving bools
* Support State::KeepRunningBatch().
State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.
This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.
* Create better fast path in State::KeepRunningBatch()
* Replace int parameter with size_t to fix signed mismatch warnings
* Ensure benchmark State has been started even on error.
* Simplify KeepRunningBatch()
This patch primarily changes the BENCHMARK_UNREACHABLE()
implementation under MSVC to use __assume(false) instead
of being a NORETURN function, which ironically caused
unreachable code warnings.
Second, since the NOTHROW function attempt generated the
warnings we meant to avoid, it has been replaced with a dummy
null statement.
* Improve CPU Cache info reporting -- Add Windows support.
This patch does a couple of thing regarding CPU Cache reporting.
First, it adds an implementation on Windows. Second it fixes
the JSONReporter to correctly (and actually) output the CPU
configuration information.
And finally, third, it detects and reports the number of
physical CPU's that share the same cache.
* Refactor System information collection.
This patch refactors the system information collection,
and in particular information about the target CPU. The
motivation is to make it easier to access CPU information,
and easier to add new information as need be.
This patch additionally adds information about the cache
sizes of the CPU.
* Address review comments: Clean up integer types.
This commit cleans up the integer types used in ValueUnion to
follow the Google style guide.
Additionally it adds a BENCHMARK_UNREACHABLE macro to assist
in documenting/catching unreachable code paths.
* Rename ValueUnion accessors.
Define BENCHMARK_OS_NETBSD for NetBSD.
Add detection of cpuinfo_cycles_per_second and cpuinfo_num_cpus.
This code shared detection of these properties with FreeBSD.
When stopping a timer, the current time is subtracted
from the start time. However, when the times are identical,
or sufficiently close together, the subtraction can result
in a negative number.
For some reason MinGW is the only platform where this problem
manifests. I suspect it's due to MinGW specific behavior in either
the CPU timing code, floating point model, or printf formatting.
Either way, the fix for MinGW should be correct across all platforms.
This patch improves the performance of the KeepRunning loop in two ways:
(A) it removes the dependency on the max_iterations variable, preventing
it from being loaded every iteration.
(B) it loops to zero, instead of to an upper bound. This allows a single
decrement instruction to be used instead of a arithmetic op followed by a
comparison.
* Drop Stat1, refactor statistics to be user-providable, add median.
My main goal was to add median statistic. Since Stat1
calculated the stats incrementally, and did not store
the values themselves, it is was not possible. Thus,
i have replaced Stat1 with simple std::vector<double>,
containing all the values.
Then, i have refactored current mean/stdev to be a
function that is provided with values vector, and
returns the statistic. While there, it seemed to make
sense to deduplicate the code by storing all the
statistics functions in a map, and then simply iterate
over it. And the interface to add new statistics is
intentionally exposed, so they may be added easily.
The notable change is that Iterations are no longer
displayed as 0 for stdev. Is could be changed, but
i'm not sure how to nicely fit that into the API.
Similarly, this dance about sometimes (for some fields,
for some statistics) dividing by run.iterations, and
then multiplying the calculated stastic back is also
dropped, and if you do the math, i fail to see why
it was needed there in the first place.
Since that was the only use of stat.h, it is removed.
* complexity.h: attempt to fix MSVC build
* Update README.md
* Store statistics to compute in a vector, ensures ordering.
* Add a bit more tests for repetitions.
* Partially address review notes.
* Fix gcc build: drop extra ';'
clang, why didn't you warn me?
* Address review comments.
* double() -> 0.0
* early return
When generating a human-readable number for user counters, we don't
generally expect 1k to be 1024. This is the default due to the more
general purpose string utility.
Fixes#437
* Json reporter: passthrough fp, don't cast it to int; adjust tooling
Json output format is generally meant for further processing
using some automated tools. Thus, it makes sense not to
intentionally limit the precision of the values contained
in the report.
As it can be seen, FormatKV() for doubles, used %.2f format,
which was meant to preserve at least some of the precision.
However, before that function is ever called, the doubles
were already cast to the integer via RoundDouble()...
This is also the case for console reporter, where it makes
sense because the screen space is limited, and this reporter,
however the CSV reporter does output some( decimal digits.
Thus i can only conclude that the loss of the precision
was not really considered, so i have decided to adjust the
code of the json reporter to output the full fp precision.
There can be several reasons why that is the right thing
to do, the bigger the time_unit used, the greater the
precision loss, so i'd say any sort of further processing
(like e.g. tools/compare_bench.py does) is best done
on the values with most precision.
Also, that cast skewed the data away from zero, which
i think may or may not result in false- positives/negatives
in the output of tools/compare_bench.py
* Json reporter: FormatKV(double): address review note
* tools/gbench/report.py: skip benchmarks with different time units
While it may be useful to teach it to operate on the
measurements with different time units, which is now
possible since floats are stored, and not the integers,
but for now at least doing such a sanity-checking
is better than providing misinformation.
Change ThreadCPUUsage to call ProcessCPUUsage if __rtems__ is defined.
RTEMS real time OS doesn't support CLOCK_THREAD_CPUTIME_ID. See
https://github.com/RTEMS/rtems/blob/master/cpukit/posix/src/clockgettime.c#L58-L59
Prior to this change, ThreadCPUUsage would fail when running on RTEMS with:
ERROR: clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...) failed
* Make Benchmark a single header library (but not header-only)
This patch refactors benchmark into a single header, to allow
for slightly easier usage.
The initial reason for the header split was to keep C++ library
components from being included by benchmark_api.h, making that
part of the library STL agnostic. However this has since changed
and there seems to be little reason to separate the reporters from
the rest of the library.
* Fix internal_macros.h
* Remove more references to macros.h
* Add ClearRegisteredBenchmark() function.
Since benchmarks can be registered at runtime using the RegisterBenchmark(...)
functions, it makes sense to have a ClearRegisteredBenchmarks() function too,
that can be used at runtime to clear the currently registered benchmark and
re-register an entirely new set.
This allows users to run a set of registered benchmarks, get the output using
a custom reporter, and then clear and re-register new benchmarks based on the
previous results.
This fixes issue #400, at least partially.
* Remove unused change
Using target_include_directories CMake will implicitly add the the
necessary include paths to targets which link against the benchmark
library. This is useful when the benchmark repo is included as a
subdirectory in another CMake build.
This thing with the pragma ignore was getting out of hand: now
MinGW (and probably GCC) was erroring too. So I chose to move
the definition of IsZero() out of the anonymous namespace into
benchmark.cc.
The problem was that the call to Finish() the user counters was
lost in a big merge. If I had already written the tests for the
user counters, this would probably have been catched earlier.
* fix android compilation
* checking __GLIBCXX__ and __GLIBCPP__ macro in addition to __ANDROID__
* using vsnprintf instead of std::vsnprintf to compile on Android
* removed __GLIBCPP__ check on Android
* StringPrintF instead of std::to_string for Android
When using CPU time to determine the correct number of iterations the
library additionally checks if the benchmark has consumed 5x the minimum
required time according to the wall clock. This prevents benchmarks
with low CPU usage from running for much longer than actually intended.
However when a benchmark uses a manual timer this heuristic isn't helpful
and likely isn't correct since we don't know what the manual timer actually
measures.
This patch removes the above restriction when a benchmark specifies a manual
timer.
* Add Benchmark::Iterations for explicitly specifying the number of iterations to use.
* Document that benchmark::Iterations should not be used to limit benchmark runtimes