They are basically proto-version of custom user counters.
It does not seem that they do anything that custom user counters
don't do. And having two similar entities is not good for generalization.
Migration plan:
* ```
SetItemsProcessed(<val>)
=>
state.counters.insert({
{"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate)},
...
});
```
* ```
SetBytesProcessed(<val>)
=>
state.counters.insert({
{"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate,
benchmark::Counter::OneK::kIs1024)},
...
});
```
* ```
<Name>_processed()
=>
state.counters["<Name>"]
```
One thing the custom user counters miss is better support
for units of measurement.
Refs. https://github.com/google/benchmark/issues/627
* Counter(): add 'one thousand' param.
Needed for https://github.com/google/benchmark/pull/654
Custom user counters are quite custom. It is not guaranteed
that the user *always* expects for these to have 1k == 1000.
If the counter represents bytes/memory/etc, 1k should be 1024.
Some bikeshedding points:
1. Is this sufficient, or do we really want to go full on
into custom types with names?
I think just the '1000' is sufficient for now.
2. Should there be a helper benchmark::Counter::Counter{1000,1024}()
static 'constructor' functions, since these two, by far,
will be the most used?
3. In the future, we should be somehow encoding this info into JSON.
* Counter(): use std::pair<> to represent 'one thousand'
* Counter(): just use a new enum with two values 1000 vs 1024.
Simpler is better. If someone comes up with a real reason
to need something more advanced, it can be added later on.
* Counter: just store the 1000 or 1024 in the One_K values directly
* Counter: s/One_K/OneK/
There are two destinations:
* display (console, terminal) and
* file.
And each of the destinations can be poplulated with one of the reporters:
* console - human-friendly table-like display
* json
* csv (deprecated)
So using the name console_reporter is confusing.
Is it talking about the console reporter in the sense of
table-like reporter, or in the sense of display destination?
This is *only* exposed in the JSON. Not in CSV, which is deprecated.
This *only* supposed to track these two states.
An additional field could later track which aggregate this is,
specifically (statistic name, rms, bigo, ...)
The motivation is that we already have ReportAggregatesOnly,
but it affects the entire reports, both the display,
and the reporters (json files), which isn't ideal.
It would be very useful to have a 'display aggregates only' option,
both in the library's console reporter, and the python tooling,
This will be especially needed for the 'store separate iterations'.
This only specifically represents handling of reporting of aggregates.
Not of anything else. Making it more specific makes the name less generic.
This is an issue because i want to add "iteration report mode",
so the naming would be conflicting.
* Remove redundant default which causes failures
* Fix old GCC warnings caused by poor analysis
* Use __builtin_unreachable
* Use BENCHMARK_UNREACHABLE()
* Pull __has_builtin to benchmark.h too
* Also move compiler identification macro to main header
* Move custom compiler identification macro back
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) in my code due to the lack of those i have found fixed in my code:
* `kIsIterationInvariant` - `* state.iterations()`
The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
The is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
As @dominichamon and I have discussed, the current reporter interface
is poor at best. And something should be done to fix it.
I strongly suspect such a fix will require an entire reimagining
of the API, and therefore breaking backwards compatibility fully.
For that reason we should start deprecating and removing parts
that we don't intend to replace. One of these parts, I argue,
is the CSVReporter. I propose that the new reporter interface
should choose a single output format (JSON) and traffic entirely
in that. If somebody really wanted to replace the functionality
of the CSVReporter they would do so as an external tool which
transforms the JSON.
For these reasons I propose deprecating the CSVReporter.
* Ensure 64-bit truncation doesn't happen for complexity results
* One more complexity_n 64-bit fix
* Missed another vector of int
* Piping through the int64_t
* Allow AddRange to work with int64_t.
Fixes#516
Also, tweak how we manage per-test build needs, and create a standard
_gtest suffix for googletest to differentiate from non-googletest tests.
I also ran clang-format on the files that I changed (but not the
benchmark include or main src as they have too many clang-format
issues).
* Add benchmark_gtest to cmake
* Set(Items|Bytes)Processed now take int64_t
* Add tests to verify assembler output -- Fix DoNotOptimize.
For things like `DoNotOptimize`, `ClobberMemory`, and even `KeepRunning()`,
it is important exactly what assembly they generate. However, we currently
have no way to test this. Instead it must be manually validated every
time a change occurs -- including a change in compiler version.
This patch attempts to introduce a way to test the assembled output automatically.
It's mirrors how LLVM verifies compiler output, and it uses LLVM FileCheck to run
the tests in a similar way.
The tests function by generating the assembly for a test in CMake, and then
using FileCheck to verify the // CHECK lines in the source file are found
in the generated assembly.
Currently, the tests only run on 64-bit x86 systems under GCC and Clang,
and when FileCheck is found on the system.
Additionally, this patch tries to improve the code gen from DoNotOptimize.
This should probably be a separate change, but I needed something to test.
* Disable assembly tests on Bazel for now
* Link FIXME to github issue
* Fix Tests on OS X
* fix strip_asm.py to work on both Linux and OS X like targets
* Rename StringXxx to StrXxx in string_util.h and its users
This makes the naming consistent within string_util and moves is the
Abseil convention.
* Style guide is 2 spaces before end of line "//" comments
* Rename StrPrintF/StringPrintF to StrFormat for absl compatibility.
* Print the executable name as part of the context.
A common use case of the library is to run two different
versions of a benchmark to compare them. In my experience
this often means compiling a benchmark twice, renaming
one of the executables, and then running the executables
back-to-back. In this case the name of the executable
is important contextually information. Unfortunately the
benchmark does not report this information.
This patch adds the executable name to the context reported
by the benchmark.
* attempt to fix tests on Windows
* attempt to fix tests on Windows
Due to ADL lookup performed on the begin and end functions
of `for (auto _ : State)`, std::iterator_traits may get
incidentally instantiated. This patch ensures the library
can tolerate that.
* Improve State packing: put important members on first cache line.
This patch does a few different things to ensure commonly accessed
data is on the first cache line of the `State` object.
First, it moves the `error_occurred_` member to reside after
the `started_` and `finished_` bools, since there was internal
padding there that was unused.
Second, it moves `batch_leftover_` and `max_iterations` further up
in the struct declaration. These variables are used in the calculation
of `iterations()` which users might call within the loop. Therefore
it's more important they exist on the first cache line.
Finally, this patch turns the bool members into bitfields. Although
this shouldn't have much of an effect currently, because padding is
still needed between the last bool and the first size_t, it should
help in future changes that require more "bool like" members.
* Remove bitfield change for now
* Move bools (and their padding) to end of "first cache line" vars.
I think it makes the most sense to move the padding required
following the group of bools to the end of the variables we want
on the first cache line.
This also means that the `total_iterations_` variable, which is the
most accessed, has the same address as the State object.
* Fix static assertion after moving bools
* Support State::KeepRunningBatch().
State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.
This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.
* Create better fast path in State::KeepRunningBatch()
* Replace int parameter with size_t to fix signed mismatch warnings
* Ensure benchmark State has been started even on error.
* Simplify KeepRunningBatch()
* Implement KeepRunning() in terms of KeepRunningBatch().
* Improve codegen by helping the compiler undestand dead code.
* Dummy commit for build bots' benefit.
* Support State::KeepRunningBatch().
State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.
This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.
* Create better fast path in State::KeepRunningBatch()
* Replace int parameter with size_t to fix signed mismatch warnings
* Ensure benchmark State has been started even on error.
* Simplify KeepRunningBatch()
* Improve CPU Cache info reporting -- Add Windows support.
This patch does a couple of thing regarding CPU Cache reporting.
First, it adds an implementation on Windows. Second it fixes
the JSONReporter to correctly (and actually) output the CPU
configuration information.
And finally, third, it detects and reports the number of
physical CPU's that share the same cache.
* Refactor System information collection.
This patch refactors the system information collection,
and in particular information about the target CPU. The
motivation is to make it easier to access CPU information,
and easier to add new information as need be.
This patch additionally adds information about the cache
sizes of the CPU.
* Address review comments: Clean up integer types.
This commit cleans up the integer types used in ValueUnion to
follow the Google style guide.
Additionally it adds a BENCHMARK_UNREACHABLE macro to assist
in documenting/catching unreachable code paths.
* Rename ValueUnion accessors.
* Fix BM_SetInsert example
Move declaration of `std::set<int> data` outside the timing loop, so that the
destructor is not timed.
* Speed up BM_SetInsert test
Since the time taken to ConstructRandomSet() is so large compared to the time
to insert one element, but only the latter is used to determine number of
iterations, this benchmark now takes an extremely long time to run in
benchmark_test.
Speed it up two ways:
- Increase the Ranges() parameters
- Cache ConstructRandomSet() result (it's not random anyway), and do only
O(N) copy every iteration
* Fix same issue in BM_MapLookup test
* Make BM_SetInsert test consistent with README
- Use the same Ranges everywhere, but increase the 2nd range
- Change order of Args() calls in README to more closely match the result of Ranges
- Don't cache ConstructRandomSet, since it doesn't make sense in README
- Get a smaller optimization inside it, by givint a hint to insert()
Recently the library added a new ranged-for variant of the KeepRunning
loop that is much faster. For this reason it should be preferred in all
new code.
Because a library, its documentation, and its tests should all embody
the best practices of using the library, this patch changes all but a
few usages of KeepRunning() into for (auto _ : state).
The remaining usages in the tests and documentation persist only
to document and test behavior that is different between the two formulations.
Also note that because the range-for loop requires C++11, the KeepRunning
variant has not been deprecated at this time.
This patch improves the performance of the KeepRunning loop in two ways:
(A) it removes the dependency on the max_iterations variable, preventing
it from being loaded every iteration.
(B) it loops to zero, instead of to an upper bound. This allows a single
decrement instruction to be used instead of a arithmetic op followed by a
comparison.
* Add C++11 Ranged For loop alternative to KeepRunning
As pointed out by @astrelni and @dominichamon, the KeepRunning
loop requires a bunch of memory loads and stores every iterations,
which affects the measurements.
The main reason for these additional loads and stores is that the
State object is passed in by reference, making its contents externally
visible memory, and the compiler doesn't know it hasn't been changed
by non-visible code.
It's also possible the large size of the State struct is hindering
optimizations.
This patch allows the `State` object to be iterated over using
a range-based for loop. Example:
void BM_Foo(benchmark::State& state) {
for (auto _ : state) {
[...]
}
}
This formulation is much more efficient, because the variable counting
the loop index is stored in the iterator produced by `State::begin()`,
which itself is stored in function-local memory and therefore not accessible
by code outside of the function. Therefore the compiler knows the iterator
hasn't been changed every iteration.
This initial patch and idea was from Alex Strelnikov.
* Fix null pointer initialization in C++03
* Always use inline asm DoNotOptimize with clang.
clang-cl masquerades as MSVC but not GCC, so it was using the
MSVC-compatible definitions of DoNotOptimize and ClobberMemory.
Presumably, it's better in general to use the targeted assembly for
this functionality (the codegen is different), but the specific issue
is that clang-cl deprecates the usage of _ReadWriteBarrier, and this
gets rid of that warning.
* triggering another AppVeyor run
* Fix#444 - Use BENCHMARK_HAS_CXX11 over __cplusplus.
MSVC incorrectly defines __cplusplus to report C++03, despite the compiler
actually providing C++11 or greater. Therefore we have to detect C++11 differently
for MSVC. This patch uses `_MSVC_LANG` which has been defined since
Visual Studio 2015 Update 3; which should be sufficient for detecting C++11.
Secondly this patch changes over most usages of __cplusplus >= 201103L to
check BENCHMARK_HAS_CXX11 instead.
* remove redunant comment
* Drop Stat1, refactor statistics to be user-providable, add median.
My main goal was to add median statistic. Since Stat1
calculated the stats incrementally, and did not store
the values themselves, it is was not possible. Thus,
i have replaced Stat1 with simple std::vector<double>,
containing all the values.
Then, i have refactored current mean/stdev to be a
function that is provided with values vector, and
returns the statistic. While there, it seemed to make
sense to deduplicate the code by storing all the
statistics functions in a map, and then simply iterate
over it. And the interface to add new statistics is
intentionally exposed, so they may be added easily.
The notable change is that Iterations are no longer
displayed as 0 for stdev. Is could be changed, but
i'm not sure how to nicely fit that into the API.
Similarly, this dance about sometimes (for some fields,
for some statistics) dividing by run.iterations, and
then multiplying the calculated stastic back is also
dropped, and if you do the math, i fail to see why
it was needed there in the first place.
Since that was the only use of stat.h, it is removed.
* complexity.h: attempt to fix MSVC build
* Update README.md
* Store statistics to compute in a vector, ensures ordering.
* Add a bit more tests for repetitions.
* Partially address review notes.
* Fix gcc build: drop extra ';'
clang, why didn't you warn me?
* Address review comments.
* double() -> 0.0
* early return
* Make Benchmark a single header library (but not header-only)
This patch refactors benchmark into a single header, to allow
for slightly easier usage.
The initial reason for the header split was to keep C++ library
components from being included by benchmark_api.h, making that
part of the library STL agnostic. However this has since changed
and there seems to be little reason to separate the reporters from
the rest of the library.
* Fix internal_macros.h
* Remove more references to macros.h
* Add ClearRegisteredBenchmark() function.
Since benchmarks can be registered at runtime using the RegisterBenchmark(...)
functions, it makes sense to have a ClearRegisteredBenchmarks() function too,
that can be used at runtime to clear the currently registered benchmark and
re-register an entirely new set.
This allows users to run a set of registered benchmarks, get the output using
a custom reporter, and then clear and re-register new benchmarks based on the
previous results.
This fixes issue #400, at least partially.
* Remove unused change
* Fix#342: DoNotOptimize causes compile errors on older GCC versions.
DoNotOptimize uses inline assembly contraints to tell
the compiler what the type of the input variable. The 'g'
operand allows the input to be any register, memory, or
immediate integer operand. However this constraint seems
to be too weak on older GCC versions, and certain inputs
will cause compile errors.
This patch changes the constraint to 'X', which is documented
as "any operand whatsoever is allowed". This appears to fix
the issues with older GCC versions.
However Clang doesn't seem to like "X", and will attempt
to put the input into a register even when it can't/shouldn't;
causing a compile error. However using "g" seems to work like
"X" with GCC, so for this reason Clang still uses "g".
* Try alternative formulation to placate GCC
* Add Benchmark::Iterations for explicitly specifying the number of iterations to use.
* Document that benchmark::Iterations should not be used to limit benchmark runtimes
I recently learned Windows provides a function called _ReadWriteBarrier
which is literally ClobberMemory under a different name. This patch
uses it to implement ClobberMemory under MSVC.
Previously benchmark_api.h wasn't allowed to include standard library
headers. For this reason SetLabel had a hack to accept std::string
without including <string>. The hack worked by attempting to detect
the injected class name `basic_string`. However Clang has changed
it's behavior regarding injected class names so this hack no longer
works.
This patch removes the hack and replaces it with a function that
actually names std::string. However we still cannot pass std::string
across the dylib boundary because of libstdc++'s dual C++11 ABI.
* Added user counters, and move use of bytes_processed and items_processed to user counter logic.
Each counter is a string-value pair. The counters were
made available through the State class. Two helper virtual
methods were added to the Fixture class to allow convenient
initialization and termination of the counters: InitState()
and TerminateState(). The reporting of the counters is buggy
and is still a work in progress, to be completed in the next commits.
* fix bad removal of BenchmarkCounters code during the merge
* add myself to AUTHORS/CONTRIBUTORS
* fix printing to std::cout in csv_reporter
* bytes_per_second and items_per_second are now in the UserCounters class
* add user counters to json reporter
* moving bytes_per_second and items_per_second to their old state
* console reporter dealing ok with user counters.
* update unit tests for user counters
* CSVReporter now prints user counters too.
* cleanup user counters
* reverted changes to cmake files which should have gone into later commits
* fixture_test: fix gcc 4.6 compilation
* remove ctor with default argument
see https://github.com/google/benchmark/pull/262#discussion_r72298055
* use (auto-defined) BENCHMARK_HAS_CXX11 instead of BENCHMARK_INITLIST.
https://github.com/google/benchmark/pull/262#discussion_r72298310
* leanify counters API
Discussions:
API complexity: https://github.com/google/benchmark/pull/262#discussion_r72298731
remove std::string dependency (WIP): https://github.com/google/benchmark/pull/262#discussion_r72298142
spacing & alignment: https://github.com/google/benchmark/pull/262#discussion_r72298422
* remove std::string dependency on public API - changed counter name storage to char*
* Counter ctor: use overloads instead of default arguments
discussion:
https://github.com/google/benchmark/pull/262#discussion_r72298055
* Use raw pointers to remove dependency on std::vector from public API .
For more info, see discussion at https://github.com/google/benchmark/pull/262#discussion_r72319678 .
* Move counter implementation from benchmark.cc to counter.cc.
See discussion: https://github.com/google/benchmark/pull/262#discussion_r72298980 .
* Remove unused (commented-out) code.
* Moved thread counters to ThreadStats.
* Counters: fixed copy and move constructors.
* Counter: use an inplace buffer for small names.
* benchmark_test: move counters test out of CXX11 preprocessor conditional.
* Counter: fix VS2013 compilation error in char[] initialization.
* Fix typo.
* Expose counters from State.
See discussion: https://github.com/google/benchmark/pull/262#issuecomment-237156951
* Changed counters interface to map-like.
* Fix printing of user counters in ConsoleReporter.
* Applied clang-format to counter.cc and console_reporter.cc.
Command was `clang-format -style=Google -i counter.cc console_reporter.cc`
I also applied to all other files, but the changes were very
far-reaching so I rolled those back.
* Rename Counter::Flags_e to Counter::Flags
* Fix use of reserved names in Counter and BenchmarkCounters.
* Counter: Fix move ctor bug + change order of members.
* Fixture: remove tentative methods InitState() and TerminateState().
* Update fixture_test to the new Fixture interface.
* BenchmarkCounters: fixed a bug in the move ctor. Remove call to CHECK_LT().
CHECK_LT() was making the size_t lookup take ~double the time of a string lookup!
* BenchmarkCounters: add option to not print zero counters (defaults to false).
* Add test to compare counter storage and access with std::map.
* README: clarify cost of counter access modes.
* move counter access test to an own test.
* BenchmarkCounters: add move Insert()
* Counters access test: add accelerated lookup by name.
* Fix old range syntax.
* Fix missing include of cstdio
* Fix Visual Studio warning
* VS2013 and lower: fix use of snprintf()
* VS2013: fix use of char[] as a member of std::pair<>.
* change counter storage to std::map
* Remove skipZeroCounters logic
* Fix VS compilation error.
* Implemented request changes to PR #262.
* PR #262: More requested changes.
* README: cleanup counter text.
* PR #262: remove clang-format changes for preexisting code
* Complexity+Counters: fix counter flags which were being ignored.
* Document all Counter::Flag members
* fixed loss of counter values
* ConsoleReporter: remove tabular printing of user counters.
* ConsoleReporter: header printing should not be contingent on user counter names.
* Minor white space and alignment fixes.
* cxx03_test + counters: reuse the BM_empty() function.
* user counters: add note to README on how counters are gathered across threads
* Implement cycleclock::Now for PNaCl
* Make cycleclock::Now compatible with NaCl/ARM
* Support Emscripten (Asm.js, WebAssembly)
* Rearrange #ifs from to handle specific cases first
* DoNotOptimize without inline asm for Emscripten & PNaCl