* Fix DoNotOptimize() GCC copy overhead (#1340)
The issue is that GCC DoNotOptimize() does a full copy of an argument
if it's not a pointer and it slows down a benchmark. If an argument is big
enough there is a memcpy() call for copying the argument. An argument
object can be a big object so DoNotOptimize() could add sufficient
overhead and affects benchmark results.
The cause is in GCC behavior with asm volatile constraints. Looks like GCC
trying to use r(register) constraint for all cases despite object size.
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105519
The solution is the split DoNotOptimize() in two cases - value fits
in register and value doesn't fit in register. And use case specific
asm constraint. std::is_trivially_copyable trait is needed because
"+r" constraint doesn't work with non trivial copyable objects.
- Fix requires support C++11 feature std::is_trivially_copyable from GCC
compiler. The feature has been supported since GCC 5
- Fallback for GCC version < 5 still exists but it uses "m" constraint
which means a little bit more overhead in some cases
- Add assembly tests for issued cases
Fixes#1340
* Add supported compiler versions info for assembly tests
- Assembly tests are inherently non-portable. So explicitly add GCC
and Clang versions required for reliable tests passed
- Write a warning message if the current compiler version isn't supported
* Add possibility to ask for libbenchmark version number (#1004)
Add a header which holds the current major, minor, and
patch number of the library. The header is auto generated
by CMake.
* Do not generate unused functions (#1004)
* Add support for version number in bazel (#1004)
* Fix clang format #1004
* Fix more clang format problems (#1004)
* Use git version feature of cmake to determine current lib version
* Rename version_config header to version
* Bake git version into bazel build
* Use same input config header as in cmake for version.h
* Adapt the releasing.md to include versioning in bazel
* add multiple OSes to bazel workflow
* correct indent
* only set copts when they're supported by the OS
* os check should work
* pull out cxx03_test for per-platform stuff
* attempt to fix windows test output
Report all time numbers > 10 digits in scientific notation with
4 decimal places. This is necessary since only 10 digits
are currently reserved for the time columns (Time and CPU).
If exceeding 10 digits the output isnt properly aligned anymore.
* Introduce warmup phase to BenchmarkRunner (#1130)
In order to account for caching effects in user
benchmarks introduce a new command line option
"--benchmark_min_warmup_time"
which allows to specify an amount of time for
which the benchmark should be run before results
are meaningful.
* Adapt review suggestions regarding introduction of warmup phase (#1130)
* Fix BM_CHECK call in MinWarmUpTime (#1130)
* Fix comment on requirements of MinWarmUpTime (#1130)
* Add basic description of warmup phase mechanism to user guide (#1130)
* Add option to get the verbosity provided by commandline flag -v (#1330)
* replace assert with test failure
asserts are stripped out in non debug builds, and we run tests in non-debug CI bots.
* clang-format my own tweak
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
This commit adds a small section on how to install and build Python
bindings wheels to the docs, as well as a link to it from the main readme.
Notes were added that clearly state availability of Python wheels based
on Python version and OS/architecture combinations.
For the guide to build a wheel from source, the best practice of
creating a virtual environment and activating it before build was
detailed. Also, a note on the required installation of Bazel was added,
with a link to the official docs on installation.
* Filter out benchmarks that start with "DISABLED_"
This could be slightly more elegant, in that the registration and the
benchmark definition names have to change. Ideally, we'd still register
without the DISABLED_ prefix and it would all "just work".
Fixes#1365
* add some documentation
Previously, with the unrolled job matrix, all jobs had to be listed individually in the `needs` section of the PyPI upload job. But as the wheel build job was reimplemented as a job matrix now, with a
single build job name `build_wheels`, we need to adjust the name in the PyPI upload job as well here to avoid errors.
This commit adds a `bazel shutdown` command to the setuptools BazelExtension. This has the effect that wheel builds shut down the Bazel server and terminate gracefully after the build, something
that was previously an issue on Windows builds.
Since the windows-specific `--no-clean` flag option to `pip wheel` becomes unnecessary due to this change, this change has the side-effect that GitHub Actions wheel builds via `cibuildwheel` can now
be written as a compact job matrix again, which leads to a lot of deduplicated code in the corresponding workflow file.
Lastly, some GitHub-provided actions (checkout, setup-python, upload/download-artifact) were bumped to the latest v3 version.
If someone or something ever needs the dynamic library as a Bazel build
artifact, we can figure that out for them then, but right now, there is
no strong reason to be wrangling various `export.h`-controlling macros.
Fixes#1372.
This commit fixes the previous breakage in Python wheel builds for Windows by adding a `local_defines` field to the `cc_binary` generated in the process of the Python bindings builds. This define is being
picked up by the auto-generated export header `benchmark_export.h`, unsetting the benchmark export macro.
Furthermore, the `linkshared` and `linkstatic` attributes are passed booleans now instead of ints, making the command more directly interpretable to the human reader.
The fix was suggested by @junyer in the corresponding GitHub issue thread https://github.com/google/benchmark/issues/1367 - thank you for the suggestion!
This commit adds a job running after the wheel building job responsible for uploading the built wheels to PyPI.
The job only runs on successful completion of all build jobs, and uploads to PyPI using a secret added to the Google Benchmark repo (TBD).
Also, the setup-python action has been bumped to the latest version v3.
* Add SetBenchmarkFilter() to set --benchmark_filter flag value in user code.
Use case: Provide an API to set this flag indepedence of the flag's implementation (ie., absl flag vs benchmark's flag facility)
* add test
* added notes on Initialize()
This commit adds the two fields `long_description` and `long_description_content_type` to `setup.py`. These can be used for proper project presentation on the PyPI project page, which is currently a placeholder.
* Add option to set the default time unit globally
This commit introduces the `--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only affects benchmarks where the time unit is not set explicitly.
* Update AUTHORS and CONTRIBUTORS
* Test `SetDefaultTimeUnit`
* clang format
* Use `GetDefaultTimeUnit()` for initializing `TimeUnit` variables
* Review fixes
* Export functions
* Add comment
* Make generate_export_header.bzl work for Windows.
While I'm here, bring the generated code slightly closer to what CMake
would generate nowadays.
Fixes#1351.
* Fix define.
* Fix export_import_condition.
* Fix guard.
* introduce the possibility to customize the help printer function
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* fixed naming convertion, and introduce the option function in the init method
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* remove the macros to inject the helper function
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* remove the default implementation, and introduce the nullprt
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
* Expose default display reporter creation in public API
this is useful when a custom reporter wants to fall back on the default
display reporter, but doesn't necessarily have access to the benchmark
library flag configuration.
* Make use of unique_ptr in the random interleaving test.
* clang-format
* The parameterized tests check both floating point and integral types. We might as well use types that avoid truncation warnings across the platforms
* static_cast version of how to avoid truncation warnings in basic_test
Co-authored-by: Staffan Tjernstrom <staffantj@users.noreply.github.com>
This commit contains a fix for macOS ARM64 wheel buils in Google Benchmark's wheel building CI.
Previously, while `cibuildwheel` itself properly identified the need for cross-compilations and produced valid ARM platform wheels, the included shared library containing the Python bindings
built by `bazel` was built for x86, resulting in immediate errors upon import.
To fix this, logic was added to the setup.py file that adds the "--cpu=darwin_arm64" and "--macos_cpus=arm64" switches to the `bazel build` command if
1) The current system platform is macOS Darwin running on the x86_64 architecture, and
2) The ARCHFLAGS environment variable, set by wheel build systems like conda and cibuildwheel, contains the tag "arm64".
This way, bazel correctly sets the target CPU to ARM64, and produces functional wheels for the macOS ARM line of CPUs.
This patch fixes#1306, by reducing the pinned instances of
PerfCounters.
The issue is caused by creating multiple pinned events in the
same thread, doing so results in the Snapshot(PerfCounterValues* values)
failing, and that's now discoverable.
Creating multile pinned events is an unsupported behavior currently.
The error would be detected at read() time, not
perf_event_open() / iotcl() time.
The unsupported benavior above is confirmed by Stephane Eranian @seranian,
and he also pointed the dectection method.
Finished this patch under the guidance of Mircea Trofin @mtrofin.