Histogram
Fast multi-dimensional histogram with convenient interface for C++14 and Python
Build and coverage status is tracked for the master and develop branches. Tested on Linux (gcc-5.5.0, clang-5.0.0), OSX (Xcode 9.4), and Windows (Visual Studio 15 2017).
This C++14 header-only open-source library provides a state-of-the-art multi-dimensional histogram class for the professional statistician and everyone who needs to count things. Actually, this histogram can do more than counting. It can be equipped with arbitrary accumulators to compute means, medians, and whatever you fancy in each cell. Several parallelization options are provided. Check out the full documentation. Python bindings to this library are available elsewhere.
The histogram is very customisable through a templated modular design, but the default options were carefully chosen so that most users don't need to customise anything. It is easy to use for the casual user, but does not restrict the power user. In the standard configuration, this library offers a unique safety guarantee not found elsewhere: bin counts cannot overflow or be capped. While being safe to use, the library also has a convenient interface, conserves memory, and is faster than other libraries (see benchmarks).
The histogram class can be configured in several variants from fully static to fully dynamic. Static variants provide more performance, at the cost of run-time flexibility and potentially larger executables. Dynamic variants are slower, but fully configurable at run-time and may produce smaller executables. A dynamic variant is used in the Python bindings to this library.
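The trade-off between the two variants can be illustrated with a minimal sketch that does not use the library at all. Both functions below are hypothetical helpers: they compute the linear cell index of a multi-dimensional grid, once with the number of axes fixed at compile time (`std::array`) and once with it chosen at run time (`std::vector`). How the library itself implements its static and dynamic variants is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not part of the library.

// Static variant: the number of axes N is a compile-time constant. The loop
// bound is known to the compiler, but one copy of the function is
// instantiated for every N that is used.
template <std::size_t N>
std::size_t linear_index(const std::array<std::size_t, N>& shape,
                         const std::array<std::size_t, N>& idx) {
  std::size_t out = 0, stride = 1;
  for (std::size_t d = 0; d < N; ++d) {
    assert(idx[d] < shape[d]);
    out += idx[d] * stride;
    stride *= shape[d];
  }
  return out;
}

// Dynamic variant: the number of axes is only known at run time. A single
// function serves every case, at the price of run-time size checks and
// heap-allocated containers.
inline std::size_t linear_index(const std::vector<std::size_t>& shape,
                                const std::vector<std::size_t>& idx) {
  assert(shape.size() == idx.size());
  std::size_t out = 0, stride = 1;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    assert(idx[d] < shape[d]);
    out += idx[d] * stride;
    stride *= shape[d];
  }
  return out;
}
```

With a shape of {6, 4} and cell indices {2, 3}, both overloads return 2 + 3 * 6 = 20.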
This project was developed for inclusion in Boost and passed Boost review in September 2018. The plan is to have a first official Boost release in April 2019 with the upcoming version 1.70. You can of course use it already. The source code is licensed under the Boost Software License.
Check out the full documentation. Highlights are given below.
Features
- Extremely customisable multi-dimensional histogram
- Simple, convenient, STL and Boost-compatible interface
- Static and dynamic implementations under a common interface
- Counters with high dynamic range, cannot overflow or be capped (1)
- Better performance than other libraries (see benchmarks for details)
- Efficient use of memory (1)
- Support for custom axis types: define how input values should map to indices
- Support for under-/overflow bins (can be disabled to reduce memory consumption)
- Support for weighted increments
- Support for custom accumulators in each bin (2)
- Support for variance estimates (3)
- Support for completely stack-based histograms
- Support for adding and scaling histograms
- Support for custom allocators
- Support for type-safe histograms (4)
- Optional serialization based on Boost.Serialization
1. This applies in the standard configuration, as long as you don't use weighted increments. The counter capacity is increased dynamically as the cell counts grow. When even the largest plain integral type would overflow, the storage switches to a Boost.Multiprecision integer, which is only limited by available memory.
2. The histogram can be configured to hold an arbitrary accumulator in each cell instead of a simple counter. Extra values can be passed to the histogram, for example, to compute the mean and variance of values which fall into the same cell.
3. Variance estimates are useful when histograms are to be compared quantitatively and when a statistical model is fitted to the cell counts.
4. Built-in axis types can be configured to accept only dimensional quantities, like those from Boost.Units.
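The growing counters described in the first note above can be sketched without the library. The class below is a hypothetical illustration, not the library's storage: all names are made up. It starts with one byte per cell and widens every cell to 64 bits the first time a counter would overflow. According to the description above, the real storage keeps growing through larger integer types and finally switches to a Boost.Multiprecision integer; this sketch stops at 64 bits to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical illustration only; not the library's storage class.
class GrowingCounters {
 public:
  explicit GrowingCounters(std::size_t n) : size_(n), small_(n, 0) {}

  void increment(std::size_t i) {
    assert(i < size_);
    if (!widened_) {
      if (small_[i] < std::numeric_limits<std::uint8_t>::max()) {
        ++small_[i];  // still fits into one byte
        return;
      }
      // The next increment would overflow: copy all cells into 64-bit
      // counters and release the one-byte buffer.
      wide_.assign(small_.begin(), small_.end());
      small_.clear();
      small_.shrink_to_fit();
      widened_ = true;
    }
    ++wide_[i];
  }

  std::uint64_t at(std::size_t i) const {
    assert(i < size_);
    return widened_ ? wide_[i] : small_[i];
  }

  // Memory currently used per cell, in bytes.
  std::size_t bytes_per_cell() const {
    return widened_ ? sizeof(std::uint64_t) : sizeof(std::uint8_t);
  }

 private:
  std::size_t size_;
  bool widened_ = false;
  std::vector<std::uint8_t> small_;
  std::vector<std::uint64_t> wide_;
};
```

As long as no cell exceeds 255, the sketch spends one byte per cell; incrementing one cell 300 times switches all cells to eight bytes and the count is still exact.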
Dependencies
- Boost >= 1.66 header-only installation
- Optional: CMake >= 3.5, Boost.Serialization
Build instructions
If you don't want to run the tests, there is nothing to build. Just copy the content of the include folder to a place where your project can find it.
The tests can be built with b2 from the Boost project or with cmake. If you are not a Boost developer, use cmake.
git clone https://github.com/HDembinski/histogram.git
cd histogram
mkdir build && cd build
cmake ..
make
To run the tests, do make test or, for more output, ctest -V.
Code example
The following stripped-down example was taken from the Getting started section in the documentation. Have a look into the docs to see the full version with comments and more examples.
Example: Fill a 1d-histogram
#include <boost/histogram.hpp>
#include <boost/format.hpp> // used here for printing
#include <algorithm>        // for std::for_each
#include <functional>       // for std::ref
#include <iostream>         // for std::cout

int main() {
  namespace bh = boost::histogram;

  // 1d-histogram with a regular axis: 6 bins of equal width from -1 to 2
  auto h = bh::make_histogram(
    bh::axis::regular<>(6, -1.0, 2.0, "x")
  );

  // fill histogram
  auto data = { -0.4, 1.1, 0.3, 1.7 };
  std::for_each(data.begin(), data.end(), std::ref(h));

  // iterate over bins
  for (auto x : bh::indexed(h)) {
    std::cout << boost::format("bin %2i [%4.1f, %4.1f): %i\n")
      % x[0] % x.bin(0).lower() % x.bin(0).upper() % *x;
  }
  std::cout << std::flush;

  /* program output as listed in the docs; the counts sum to 9, so it
     appears to come from the full version of the example, which fills
     more values than the four above:

  bin -1 [-inf, -1.0): 1
  bin  0 [-1.0, -0.5): 1
  bin  1 [-0.5, -0.0): 1
  bin  2 [-0.0,  0.5): 2
  bin  3 [ 0.5,  1.0): 0
  bin  4 [ 1.0,  1.5): 1
  bin  5 [ 1.5,  2.0): 1
  bin  6 [ 2.0,  inf): 2
  */
}
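The bin layout in the output listing (semi-open intervals, an underflow bin with index -1, and an overflow bin with index 6) can be reproduced with a few lines of plain C++. The function below is a hypothetical helper written for this explanation, not the library's axis implementation; it only mirrors the mapping that the listing implies for a regular axis with `n` bins on the interval from `lower` to `upper`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper for illustration only; not part of the library.
// Maps x to a bin index for n equal-width, semi-open bins on [lower, upper).
// Returns -1 for the underflow bin and n for the overflow bin.
inline int regular_index(int n, double lower, double upper, double x) {
  assert(n > 0 && lower < upper);
  const double z = (x - lower) / (upper - lower);  // in [0, 1) for inner bins
  if (z < 0.0) return -1;     // below the axis range: underflow
  if (!(z < 1.0)) return n;   // x >= upper (or NaN, in this sketch): overflow
  return static_cast<int>(std::floor(z * n));
}
```

For the axis in the example, `regular_index(6, -1.0, 2.0, x)` maps the four filled values -0.4, 1.1, 0.3, and 1.7 to bins 1, 4, 2, and 5. Because the intervals are semi-open, -1.0 lands in bin 0 while 2.0 lands in the overflow bin 6.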
Benchmarks
Thanks to meta-programming and dynamic memory management, this library is not only safer, more flexible, and more convenient to use, but also faster than the competition. In the plot below, its speed is compared to classes from the GNU Scientific Library, the ROOT framework from CERN, and to the histogram functions in Numpy. The orange to red items are different compile-time configurations of the histogram in this library. More details on the benchmark are given in the documentation.
What users say
John Buonagurio | Manager at Exponent®
"I just wanted to say 'thanks' for your awesome Histogram library. I'm working on a software package for processing meteorology data and I'm using it to generate wind roses with the help of Qt and QwtPolar. Looks like you thought of just about everything here – the circular axis type was practically designed for this application, everything 'just worked'."
Rationale
There is a lack of a widely-used free histogram class in C++. While it is easy to write a one-dimensional histogram, writing a general multi-dimensional histogram is not trivial. In high-energy physics, the ROOT framework from CERN is widely used, but it comes with a large dependency. This histogram class is designed to be more convenient, flexible, and faster than the equivalent ROOT histograms. It is easy to integrate in your project if you already use Boost. The library comes in a C++14 design which follows the STL and Boost styles, and the general advice given by well-respected C++ experts (Meyers, Sutter and Alexandrescu, Stroustrup and others).
Read more about the design choices in the documentation.
State of project
The histogram is nearly feature-complete. More than 500 individual tests make sure that the implementation works as expected. Full documentation is available. User feedback is appreciated!
The library was reviewed in September 2018 by the Boost Community under review manager Mateusz Loskot. It was conditionally accepted with requests to improve the interface and documentation. Current development focuses on implementing these requests. Code-breaking changes of the interface are currently happening on the develop branch. If you want to use the library in production code, please use the latest release. After the library is released as part of Boost, the interface will be kept stable. The first release is planned for April 2019 with Boost 1.70.