[#concurrent]
= Concurrent Containers

:idprefix: concurrent_

Boost.Unordered provides `boost::concurrent_node_set`, `boost::concurrent_node_map`, `boost::concurrent_flat_set` and `boost::concurrent_flat_map`: hash tables that allow concurrent write/read access from different threads without the user having to implement any synchronization mechanism.

[source,c++]
----
std::vector<int>                    input;
boost::concurrent_flat_map<int,int> m;

...

// process input in parallel
const int num_threads = 8;
std::vector<std::thread> threads;
std::size_t chunk = input.size() / num_threads; // how many elements per thread

for (int i = 0; i < num_threads; ++i) {
  threads.emplace_back([&,i] {
    // calculate the portion of input this thread takes care of
    std::size_t start = i * chunk;
    std::size_t end = (i == num_threads - 1)? input.size(): (i + 1) * chunk;

    for (std::size_t n = start; n < end; ++n) {
      m.emplace(input[n], calculation(input[n]));
    }
  });
}
----

In the example above, threads access `m` without synchronization, just as we'd do in a single-threaded scenario. In an ideal setting, if a given workload is distributed among _N_ threads, execution is _N_ times faster than with one thread; this limit is never attained in practice due to synchronization overheads and _contention_ (one thread waiting for another to leave a locked portion of the map), but Boost.Unordered concurrent containers are designed to perform with very little overhead and typically achieve _linear scaling_ (that is, performance is proportional to the number of threads up to the number of logical cores in the CPU).

== Visitation-based API

The first thing a new user of Boost.Unordered concurrent containers will notice is that these classes _do not provide iterators_ (which makes them technically not https://en.cppreference.com/w/cpp/named_req/Container[Containers^] in the C++ standard sense). The reason for this is that iterators are inherently thread-unsafe. Consider this hypothetical code:

[source,c++]
----
auto it = m.find(k); // A: get an iterator pointing to the element with key k
if (it != m.end()) {
  some_function(*it); // B: use the value of the element
}
----

In a multithreaded scenario, the iterator `it` may be invalid at point B if some other thread issues an `m.erase(k)` operation between A and B. There are designs that can remedy this by making iterators lock the element they point to, but this approach lends itself to high contention and can easily produce deadlocks in a program. `operator[]` has similar concurrency issues, and is not provided by `boost::concurrent_flat_map`/`boost::concurrent_node_map` either. Instead, element access is done through so-called _visitation functions_:

[source,c++]
----
m.visit(k, [](const auto& x) { // x is the element with key k (if it exists)
  some_function(x);            // use it
});
----

The visitation function passed by the user (in this case, a lambda function) is executed internally by Boost.Unordered in a thread-safe manner, so it can access the element without worrying about other threads interfering in the process.
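Since `visit` returns the number of elements visited (zero or one here, as keys are unique), its return value also doubles as an existence check. A minimal sketch, reusing `m`, `k` and `some_function` from the examples above:

[source,c++]
----
if (m.visit(k, [](const auto& x) { some_function(x); }) == 0) {
  // no element with key k existed at the time of the call
}
----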
On the other hand, a visitation function can _not_ access the container itself:

[source,c++]
----
m.visit(k, [&](const auto& x) {
  some_function(x, m.size()); // forbidden: m can't be accessed inside visitation
});
----

Access to a different container is allowed, though:

[source,c++]
----
m.visit(k, [&](const auto& x) {
  if (some_function(x)) {
    m2.insert(x); // OK, m2 is a different boost::concurrent_flat_map
  }
});
----

But, in general, visitation functions should be as lightweight as possible to reduce contention and increase parallelization. In some cases, moving heavy work outside of visitation may be beneficial:

[source,c++]
----
std::optional<value_type> o; // value_type is the map's element type
bool found = m.visit(k, [&](const auto& x) {
  o = x;
});
if (found) {
  some_heavy_duty_function(*o);
}
----

Visitation is prominent in the API provided by concurrent containers, and many classical operations have visitation-enabled variations:

[source,c++]
----
m.insert_or_visit(x, [](auto& y) {
  // if insertion failed because of an equivalent element y,
  // do something with it, for instance:
  ++y.second; // increment the mapped part of the element
});
----

Note that in this last example the visitation function could actually _modify_ the element: as a general rule, operations on a concurrent map `m` grant visitation functions const/non-const access to the element depending on whether `m` is const/non-const. Const access can always be explicitly requested by using `cvisit` overloads (for instance, `insert_or_cvisit`), and may result in higher parallelization. For concurrent sets, on the other hand, visitation is always const access.

Although expected to be used much less frequently, concurrent containers also provide insertion operations where an element can be visited right after its creation (in addition to the usual visitation when an equivalent element already exists):

[source,c++]
----
m.insert_and_cvisit(x,
  [](const auto& y) {
    std::cout << "(" << y.first << ", " << y.second << ") inserted\n";
  },
  [](const auto& y) {
    std::cout << "(" << y.first << ", " << y.second << ") already exists\n";
  });
----

Consult the references of `xref:reference/concurrent_node_set.adoc#concurrent_node_set[boost::concurrent_node_set]`, `xref:reference/concurrent_node_map.adoc#concurrent_node_map[boost::concurrent_node_map]`, `xref:reference/concurrent_flat_set.adoc#concurrent_flat_set[boost::concurrent_flat_set]` and `xref:reference/concurrent_flat_map.adoc#concurrent_flat_map[boost::concurrent_flat_map]` for the complete list of visitation-enabled operations.

== Whole-Table Visitation

In the absence of iterators, `visit_all` is provided as an alternative way to process all the elements in the container:

[source,c++]
----
m.visit_all([](auto& x) {
  x.second = 0; // reset the mapped part of the element
});
----

With C++17 compilers implementing standard parallel algorithms, whole-table visitation can be parallelized:

[source,c++]
----
m.visit_all(std::execution::par, [](auto& x) { // run in parallel
  x.second = 0; // reset the mapped part of the element
});
----

Traversal can be interrupted midway:

[source,c++]
----
// finds the key to a given (unique) value

int key = 0;
int value = ...;
bool found = !m.visit_while([&](const auto& x) {
  if (x.second == value) {
    key = x.first;
    return false; // finish
  }
  else {
    return true; // keep on visiting
  }
});

if (found) { ... }
----

There is one last whole-table visitation operation, `erase_if`:

[source,c++]
----
m.erase_if([](auto& x) {
  return x.second == 0; // erase the elements whose mapped value is zero
});
----

`visit_while` and `erase_if` can also be parallelized.
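For instance, a sketch of a parallelized version of the `erase_if` call above, following the same pattern as parallel `visit_all` (again requiring C++17 parallel algorithm support):

[source,c++]
----
m.erase_if(std::execution::par, [](auto& x) {
  return x.second == 0; // erase in parallel the elements whose mapped value is zero
});
----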
Note that, in order to increase efficiency, whole-table visitation operations do not block the table during execution: this implies that elements may be inserted, modified or erased by other threads during visitation. It is advisable not to assume too much about the exact global state of a concurrent container at any point in your program.

== Bulk visitation

Suppose you have an `std::array` of keys you want to look up in a concurrent map:

[source,c++]
----
std::array<int, N> keys;
...

for (const auto& key: keys) {
  m.visit(key, [](auto& x) { ++x.second; });
}
----

_Bulk visitation_ allows us to pass all the keys in one operation:

[source,c++]
----
m.visit(keys.begin(), keys.end(), [](auto& x) { ++x.second; });
----

This functionality is not provided for mere syntactic convenience, though: by processing all the keys at once, some internal optimizations can be applied that increase performance over the regular, one-at-a-time case (consult the xref:benchmarks.adoc#benchmarks_boostconcurrent_flatnode_map[benchmarks]). In fact, it may be beneficial to buffer incoming keys so that they can be bulk visited in chunks:

[source,c++]
----
static constexpr auto bulk_visit_size =
  boost::concurrent_flat_map<int,int>::bulk_visit_size;

std::array<int, bulk_visit_size> buffer;
std::size_t i = 0;

while (...) { // processing loop
  ...
  buffer[i++] = k;
  if (i == bulk_visit_size) {
    map.visit(buffer.begin(), buffer.end(), [](auto& x) { ++x.second; });
    i = 0;
  }
  ...
}

// flush remaining keys
map.visit(buffer.begin(), buffer.begin() + i, [](auto& x) { ++x.second; });
----

There's a latency/throughput tradeoff here: it will take longer for incoming keys to be processed (since they are buffered), but the number of keys processed per second is higher. `bulk_visit_size` is the recommended chunk size; smaller buffers may yield worse performance.

== Blocking Operations

Concurrent containers can be copied, assigned, cleared and merged just like any other Boost.Unordered container. Unlike most other operations, though, these are _blocking_, that is, all other threads are prevented from accessing the tables involved while a copy, assignment, clear or merge operation is in progress. Blocking is taken care of automatically by the library and the user need not take any special precaution, but overall performance may be affected.

Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve` or during insertion when the table's load hits `max_load()`. As with non-concurrent containers, reserving space in advance of bulk insertions will generally speed up the process.
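For instance, a minimal sketch based on the first example, under the assumption that roughly one element is inserted per input item:

[source,c++]
----
m.reserve(input.size()); // one up-front rehash instead of several during insertion
// ... launch the inserting threads as before ...
----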
== Interoperability with non-concurrent containers

As open-addressing and concurrent containers are based on the same internal data structure, they can be efficiently move-constructed from their non-concurrent counterparts, and vice versa.

[caption=, title='Table {counter:table-counter}. Concurrent/non-concurrent interoperability']
[cols="1,1", frame=all, grid=all]
|===
^|`boost::concurrent_node_set` ^|`boost::unordered_node_set`
^|`boost::concurrent_node_map` ^|`boost::unordered_node_map`
^|`boost::concurrent_flat_set` ^|`boost::unordered_flat_set`
^|`boost::concurrent_flat_map` ^|`boost::unordered_flat_map`
|===

This interoperability comes in handy in multistage scenarios where some parts of the data processing happen in parallel whereas other steps are non-concurrent (or non-modifying). In the following example, we want to construct a histogram from a huge input vector of words: the population phase can be done in parallel with `boost::concurrent_flat_map`, and the results are then transferred to the final container.

[source,c++]
----
std::vector<std::string> words = ...;

// Insert words in parallel
boost::concurrent_flat_map<std::string, std::size_t> m0;
std::for_each(
  std::execution::par, words.begin(), words.end(),
  [&](const auto& word) {
    m0.try_emplace_or_visit(word, 1, [](auto& x) { ++x.second; });
  });

// Transfer to a regular unordered_flat_map
boost::unordered_flat_map<std::string, std::size_t> m = std::move(m0);
----
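The transfer in the opposite direction works the same way; as a sketch (names reused from the example above), the histogram could be moved back into a concurrent container for a further parallel stage:

[source,c++]
----
// the internal structure is transferred wholesale; no per-element copying
boost::concurrent_flat_map<std::string, std::size_t> m1 = std::move(m);
----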