This is the second iteration of the backends, which were both tested
on a QEMU VM and did not show any test failures. The essential difference
from the first version is that in the AArch64 backend we now initialize
the success flag inside the asm blocks of the compare_exchange operations
rather than relying on the compiler to initialize it before passing it into
the asm block as an in-out parameter. Apparently, the latter sometimes did
not work, which made compare_exchange_strong return an incorrect value and
broke the futex-based mutexes in the lock pool.
The above change was also applied to AArch32, along with minor corrections
to the asm block constraints.
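For illustration, here is a minimal, hypothetical sketch of the pattern (not
the library's actual code): a single LL/SC attempt with compare_exchange_weak
semantics, where the success flag is an output-only operand initialized
inside the asm block instead of being pre-initialized by the compiler and
passed in as an in-out operand.

    #include <cstdint>

    // Single LL/SC attempt for AArch64 with gcc extended asm (sketch).
    // The success flag is an output-only operand ("=&r") and is set to 0
    // inside the asm block before the exclusive load/store sequence.
    bool compare_exchange_sketch(std::uint32_t* storage, std::uint32_t& expected, std::uint32_t desired)
    {
        std::uint32_t original;
        std::uint32_t success;
        __asm__ __volatile__
        (
            "mov %w[success], #0\n\t"              // initialize the flag here, not in C++
            "ldaxr %w[original], %[storage]\n\t"
            "cmp %w[original], %w[expected]\n\t"
            "b.ne 1f\n\t"
            "stlxr %w[success], %w[desired], %[storage]\n\t"
            "eor %w[success], %w[success], #1\n\t" // stlxr writes 0 on success
            "1:\n\t"
            : [success] "=&r" (success), [original] "=&r" (original), [storage] "+Q" (*storage)
            : [expected] "r" (expected), [desired] "r" (desired)
            : "cc", "memory"
        );
        if (!success)
            expected = original;
        return !!success;
    }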
The old operations template is replaced with core_operations, which falls
back to core_arch_operations, which falls back to core_operations_emulated.
The core_operations layer is intended for more or less architecture-neutral
backends, like the one based on gcc __atomic* intrinsics. It may fall back
to core_arch_operations where the required operations are not supported by
the compiler intrinsics or where the architecture-specific backend is more
optimal. For example, where gcc does not implement 128-bit
atomic operations via __atomic* intrinsics, we support them in the
core_arch_operations backend, which uses inline assembler blocks.
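A heavily simplified, hypothetical sketch of the layering (the actual
templates have more parameters and members); each layer derives from its
fallback, and a layer only defines operations where it can do better:

    #include <cstddef>

    template< std::size_t Size, bool Signed >
    struct core_operations_emulated { /* lock pool based implementation */ };

    template< std::size_t Size, bool Signed >
    struct core_arch_operations : core_operations_emulated< Size, Signed >
    {
        // operations implemented with inline asm would be defined here
    };

    template< std::size_t Size, bool Signed >
    struct core_operations : core_arch_operations< Size, Signed >
    {
        // operations implemented with gcc __atomic* intrinsics would be
        // defined here, hiding the inherited arch-specific versions
    };

    // Where gcc cannot provide lock-free 128-bit __atomic* operations, a
    // specialization defines nothing and the asm-based backend shows through.
    template< bool Signed >
    struct core_operations< 16u, Signed > : core_arch_operations< 16u, Signed > {};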
The old emulated_operations template is largely unchanged and was renamed to
core_operations_emulated for naming consistency. All other operation templates
were also renamed for consistency (e.g. generic_wait_operations ->
wait_operations_generic).
Fence operations have been extracted to a separate set of structures:
fence_operations, fence_arch_operations and fence_operations_emulated. These
are similar to the core operations described above. This structuring also
allows falling back from fence_operations to fence_arch_operations when
the latter is more optimal.
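A similarly hypothetical sketch of that fallback, using an AArch64 barrier
as an example; FENCE_ASM_PREFERRED stands in for whatever condition decides
that the asm form is preferable, and the real structures take a memory_order
argument:

    struct fence_operations_emulated
    {
        static void thread_fence() noexcept { /* serialized via the lock pool */ }
    };

    struct fence_arch_operations
    {
        static void thread_fence() noexcept
        {
            __asm__ __volatile__ ("dmb ish" ::: "memory"); // asm-based barrier
        }
    };

    struct fence_operations
    {
        static void thread_fence() noexcept
        {
    #if defined(FENCE_ASM_PREFERRED) // hypothetical condition
            fence_arch_operations::thread_fence();
    #else
            __atomic_thread_fence(__ATOMIC_SEQ_CST); // gcc intrinsic
    #endif
        }
    };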
The net result of these changes is that 128-bit and 64-bit atomic operations
should now be consistently supported on all architectures that support them.
Previously, such support was only implemented for x86, via local hacks for
gcc and clang.
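As a usage-level illustration (assuming Boost.Atomic is available; whether
the operations end up lock-free still depends on the target):

    #include <boost/atomic.hpp>
    #include <cstdint>
    #include <iostream>

    // A 16-byte trivially copyable struct stored in boost::atomic.
    struct counters
    {
        std::uint64_t a, b;
    };

    int main()
    {
        boost::atomic< counters > c(counters{ 0u, 0u });

        counters expected = c.load();
        counters desired = { expected.a + 1u, expected.b + 1u };
        while (!c.compare_exchange_weak(expected, desired))
            desired = counters{ expected.a + 1u, expected.b + 1u };

        std::cout << "128-bit lock-free capability: " << BOOST_ATOMIC_INT128_LOCK_FREE
                  << ", this object: " << c.is_lock_free() << std::endl;
    }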
The backend implements core and extra atomic operations using gcc asm blocks.
The implementation supports the extensions added in ARMv8.1 and ARMv8.3, as
well as both little and big endian targets. The code has not yet been tested
on real hardware, only on a QEMU VM.
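A hypothetical illustration of what the ARMv8.1 (LSE) support enables:
fetch_add can be a single ldaddal instruction instead of an ldaxr/stlxr retry
loop. __ARM_FEATURE_ATOMICS is the ACLE macro indicating LSE availability;
the function itself is made up for the example.

    #include <cstdint>

    std::uint32_t fetch_add_sketch(std::uint32_t* storage, std::uint32_t value)
    {
    #if defined(__ARM_FEATURE_ATOMICS) // LSE available (ARMv8.1)
        std::uint32_t original;
        __asm__ __volatile__
        (
            "ldaddal %w[value], %w[original], %[storage]\n\t"
            : [original] "=&r" (original), [storage] "+Q" (*storage)
            : [value] "r" (value)
            : "memory"
        );
        return original;
    #else
        return __atomic_fetch_add(storage, value, __ATOMIC_SEQ_CST);
    #endif
    }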
On 32-bit x86, 64-bit integers have 4-byte alignment, which resulted in
a non-integral type being selected as the storage type in the case of
lock-based atomic_ref. This broke arithmetic and bitwise operations on
atomic_ref. We now select an integral storage type based on its native
alignment, not the storage alignment we require for atomic operations; this
is fine in the case of the lock-based backend.
Also, extended buffer_storage with support for specifying alignment and removed
aligned_buffer_storage and storage128_t.
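A hypothetical sketch of the extended template's shape (the real one has
additional members):

    #include <cstddef>

    // The required alignment is now a template parameter, which makes the
    // separate aligned_buffer_storage and storage128_t types unnecessary.
    template< std::size_t Size, std::size_t Alignment >
    struct buffer_storage
    {
        alignas(Alignment) unsigned char data[Size];
    };

    // e.g. 16-byte storage with 16-byte alignment, replacing storage128_t
    typedef buffer_storage< 16u, 16u > storage128_like_t;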
The floating point support includes:
- The standard fetch_add/fetch_sub operations.
- Extra operations: (fetch_/opaque_)negate, (opaque_)add/sub.
- Extra capability macros: BOOST_ATOMIC_FLOAT/DOUBLE/LONG_DOUBLE_LOCK_FREE.
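A short usage sketch based on the operations and macros listed above:

    #include <boost/atomic.hpp>
    #include <iostream>

    int main()
    {
        boost::atomic< double > acc(0.0);
        acc.fetch_add(1.5);
        acc.opaque_negate(); // extra operation: negate without returning a value

        std::cout << acc.load() << ", double lock-free capability: "
                  << BOOST_ATOMIC_DOUBLE_LOCK_FREE << std::endl;
    }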
The floating point atomic operations are currently implemented on top of the
integer-based backends and thus are mostly CAS-based. The CAS operations
perform binary (bitwise) comparisons and therefore behave differently from
normal C++ comparisons with regard to special FP values like NaN and signed
zero.
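For example, assuming the copies preserve the NaN bit pattern (which they do
on typical targets), a CAS with a NaN expected value can succeed because the
stored and expected bit patterns are identical:

    #include <boost/atomic.hpp>
    #include <cmath>
    #include <cassert>

    int main()
    {
        const double nan = std::nan("");
        boost::atomic< double > a(nan);

        double expected = nan; // same bit pattern as the stored value
        bool ok = a.compare_exchange_strong(expected, 1.0);
        assert(ok && a.load() == 1.0); // bitwise comparison succeeds
    }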
The support for floating point types is optional and can be disabled by
defining BOOST_ATOMIC_NO_FLOATING_POINT. This can be useful if, on a certain
platform, the parameters of the floating point types cannot be deduced from
the compiler-defined or system macros (in which case the compilation fails).
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0020r6.html