mysql/doc/qbk/20_1_benchmarks.qbk
Anarthal (Rubén Pérez) 303b9f0b59
Added benchmarks against the official drivers
Added one_small_row, one_big_row, many_rows, stmt_params benchmarks against libmysqlclient and libmariadb
Added a CI build to compile and run benchmarks
Added a Python script to run the benchmarks
Refactored the connection_pool benchmark to use data independent from examples

close #458
2025-04-02 11:32:43 +02:00


[/
Copyright (c) 2019-2025 Ruben Perez Hidalgo (rubenperez038 at gmail dot com)
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]
[section:benchmarks Benchmarks against the official connectors]
[nochunk]
MySQL and MariaDB ship with official C connectors:
[@https://dev.mysql.com/downloads/c-api/ libmysqlclient] and
[@https://mariadb.com/kb/en/mariadb-connectorc-api-functions/ libmariadb].
Both implement the client/server protocol, as Boost.MySQL does.
The question then arises: is Boost.MySQL as fast as the official drivers?
[note
TL;DR: Boost.MySQL is as fast as the official C APIs, and may be faster under some circumstances.
]
[heading Design decisions]
These benchmarks focus on [*the speed of the protocol implementation], in an attempt to
answer the question above. This should take into account, at least,
(de)serialization and buffering. It shouldn't take into account features
unique to Boost.MySQL, like the static interface or connection pooling.
Both libmysqlclient and libmariadb offer a connection type, similar to [reflink any_connection],
with both sync and async primitives. Sync functions are similar to the ones in Boost.MySQL (although C-flavored).
Async functions are much lower-level, and often require either integration into a framework
(like Asio or libuv) or writing `poll`/`epoll` code by hand. None of these options is trivial.
Additionally, sync functions have less overhead, so they're best suited to answer our question.
For these reasons, [*we only use sync functions] in the benchmarks.
The benchmarks [*use prepared statements only]. The official drivers handle text
queries (issued by `mysql_real_query`) and prepared statements differently.
Rows generated by text queries are returned as strings, and need to be parsed by
the user. Boost.MySQL handles this parsing automatically for you.
Comparing text queries would thus measure different amounts of work,
so it doesn't make much sense. All three libraries handle prepared statements
similarly, and prepared statements are better suited for big rows and datasets.
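
To illustrate why the two result formats aren't directly comparable, the following sketch
decodes the same integer from a text field (ASCII digits, as a text query would deliver it)
and from a binary field (4 bytes, little-endian, which is how the binary protocol encodes an `INT`).
The names `parse_text_int` and `read_binary_int` are hypothetical: they are not part of
Boost.MySQL or the official connectors, and the code only uses the C++ standard library.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Text result: the field arrives as ASCII digits and must be parsed.
// Returns an empty optional if the field is not a valid 32-bit integer.
// (Hypothetical helper, for illustration only.)
inline std::optional<std::int32_t> parse_text_int(std::string_view field)
{
    std::int32_t value = 0;
    const char* first = field.data();
    const char* last = first + field.size();
    auto res = std::from_chars(first, last, value);
    if (res.ec != std::errc() || res.ptr != last)
        return std::nullopt;  // not a number, out of range, or trailing characters
    return value;
}

// Binary result: the field is a fixed-width, little-endian integer.
// Decoding only assembles the four bytes; no digit parsing is involved.
// (Hypothetical helper, for illustration only.)
inline std::int32_t read_binary_int(const unsigned char* data, std::size_t size)
{
    assert(size >= 4);  // the caller must supply at least 4 bytes
    std::uint32_t raw = static_cast<std::uint32_t>(data[0]) |
                        (static_cast<std::uint32_t>(data[1]) << 8) |
                        (static_cast<std::uint32_t>(data[2]) << 16) |
                        (static_cast<std::uint32_t>(data[3]) << 24);
    return static_cast<std::int32_t>(raw);
}
```

With text queries, the official drivers leave the first kind of work to the user, while
Boost.MySQL performs it internally. Timing both would not compare like with like.
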
[*All tests use a real database]. Neither Boost.MySQL nor the official C clients
expose (de)serialization functions. Buffering and minimizing the number of system
calls are also critical for efficiency, and can only be measured with real communication.
The downside is that database processing introduces delays, and might end up
being the bottleneck.
The benchmarks try to [*minimize communication overhead by using UNIX sockets].
[heading Benchmark procedure]
Benchmark source code can be found in the [@https://github.com/boostorg/mysql/tree/master/bench bench/]
folder of the repo. The following benchmarks are performed:
* One small row. Executes a statement yielding a single row with 15 fields,
including most of the possible types. Each row weighs around 500 bytes.
Execution is repeated 10000 times. The Boost.MySQL version uses [refmem any_connection execute].
* One big row. Like the above, but rows have 17 fields, and each row weighs between 72 and 108 KB.
The Boost.MySQL version uses [refmem any_connection start_execution], which allows zero-copying.
* Many rows. Executes a statement that yields 5000 of the "big rows" described above.
The statement is executed only once. The Boost.MySQL version uses [refmem any_connection start_execution]
because the resultset size is big.
* Statement with parameters. Executes a statement with 17 parameters, roughly matching the "big row"
structure described above. Intended to measure serialization speed.
The statement is executed 1000 times.
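
The repeated-execution scheme used by these benchmarks can be sketched as a small timing loop.
This is a minimal illustration under stated assumptions, not the code in `bench/`:
`bench_result`, `run_benchmark` and `mean_microseconds` are hypothetical names,
and the callable stands in for a single statement execution.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Total wall-clock time spent running a fixed number of iterations.
// (Hypothetical type, for illustration only.)
struct bench_result
{
    std::size_t iterations;
    std::chrono::nanoseconds total;
};

// Calls fn() `iterations` times and measures the elapsed time with a
// monotonic clock, so system clock adjustments don't affect the result.
template <class Fn>
bench_result run_benchmark(std::size_t iterations, Fn&& fn)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const auto end = std::chrono::steady_clock::now();
    return {iterations, std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)};
}

// Average time per iteration, in microseconds.
inline double mean_microseconds(const bench_result& r)
{
    const std::chrono::duration<double, std::micro> total_us(r.total);
    return total_us.count() / static_cast<double>(r.iterations);
}
```

For example, `run_benchmark(10000, execute_once)` would mirror the "one small row" case,
where `execute_once` is an assumed callable that executes the prepared statement and reads its row.
Timing the whole loop rather than each call keeps the clock overhead out of the measurement.
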
Benchmark conditions:
* Database: MySQL 8.4.1, running in a Docker container on localhost.
* OS: Ubuntu 24.04
* CPU: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz, 4 cores (8 threads).
* Compiler: g++-14, using CMake's Release config, C++23.
* MySQL C API: libmysqlclient24 (as included in the official MySQL 8.4.4 release).
* MariaDB C API: libmariadb3 1:10.11.11-0ubuntu0.24.04.2 (official Ubuntu package).
* Boost.MySQL: Boost 1.87.0. The header-only version is used
(without defining `BOOST_MYSQL_SEPARATE_COMPILATION`), since it's slightly faster.
[heading Results]
[$mysql/images/bench-protocol.png [align center]]
The three libraries exhibit a similar level of performance, which is expected
from a correctly implemented binary protocol. Boost.MySQL outperforms libmysqlclient
in the single-row benchmarks, and is on par with libmariadb. Differences in the
other benchmarks don't appear to be statistically significant.
Running these benchmarks revealed some areas where performance could be improved.
See [@https://github.com/boostorg/mysql/issues/458 this issue]
for details.
Remember that the protocol implementation is just one piece of the puzzle.
Correctly using features like [reflink connection_pool], [reflink with_params],
multi-function operations and multi-queries can make a huge performance difference
in your application. Never assume anything and always measure!
Acknowledgments: thanks to [@https://github.com/LowLevelMahn LowLevelMahn] for proposing the benchmarks.
[endsect]