mysql/doc/qbk/06_text_queries.qbk
2025-02-13 21:40:01 +01:00

223 lines
11 KiB
Plaintext

[/
Copyright (c) 2019-2025 Ruben Perez Hidalgo (rubenperez038 at gmail dot com)
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]
[section:text_queries Text queries and client-side SQL formatting]
[nochunk]
['Text queries] are those that use MySQL text protocol for execution.
Plain strings and [reflink with_params] use this protocol.
This contrasts with [link mysql.prepared_statements prepared statements],
which are first prepared and then executed separately, and use a binary protocol.
[warning
[*Never compose SQL queries using raw string concatenation].
This is insecure and can lead to [*SQL injection vulnerabilities].
Use the client-side SQL formatting facilities explained in this section
to avoid vulnerabilities.
]
Using text queries you can run
[link mysql.multi_resultset.multi_queries multiple semicolon-separated queries],
which can improve efficiency.
[section Using with_params for simple queries]
[reflink with_params] is the easiest way to use client-side SQL formatting.
It can be used as a simpler and more flexible alternative to prepared statements.
While prepared statements expand queries server-side, SQL formatting does it client-side.
Please read the [link mysql.text_queries.comparison comparison with prepared statements]
and the [link mysql.text_queries.comparison.security security considerations] sections for more info.
[reflink with_params] takes a SQL query
string with placeholders and a set of parameters. When passed to
[refmemunq any_connection execute] or [refmemunq any_connection async_execute],
the query is expanded in the client with the supplied parameters and
sent to the server for execution:
[text_queries_with_params_simple]
Curly braces (`{}`) represent placeholders (technically called ['replacement fields]).
The notation and semantics are similar to [@https://en.cppreference.com/w/cpp/utility/format/format `std::format`].
All fundamental types can be used as query parameters. This includes integers, floating point types,
strings, blobs, dates and times:
[text_queries_with_params_scalars]
`std::optional<T>` and `boost::optional<T>` can also be used:
[text_queries_with_params_optionals]
Collections and ranges are supported, as long as its elements can be formatted:
[text_queries_with_params_ranges]
See [link mysql.sql_formatting_advanced.ranges this section] for more on formatting ranges,
and [link mysql.sql_formatting_advanced.reference this table] for a reference of types
that have built-in support for SQL formatting.
[note
Like with `std::format`, the query string passed to `with_params` must be known at
compile-time. You can skip this check using the [reflink runtime] function.
]
Like `std::format`, you can use arguments with explicit indices:
[text_queries_with_params_manual_indices]
See [link mysql.sql_formatting_advanced.format_string_syntax this section]
for a reference on the format string syntax.
[endsect]
[section:errors Common errors and how to fix them]
Not all values can be formatted. If the library finds that formatting a certain
value can cause an ambiguity that could lead to a security problem, an error
will be issued and the query won't be sent to the server. Here are the most common errors:
* `client_errc::invalid_encoding`
* Cause: one of your string parameters contains invalid code points.
With the default character set, this means that it contains [*invalid UTF-8].
* Solution: all string values must be encoded according to the connection's character
set (usually UTF-8). Sanitize or reject such values. Use the [reflink blob] and [reflink blob_view]
types for values that don't represent character strings, but arbitrary binary values.
* `client_errc::unformattable_value`
* Cause: one of your parameters contains an invalid value. For instance, a `double`
contains a `NaN` or an `Inf`, unsupported by MySQL.
* Solution: reject such values, or replace them by `NULL` before passing them to client-side SQL formatting.
* `client_errc::unknown_character_set`
* Cause: your connection doesn't know the character set you're using. Knowing the character set in use
is required to generate queries securely. This situation can happen after calling [refmemunq any_connection reset_connection]
or if you used a custom [refmem connect_params connection_collation] when connecting.
* Solution: use a [reflink connection_pool] instead of manually resetting connections. If you can't,
use the default [refmemunq connect_params connection_collation] when connecting, and use
[refmemunq any_connection set_character_set] or [refmemunq any_connection async_set_character_set]
after resetting connections.
* [link mysql.charsets.tracking Learn more] about how character set tracking works.
For example:
[text_queries_with_params_invalid_encoding]
[endsect]
[section:comparison Prepared statements vs. client-side SQL formatting]
Although both serve a similar purpose, they are fundamentally different. Prepared statements
are parsed and expanded by the server. Client-side SQL expands the query in the client
and sends it to the server as a string.
This means that [*client-side SQL does not understand your queries]. It just knows about how
to format MySQL types into a string without creating vulnerabilities, but otherwise treats
your queries as opaque strings. Client-side SQL yields [*greater flexibility] (you can dynamically
compose any query), while statements have more limitations. This also means that
[*you need to pay more attention to compose valid queries], specially when dealing with complex conditionals.
Logic errors may lead to exploits. Please read the
[link mysql.text_queries.comparison.security security considerations section]
for more info.
Client-side SQL entails [*less round-trips to the server] than statements, and is usually more efficient
for lightweight queries. However, it uses the less compact text protocol, which may be slower for
queries retrieving a lot of data. See the
[link mysql.text_queries.comparison.efficiency efficiency considerations section] for more info.
In general, [*use client-side SQL] formatting for the following cases:
* Simple queries that don't retrieve a lot of data. Default to `with_params` and
only switch to statements if your performance measurements says so.
* Queries involving dynamic SQL that can't be achieved by statements. Typical cases include:
* Dynamic filters ([link mysql.examples.dynamic_filters example]).
* Batch inserts. Inserting rows one by one can lead to poor efficiency.
You can use client-side SQL formatting to compose a single `INSERT` that
inserts several rows at once (see [link mysql.examples.batch_inserts example 1]
and [link mysql.examples.batch_inserts_generic example 2]).
* PATCH-like updates, where the field list in an `UPDATE` must be dynamic
([link mysql.examples.patch_updates example]).
* Queries involving dynamic identifiers, like table and field names.
* Conditional sorting.
* Pipelines consisting of several semicolon-separated queries with dynamic fields.
On the other hand, [*prefer prepared statements] if:
* You are executing the same query over and over. You can prepare the statement
once and execute it several times.
* Your query is retrieving a lot of data, and you have performed the relevant performance measurements.
[heading:efficiency Efficiency considerations]
Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:
* Client-formatted SQL entails [*less round-trips to the server]. For prepared statements, you usually need
a call to prepare the statement, another one to execute it, and possibly a final one to close it.
Client-formatted SQL only requires the execution round-trip. This performance gain increases with network
latency and if you are using TLS.
* Prepared statements always entail a [*mutation of session state], while client-formatted SQL may not.
If you're using a [reflink connection_pool] with prepared statements, you can't use
[refmem pooled_connection return_without_reset], as this will leak the statement.
With client-formatted queries, reset may not be required if your SQL doesn't mutate session state.
* Client-formatted SQL queries use a usually [*less efficient text-based protocol], while prepared statements
use a more compact binary protocol. This is relevant if you're retrieving lots of data that is
slow to convert to and from text (like doubles).
* [*Prepared statements can be re-used]. If you need to execute a query several times,
prepared statements will only be parsed once.
* Client-formatted SQL allows [*more efficient patterns] than prepared statements,
like batch inserts and semicolon-separated queries.
[heading:security Security considerations]
Both client-side SQL formatting and prepared statements [*protect against SQL injection].
Statements do so by parsing the query with placeholders server-side, before performing parameter
substitution. Client-side SQL quotes and escapes your values to avoid injection, but
[*does not understand your queries].
This means that you need to [*ensure that your queries always expand to valid SQL].
This is trivial for simple queries, but may be an issue with more complex ones,
involving ranges or dynamic identifiers. For instance, the following query may
expand to invalid SQL if the provided range is empty:
[text_queries_with_params_empty_ranges]
The risk is higher if you're building your query by pieces using [reflink format_sql_to].
To sum up:
* Client-side SQL protects against SQL injection.
* Client-side SQL does not protect against logic errors. The risk is only present in complex
queries. We suggest the following advice:
* Avoid complex query generation logic as much as possible.
Use a single format string instead of `format_sql_to`, unless you have no other option.
* When using ranges, consider if the empty range would lead to valid SQL or not.
* Thoroughly test complex query generation logic.
* Client-side SQL requires knowing the connection's current character set. This usually happens
out of the box, and will lead to a [link mysql.text_queries.errors controlled error]
otherwise. Some recommendations:
* If in doubt, always use the default character set (`utf8mb4`).
* Never issue `SET NAMES` or `SET CHARACTER SET` statements directly -
use [refmem any_connection set_character_set] or [refmemunq any_connection async_set_character_set], instead.
* If you're using [reflink format_sql] or [reflink format_sql_to], never craft [reflink format_options] values manually.
Use [refmem any_connection format_opts], instead.
[endsect]
[endsect]