mirror of
https://github.com/boostorg/mysql.git
synced 2025-05-12 14:11:41 +00:00
223 lines
11 KiB
Plaintext
223 lines
11 KiB
Plaintext
[/
|
|
Copyright (c) 2019-2025 Ruben Perez Hidalgo (rubenperez038 at gmail dot com)
|
|
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
]
|
|
|
|
[section:text_queries Text queries and client-side SQL formatting]
|
|
[nochunk]
|
|
|
|
['Text queries] are those that use MySQL text protocol for execution.
|
|
Plain strings and [reflink with_params] use this protocol.
|
|
This contrasts with [link mysql.prepared_statements prepared statements],
|
|
which are first prepared and then executed separately, and use a binary protocol.
|
|
|
|
[warning
|
|
[*Never compose SQL queries using raw string concatenation].
|
|
This is insecure and can lead to [*SQL injection vulnerabilities].
|
|
Use the client-side SQL formatting facilities explained in this section
|
|
to avoid vulnerabilities.
|
|
]
|
|
|
|
Using text queries you can run
|
|
[link mysql.multi_resultset.multi_queries multiple semicolon-separated queries],
|
|
which can improve efficiency.
|
|
|
|
|
|
[section Using with_params for simple queries]
|
|
|
|
[reflink with_params] is the easiest way to use client-side SQL formatting.
|
|
It can be used as a simpler and more flexible alternative to prepared statements.
|
|
While prepared statements expand queries server-side, SQL formatting does it client-side.
|
|
Please read the [link mysql.text_queries.comparison comparison with prepared statements]
|
|
and the [link mysql.text_queries.comparison.security security considerations] sections for more info.
|
|
|
|
[reflink with_params] takes a SQL query
|
|
string with placeholders and a set of parameters. When passed to
|
|
[refmemunq any_connection execute] or [refmemunq any_connection async_execute],
|
|
the query is expanded in the client with the supplied parameters and
|
|
sent to the server for execution:
|
|
|
|
[text_queries_with_params_simple]
|
|
|
|
Curly braces (`{}`) represent placeholders (technically called ['replacement fields]).
|
|
The notation and semantics are similar to [@https://en.cppreference.com/w/cpp/utility/format/format `std::format`].
|
|
|
|
All fundamental types can be used as query parameters. This includes integers, floating point types,
|
|
strings, blobs, dates and times:
|
|
|
|
[text_queries_with_params_scalars]
|
|
|
|
`std::optional<T>` and `boost::optional<T>` can also be used:
|
|
|
|
[text_queries_with_params_optionals]
|
|
|
|
Collections and ranges are supported, as long as its elements can be formatted:
|
|
|
|
[text_queries_with_params_ranges]
|
|
|
|
See [link mysql.sql_formatting_advanced.ranges this section] for more on formatting ranges,
|
|
and [link mysql.sql_formatting_advanced.reference this table] for a reference of types
|
|
that have built-in support for SQL formatting.
|
|
|
|
[note
|
|
Like with `std::format`, the query string passed to `with_params` must be known at
|
|
compile-time. You can skip this check using the [reflink runtime] function.
|
|
]
|
|
|
|
Like `std::format`, you can use arguments with explicit indices:
|
|
|
|
[text_queries_with_params_manual_indices]
|
|
|
|
See [link mysql.sql_formatting_advanced.format_string_syntax this section]
|
|
for a reference on the format string syntax.
|
|
|
|
[endsect]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[section:errors Common errors and how to fix them]
|
|
|
|
Not all values can be formatted. If the library finds that formatting a certain
|
|
value can cause an ambiguity that could lead to a security problem, an error
|
|
will be issued and the query won't be sent to the server. Here are the most common errors:
|
|
|
|
* `client_errc::invalid_encoding`
|
|
* Cause: one of your string parameters contains invalid code points.
|
|
With the default character set, this means that it contains [*invalid UTF-8].
|
|
* Solution: all string values must be encoded according to the connection's character
|
|
set (usually UTF-8). Sanitize or reject such values. Use the [reflink blob] and [reflink blob_view]
|
|
types for values that don't represent character strings, but arbitrary binary values.
|
|
* `client_errc::unformattable_value`
|
|
* Cause: one of your parameters contains an invalid value. For instance, a `double`
|
|
contains a `NaN` or an `Inf`, unsupported by MySQL.
|
|
* Solution: reject such values, or replace them by `NULL` before passing them to client-side SQL formatting.
|
|
* `client_errc::unknown_character_set`
|
|
* Cause: your connection doesn't know the character set you're using. Knowing the character set in use
|
|
is required to generate queries securely. This situation can happen after calling [refmemunq any_connection reset_connection]
|
|
or if you used a custom [refmem connect_params connection_collation] when connecting.
|
|
* Solution: use a [reflink connection_pool] instead of manually resetting connections. If you can't,
|
|
use the default [refmemunq connect_params connection_collation] when connecting, and use
|
|
[refmemunq any_connection set_character_set] or [refmemunq any_connection async_set_character_set]
|
|
after resetting connections.
|
|
* [link mysql.charsets.tracking Learn more] about how character set tracking works.
|
|
|
|
For example:
|
|
|
|
[text_queries_with_params_invalid_encoding]
|
|
|
|
[endsect]
|
|
|
|
|
|
|
|
|
|
[section:comparison Prepared statements vs. client-side SQL formatting]
|
|
|
|
Although both serve a similar purpose, they are fundamentally different. Prepared statements
|
|
are parsed and expanded by the server. Client-side SQL expands the query in the client
|
|
and sends it to the server as a string.
|
|
|
|
This means that [*client-side SQL does not understand your queries]. It just knows about how
|
|
to format MySQL types into a string without creating vulnerabilities, but otherwise treats
|
|
your queries as opaque strings. Client-side SQL yields [*greater flexibility] (you can dynamically
|
|
compose any query), while statements have more limitations. This also means that
|
|
[*you need to pay more attention to compose valid queries], specially when dealing with complex conditionals.
|
|
Logic errors may lead to exploits. Please read the
|
|
[link mysql.text_queries.comparison.security security considerations section]
|
|
for more info.
|
|
|
|
Client-side SQL entails [*less round-trips to the server] than statements, and is usually more efficient
|
|
for lightweight queries. However, it uses the less compact text protocol, which may be slower for
|
|
queries retrieving a lot of data. See the
|
|
[link mysql.text_queries.comparison.efficiency efficiency considerations section] for more info.
|
|
|
|
In general, [*use client-side SQL] formatting for the following cases:
|
|
|
|
* Simple queries that don't retrieve a lot of data. Default to `with_params` and
|
|
only switch to statements if your performance measurements says so.
|
|
* Queries involving dynamic SQL that can't be achieved by statements. Typical cases include:
|
|
* Dynamic filters ([link mysql.examples.dynamic_filters example]).
|
|
* Batch inserts. Inserting rows one by one can lead to poor efficiency.
|
|
You can use client-side SQL formatting to compose a single `INSERT` that
|
|
inserts several rows at once (see [link mysql.examples.batch_inserts example 1]
|
|
and [link mysql.examples.batch_inserts_generic example 2]).
|
|
* PATCH-like updates, where the field list in an `UPDATE` must be dynamic
|
|
([link mysql.examples.patch_updates example]).
|
|
* Queries involving dynamic identifiers, like table and field names.
|
|
* Conditional sorting.
|
|
* Pipelines consisting of several semicolon-separated queries with dynamic fields.
|
|
|
|
On the other hand, [*prefer prepared statements] if:
|
|
|
|
* You are executing the same query over and over. You can prepare the statement
|
|
once and execute it several times.
|
|
* Your query is retrieving a lot of data, and you have performed the relevant performance measurements.
|
|
|
|
|
|
|
|
[heading:efficiency Efficiency considerations]
|
|
|
|
Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:
|
|
|
|
* Client-formatted SQL entails [*less round-trips to the server]. For prepared statements, you usually need
|
|
a call to prepare the statement, another one to execute it, and possibly a final one to close it.
|
|
Client-formatted SQL only requires the execution round-trip. This performance gain increases with network
|
|
latency and if you are using TLS.
|
|
* Prepared statements always entail a [*mutation of session state], while client-formatted SQL may not.
|
|
If you're using a [reflink connection_pool] with prepared statements, you can't use
|
|
[refmem pooled_connection return_without_reset], as this will leak the statement.
|
|
With client-formatted queries, reset may not be required if your SQL doesn't mutate session state.
|
|
* Client-formatted SQL queries use a usually [*less efficient text-based protocol], while prepared statements
|
|
use a more compact binary protocol. This is relevant if you're retrieving lots of data that is
|
|
slow to convert to and from text (like doubles).
|
|
* [*Prepared statements can be re-used]. If you need to execute a query several times,
|
|
prepared statements will only be parsed once.
|
|
* Client-formatted SQL allows [*more efficient patterns] than prepared statements,
|
|
like batch inserts and semicolon-separated queries.
|
|
|
|
|
|
|
|
|
|
[heading:security Security considerations]
|
|
|
|
Both client-side SQL formatting and prepared statements [*protect against SQL injection].
|
|
Statements do so by parsing the query with placeholders server-side, before performing parameter
|
|
substitution. Client-side SQL quotes and escapes your values to avoid injection, but
|
|
[*does not understand your queries].
|
|
|
|
This means that you need to [*ensure that your queries always expand to valid SQL].
|
|
This is trivial for simple queries, but may be an issue with more complex ones,
|
|
involving ranges or dynamic identifiers. For instance, the following query may
|
|
expand to invalid SQL if the provided range is empty:
|
|
|
|
[text_queries_with_params_empty_ranges]
|
|
|
|
The risk is higher if you're building your query by pieces using [reflink format_sql_to].
|
|
|
|
To sum up:
|
|
|
|
* Client-side SQL protects against SQL injection.
|
|
* Client-side SQL does not protect against logic errors. The risk is only present in complex
|
|
queries. We suggest the following advice:
|
|
* Avoid complex query generation logic as much as possible.
|
|
Use a single format string instead of `format_sql_to`, unless you have no other option.
|
|
* When using ranges, consider if the empty range would lead to valid SQL or not.
|
|
* Thoroughly test complex query generation logic.
|
|
* Client-side SQL requires knowing the connection's current character set. This usually happens
|
|
out of the box, and will lead to a [link mysql.text_queries.errors controlled error]
|
|
otherwise. Some recommendations:
|
|
* If in doubt, always use the default character set (`utf8mb4`).
|
|
* Never issue `SET NAMES` or `SET CHARACTER SET` statements directly -
|
|
use [refmem any_connection set_character_set] or [refmemunq any_connection async_set_character_set], instead.
|
|
* If you're using [reflink format_sql] or [reflink format_sql_to], never craft [reflink format_options] values manually.
|
|
Use [refmem any_connection format_opts], instead.
|
|
|
|
|
|
[endsect]
|
|
|
|
[endsect]
|