mysql/doc/qbk/11_multi_function.qbk
2025-02-11 20:42:41 +01:00

228 lines
11 KiB
Plaintext

[/
Copyright (c) 2019-2025 Ruben Perez Hidalgo (rubenperez038 at gmail dot com)
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]
[section:multi_function Multi-function operations]
[nochunk]
Multi-function operations allow running operations as a set of separate
steps, which gives you better control over execution. They work by splitting
some of the reads and writes into several function calls.
You can use multi-function operations to execute text queries and prepared statements,
and through the dynamic or the static interface.
[heading Protocol primer]
To make a good use of multi-function operations, you should have a basic understanding
of the underlying protocol.
The protocol uses ['messages] to communicate. These are delimited by headers containing the message length.
All operations are initiated by the client, by sending a single ['request message], to which
the server responds with a set of ['response messages].
The diagram below shows the message exchange between client and server for text queries and statement
executions. Each arrow represents a message.
[$mysql/images/protocol.svg [align center]]
There are two separate cases:
* If your query retrieved at least one column (even if no rows were generated), we're in ['case 1]. The server sends:
* An initial packet informing that the query executed correctly, and that we're in ['case 1].
* Some metadata packets describing the columns that the query retrieved. These become available
to the user and are necessary to parse the rows.
* The actual rows.
* An OK packet, which marks the end of the resultset and contains information like `last_insert_id` and
`affected_rows`.
* If your query didn't retrieve any column, we're in ['case 2]. The server will just send an OK packet,
with the same information as in ['case 1].
[refmem connection execute] handles the full message exchange. In contrast,
[refmem connection start_execution] will not read the rows, if any.
Some takeaways:
* The distinction between single-function and multi-function operations exists only
in the client. The wire messages exchanged by both are the same.
* There is no way to tell how many rows a resultset has upfront. You need to read row by row until
you find the OK packet marking the end of the resultset.
* When the server processes the request message, [*it sends all the response messages immediately].
These responses will be waiting in the client to be read. If you don't read [*all] of them,
subsequent operations will mistakenly read them as their response, causing packet mismatches.
Be careful and don't let this happen!
[heading Using multi-function operations through the dynamic interface]
[reflink execution_state] is the main class for the dynamic interface in multi-function
operations. An execution state holds information required to progress the execution operation,
like metadata (required to parse the rows) and protocol state. Contrary to `results`, it doesn't contain
the rows.
Given the following table definition:
[multi_function_setup]
You can start a multi-function operation using [refmem connection start_execution]:
[multi_function_dynamic_start]
We now [*must] read all the generated rows
by calling [refmem connection read_some_rows], which will return a batch of an unspecified size:
[multi_function_dynamic_read]
Some remarks:
* If there are rows to be read, `read_some_rows` will return at least one, but may return more.
* [refmem execution_state complete] returns `true` after we've read the final OK packet for this operation.
* The final `row_batch` may or may not be empty, depending on the number of rows and their size.
* Calling `read_some_rows` after reading the final OK packet returns an empty batch.
`read_some_rows` returns a [reflink rows_view] object pointing into the connection's internal buffer.
This view is valid until the connection performs any other operation involving a network transfer.
Note that there is no need to distinguish between ['case 1] and ['case 2] in the diagram above in our code,
as reading rows for a complete operation is well defined.
[heading Using multi-function operations through the static interface]
The mechanics are similar to what's been exposed above. The static interface uses
[reflink static_execution_state] to carry state. As with [reflink static_results], we must define and
pass a type describing our rows:
[describe_post]
We can now start our operation using the same [refmem connection start_execution]:
[multi_function_static_start]
We now [*must] read all the generated rows by calling [refmem connection read_some_rows]:
[multi_function_static_read]
Some remarks:
* [reflink static_execution_state] doesn't store rows anyhow. It uses the row types passed
as template parameters to validate the metadata returned by the server, and ensure it is compatible
with the C++ data structures that will be used with `read_some_rows`.
* We must pass `read_some_rows` a `boost::span` of the appropriate row type.
We've used `std::array` to place rows on the stack, but you can use any other contiguous range.
* `read_some_rows` returns the number of read rows. At maximum, this will be the size
of the span, but there may be less, depending on row and network buffer sizes.
* If there are rows to be read, `read_some_rows` will return at least one, but may return more.
* [refmem execution_state complete] returns `true` after we've read the final OK packet for this operation.
* The final read may or may not return rows, depending on the number of rows and their size.
* Calling `read_some_rows` after reading the final OK packet always reads zero rows.
[heading Accessing metadata and OK packet data]
You can access metadata at any point, using [refmem execution_state meta] or [refmem static_execution_state meta].
This function returns a collection of [reflink metadata] objects. For more information, please refer to [link mysql.meta this section].
You can access OK packet data using functions like [refmemunq execution_state last_insert_id]
and [refmemunq execution_state affected_rows] in both `execution_state` and `static_execution_state`.
As this information is contained in the OK packet,
[*it can only be accessed once the [refmemunq execution_state complete] function returns `true`].
[heading Using multi-function operations with stored procedures and multi-queries]
When using operations that return more than one resultset (e.g. when calling stored procedures),
the protocol is slightly more complex:
[$mysql/images/protocol_multi_resultset.svg [align center]]
The message exchange is as follows:
* A single execution request is sent to the server.
* The server sends a first resultset, as in the single resultset case.
The OK packet contains a flag indicating that another resultset follows.
* Another resultset is sent, with the same structure as the previous one. The process is
repeated until an OK packet indicates that no more resultsets follow.
For example, given the following stored procedure:
[/ This is an actual procedure. Code imports from SQL don't work. Make sure it doesn't go out of sync ]
[!teletype]
```
CREATE PROCEDURE get_company(IN pin_company_id CHAR(10))
BEGIN
START TRANSACTION READ ONLY;
SELECT id, name, tax_id FROM company WHERE id = pin_company_id;
SELECT first_name, last_name, salary FROM employee WHERE company_id = pin_company_id;
COMMIT;
END
```
We can write:
[table
[
[Dynamic interface]
[Static interface]
]
[
[[multi_function_stored_procedure_dynamic]]
[[multi_function_stored_procedure_static]]
]
]
Note that we're using [refmemunq execution_state should_read_rows] instead of [refmemunq execution_state complete]
as our loop termination condition. `complete()` returns true when all the resultsets have been read,
while `should_read_rows()` will return false once an individual result has been fully read.
When using the static interface with multi-function operations, not all schema mismatches can be found
by the `start_execution` function, since not all the information is available at this point. Errors may
be reported by `read_some_rows` and `read_resultset_head`, too. Overall, the same checks are performed as
when using [refmem connection execute], but at different points in time.
[heading Multi-resultset in the general case]
`execution_state` and `static_execution_state` can be seen as state machines with four states. Each state
describes which reading function should be invoked next:
* [refmemunq execution_state should_start_op]: the initial state, after you default-construct an `execution_state`.
You should call [refmem connection start_execution] or [refmem connection async_start_execution] to start the operation.
* [refmemunq execution_state should_read_rows]: the next operation should be [refmem connection read_some_rows],
to read the generated rows.
* [refmemunq execution_state should_read_head]: the next operation should be [refmem connection read_resultset_head],
to read the next resultset metadata. Only operations that generate multiple resultsets go into this state.
* [refmemunq execution_state complete]: no more operations are required.
For multi-function operations, you may also access OK packet data ever time a resultset has completely
been read, i.e. when [refmemunq execution_state should_read_head] returns `true`.
[link mysql.examples.source_script This example] shows this general case. It uses
multi-queries to run an arbitrary SQL script file.
[heading:read_some_rows More on read_some_rows]
To properly understand `read_some_rows`, we need to know that every [reflink connection]
owns an *internal buffer*, where packets sent by the server are stored.
It is a single, flat buffer, and you can configure its initial size when creating
a `connection`, passing a [reflink buffer_params] object as the first argument to `connection`'s constructor.
The read buffer may be grown under certain circumstances to accommodate large messages.
`read_some_rows` gets the maximum number of rows that fit in the internal buffer (without growing it)
performing a single `read_some` operation on the stream (or using cached data).
If there are rows to read, `read_some_rows` guarantees to read at least one. This means that,
if doing what we described yields no rows (e.g. because of a large row that doesn't fit
into the buffer), `read_some_rows` will grow the buffer or perform more reads until at least
one row has been read. If you're using the static interface, the number of read rows is limited
by the size of span you passed, too.
If you want to get the most of `read_some_rows`, customize the initial buffer size
to maximize the number of rows that each batch retrieves.
[endsect]