mirror of https://github.com/CLIUtils/CLI11.git synced 2025-05-07 23:33:52 +00:00

Escape transform and docs (#970)

Update some documentation and add a string escape transformer so escaped
strings can be handled on the command line as well as in the config
files.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Philip Top 2024-01-06 06:29:46 -08:00 committed by GitHub
parent 91101604d5
commit de1c6a1207
GPG Key ID: 4AEE18F83AFDEB23
13 changed files with 152 additions and 48 deletions

@ -451,8 +451,8 @@ Before parsing, you can set the following options:
This is equivalent to calling `->delimiter(delim)` and `->join()`. Valid values
are `CLI::MultiOptionPolicy::Throw`,
`CLI::MultiOptionPolicy::TakeLast`, `CLI::MultiOptionPolicy::TakeFirst`,
`CLI::MultiOptionPolicy::Join`, `CLI::MultiOptionPolicy::TakeAll`, and
`CLI::MultiOptionPolicy::Sum` 🆕.
`CLI::MultiOptionPolicy::Join`, `CLI::MultiOptionPolicy::TakeAll`,
`CLI::MultiOptionPolicy::Sum` 🆕, and `CLI::MultiOptionPolicy::Reverse` 🚧.
- `->check(std::string(const std::string &), validator_name="",validator_description="")`:
Define a check function. The function should return a non empty string with
the error message if the check fails
@ -702,6 +702,17 @@ filters on the key values is performed.
`CLI::FileOnDefaultPath(default_path, false)`. This allows multiple paths to
be chained using multiple transform calls.
- `CLI::EscapedString`: 🚧 can be used to process an escaped string. The
processing is equivalent to that used for TOML config files, see
[TOML strings](https://toml.io/en/v1.0.0#string), with two notable exceptions:
\` can also be used as a literal string notation, and binary string notation
is also allowed, see
[binary strings](https://cliutils.github.io/CLI11/book/chapters/config.html).
The escaped string processing will remove outer quotes if present. `"`
indicates a string with potential escape sequences, while `'` and \` indicate
a literal string, for which the quotes are removed but no escape sequences are
processed. This is the same escape processing as used in config files.
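
As a rough illustration of the quoting rules in the `CLI::EscapedString` entry, the standalone sketch below strips matching outer quotes and processes escape sequences only for `"` strings. It is not CLI11's implementation: `unescape_basic` and `process_escaped_sketch` are hypothetical helpers, only a handful of single-character escapes are handled, and the Unicode and binary string forms are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not CLI11 code): converts a few single-character
// escapes and throws std::invalid_argument on anything else.
std::string unescape_basic(const std::string &in) {
    std::string out;
    for(std::size_t i = 0; i < in.size(); ++i) {
        if(in[i] != '\\') {
            out.push_back(in[i]);
            continue;
        }
        if(i + 1 >= in.size()) {
            throw std::invalid_argument("dangling backslash");
        }
        switch(in[++i]) {
        case 'n': out.push_back('\n'); break;
        case 't': out.push_back('\t'); break;
        case '\\': out.push_back('\\'); break;
        case '"': out.push_back('"'); break;
        case '0': out.push_back('\0'); break;
        default: throw std::invalid_argument("unsupported escape sequence");
        }
    }
    return out;
}

// Hypothetical helper mirroring the documented rules: "..." is unescaped,
// '...' and `...` are taken literally, and the outer quotes are removed.
std::string process_escaped_sketch(const std::string &s) {
    if(s.size() > 1 && s.front() == s.back()) {
        const char quote = s.front();
        const std::string inner = s.substr(1, s.size() - 2);
        if(quote == '"') {
            return unescape_basic(inner);
        }
        if(quote == '\'' || quote == '`') {
            return inner;
        }
    }
    // Unquoted input is still unescaped when it contains a backslash.
    if(s.find('\\') != std::string::npos) {
        return unescape_basic(s);
    }
    return s;
}

// Small usage check; call it from main() to exercise the helpers.
void escaped_string_sketch_demo() {
    assert(process_escaped_sketch("\"a\\tb\"") == "a\tb"); // escapes processed
    assert(process_escaped_sketch("'a\\tb'") == "a\\tb");  // literal string
}
```

With the library itself, the tests in this commit attach the transformer to an option via `->transform(CLI::EscapedString)`.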
##### Validator operations
Validators are copyable and have a few operations that can be performed on them
@ -873,9 +884,11 @@ through the `add_subcommand` method have the same restrictions as option names.
- `--subcommand1.subsub.f val` (short form nested subcommand option)
The use of dot notation in this form is equivalent `--subcommand.long <args>` =>
`subcommand --long <args> ++`. Nested subcommands also work `"sub1.subsub"`
would trigger the subsub subcommand in `sub1`. This is equivalent to "sub1
subsub"
`subcommand --long <args> ++`. Nested subcommands also work: `sub1.subsub` would
trigger the `subsub` subcommand in `sub1`, which is equivalent to `sub1 subsub`.
Quotes around the subcommand names are permitted 🚧, following the TOML standard
for such specifications, and this includes allowing escape sequences. For
example, `"subcommand".'f'` or `"subcommand.with.dots".arg1 = value`.
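
One way to picture how quoted names interact with dot notation is a splitter that treats dots inside quotes as part of the name. The sketch below is a simplified illustration under that assumption: `split_dotted_key_sketch` is a hypothetical helper, not a CLI11 function, and it does not process escape sequences.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not CLI11 code): split a key on '.' while keeping
// dots that appear inside "..." or '...' and dropping the quotes themselves.
std::vector<std::string> split_dotted_key_sketch(const std::string &key) {
    std::vector<std::string> parts;
    std::string current;
    char quote = '\0'; // active quote character, '\0' when outside quotes
    for(char c : key) {
        if(quote != '\0') {
            if(c == quote) {
                quote = '\0'; // closing quote
            } else {
                current.push_back(c);
            }
        } else if(c == '"' || c == '\'') {
            quote = c; // opening quote
        } else if(c == '.') {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);
    return parts;
}

// Small usage check; call it from main() to exercise the helper.
void split_dotted_key_sketch_demo() {
    const auto parts = split_dotted_key_sketch("\"subcommand.with.dots\".arg1");
    assert(parts.size() == 2);
    assert(parts[0] == "subcommand.with.dots");
    assert(parts[1] == "arg1");
}
```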
#### Subcommand options
@ -1209,19 +1222,22 @@ option (like `set_help_flag`). Setting a configuration option is special. If it
is present, it will be read along with the normal command line arguments. The
file will be read if it exists, and does not throw an error unless `required` is
`true`. Configuration files are in [TOML][] format by default, though the
default reader can also accept files in INI format as well. It should be noted
that CLI11 does not contain a full TOML parser but can read strings from most
TOML files, including multi-line strings 🚧, and run them through the CLI11
parser. Other formats can be added by an adept user, some variations are
available through customization points in the default formatter. An example of a
TOML file:
default reader can also accept files in INI format. The config reader
can read most aspects of TOML files, including strings both literal 🚧 and with
potential escape sequences 🚧, digit separators 🚧, and multi-line strings 🚧,
and run them through the CLI11 parser. Other formats can be added by an adept
user, and some variations are available through customization points in the
default formatter. An example of a TOML file:
```toml
# Comments are supported, using a #
# The default section is [default], case insensitive
value = 1
value2 = 123_456 # a number with digit separators
str = "A string"
str2 = "A string\nwith new lines"
str3 = 'A literal "string"'
vector = [1,2,3]
str_vector = ["one","two","and three"]
@ -1229,6 +1245,7 @@ str_vector = ["one","two","and three"]
[subcommand]
in_subcommand = Wow
sub.subcommand = true
"sub"."subcommand2" = "string_value"
```
or equivalently in INI format

@ -113,7 +113,9 @@ app.set_config("--config")
will read the files in the order given, which may be useful in some
circumstances. Using `CLI::MultiOptionPolicy::TakeLast` would work similarly
getting the last `N` files given.
getting the last `N` files given. The default policy for config options is
`CLI::MultiOptionPolicy::Reverse` which takes the last expected `N` and reverses
them so the last option given is given precedence.
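
The effect of the `Reverse` policy described above can be sketched as follows: keep the last `N` values and reverse their order, so the value given last comes first. `take_last_reversed` is a hypothetical helper for illustration only and is not part of CLI11.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not CLI11 code): keep the last n values and reverse
// them so that the value given last ends up first.
std::vector<std::string> take_last_reversed(std::vector<std::string> values, std::size_t n) {
    if(values.size() > n) {
        values.erase(values.begin(), values.end() - static_cast<std::ptrdiff_t>(n));
    }
    std::reverse(values.begin(), values.end());
    return values;
}

// Small usage check; call it from main() to exercise the helper.
void take_last_reversed_demo() {
    const auto files = take_last_reversed({"a.toml", "b.toml", "c.toml"}, 2);
    assert(files.size() == 2);
    assert(files[0] == "c.toml"); // last file given is now first
    assert(files[1] == "b.toml");
}
```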
## Configure file format
@ -204,14 +206,18 @@ str3 = """\
```
The key is that the closing of the multiline string must be at the end of a line
and match the starting 3 quote sequence.
and match the starting 3 quote sequence. Multiline sequences using `"""` allow
escape sequences, following [TOML](https://toml.io/en/v1.0.0#string) with the
addition of `\0` for a null character and the binary strings described in
the next section. This same formatting also applies to single line strings.
Multiline strings are not allowed as part of an array.
### Binary Strings
Config files have a binary conversion capability, this is mainly to support
writing config files but can be used by user generated files as well. Strings
with the form `B"(XXXXX)"` will convert any characters inside the parenthesis
with the form \xHH to the equivalent binary value. The HH are hexadecimal
with the form `\xHH` to the equivalent binary value. The HH are hexadecimal
characters. Characters not in this form will be translated as given. If argument
values with unprintable characters are used to generate a config file this
binary form will be used in the output string.
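
A minimal sketch of the `B"(...)"` conversion described above, assuming that malformed `\xHH` sequences are simply copied through unchanged. `decode_binary_string_sketch` and `hex_digit_value` are hypothetical helpers, not the functions CLI11 uses internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hexadecimal digit, or -1 if c is not one.
int hex_digit_value(char c) {
    if(c >= '0' && c <= '9') return c - '0';
    if(c >= 'a' && c <= 'f') return c - 'a' + 10;
    if(c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not CLI11 code): decode a string of the form B"(...)"
// by replacing each \xHH with the corresponding byte. Input that is not in
// binary form is returned unchanged.
std::string decode_binary_string_sketch(const std::string &s) {
    const std::string prefix = "B\"(";
    const std::string suffix = ")\"";
    if(s.size() < prefix.size() + suffix.size() || s.compare(0, prefix.size(), prefix) != 0 ||
       s.compare(s.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return s;
    }
    const std::string body = s.substr(prefix.size(), s.size() - prefix.size() - suffix.size());
    std::string out;
    for(std::size_t i = 0; i < body.size(); ++i) {
        if(body[i] == '\\' && i + 3 < body.size() && body[i + 1] == 'x') {
            const int hi = hex_digit_value(body[i + 2]);
            const int lo = hex_digit_value(body[i + 3]);
            if(hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 3; // skip the remaining characters of \xHH
                continue;
            }
        }
        out.push_back(body[i]); // anything else is copied as given
    }
    return out;
}

// Small usage check; call it from main() to exercise the helper.
void decode_binary_string_sketch_demo() {
    assert(decode_binary_string_sketch("B\"(\\x41\\x42c)\"") == "ABc");
    assert(decode_binary_string_sketch("plain") == "plain");
}
```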
@ -274,8 +280,8 @@ char arraySeparator = ',';
char valueDelimiter = '=';
/// the character to use around strings
char stringQuote = '"';
/// the character to use around single characters
char characterQuote = '\'';
/// the character to use around single characters and literal strings
char literalQuote = '\'';
/// the maximum number of layers to allow
uint8_t maximumLayers{255};
/// the separator used to separate parent layers
@ -296,8 +302,8 @@ These can be modified via setter functions
an array
- `ConfigBase *valueSeparator(char vSep)`: Specify the delimiter between a name
and value
- `ConfigBase *quoteCharacter(char qString, char qChar)` :specify the characters
to use around strings and single characters
- `ConfigBase *quoteCharacter(char qString, char literalChar)`: specify the
characters to use around strings and around single characters and literal
strings
- `ConfigBase *maxLayers(uint8_t layers)` : specify the maximum number of parent
layers to process. This is useful to limit processing for larger config files
- `ConfigBase *parentSeparator(char sep)` : specify the character to separate
@ -410,3 +416,6 @@ will create an option name in following priority.
2. Positional name
3. First short name
4. Environment name
In config files the name will be enclosed in quotes if there are any potential
ambiguities in parsing the name.

@ -26,18 +26,18 @@ app.add_option("-i", int_option, "Optional description")->capture_default_str();
You can use any C++ int-like type, not just `int`. CLI11 understands the
following categories of types:
| Type | CLI11 |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| number like | Integers, floats, bools, or any type that can be constructed from an integer or floating point number. Accepts common numerical strings like `0xFF` as well as octal, and decimal |
| string-like | std::string, or anything that can be constructed from or assigned a std::string |
| char | For a single char, single string values are accepted, otherwise longer strings are treated as integral values and a conversion is attempted |
| complex-number | std::complex or any type which has a real(), and imag() operations available, will allow 1 or 2 string definitions like "1+2j" or two arguments "1","2" |
| enumeration | any enum or enum class type is supported through conversion from the underlying type(typically int, though it can be specified otherwise) |
| container-like | a container(like vector) of any available types including other containers |
| wrapper | any other object with a `value_type` static definition where the type specified by `value_type` is one of the type in this list, including `std::atomic<>` |
| tuple | a tuple, pair, or array, or other type with a tuple size and tuple_type operations defined and the members being a type contained in this list |
| function | A function that takes an array of strings and returns a string that describes the conversion failure or empty for success. May be the empty function. (`{}`) |
| streamable | any other type with a `<<` operator will also work |
| Type | CLI11 |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| number like | Integers, floats, bools, or any type that can be constructed from an integer or floating point number. Accepts common numerical strings like `0xFF` as well as octal[\0755, or \o755], decimal, and binary(0b011111100), supports value separators including `_` and `'` |
| string-like | std::string, or anything that can be constructed from or assigned a std::string |
| char | For a single char, single string values are accepted, otherwise longer strings are treated as integral values and a conversion is attempted |
| complex-number | std::complex or any type which has a real(), and imag() operations available, will allow 1 or 2 string definitions like "1+2j" or two arguments "1","2" |
| enumeration | any enum or enum class type is supported through conversion from the underlying type(typically int, though it can be specified otherwise) |
| container-like | a container(like vector) of any available types including other containers |
| wrapper | any other object with a `value_type` static definition where the type specified by `value_type` is one of the type in this list, including `std::atomic<>` |
| tuple | a tuple, pair, or array, or other type with a tuple size and tuple_type operations defined and the members being a type contained in this list |
| function | A function that takes an array of strings and returns a string that describes the conversion failure or empty for success. May be the empty function. (`{}`) |
| streamable | any other type with a `<<` operator will also work |
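
To illustrate the number-like row of the table, here is a minimal sketch of parsing an integer that may contain the `_` and `'` separators and a `0x` or `0b` prefix. `parse_integer_sketch` is a hypothetical helper built on `std::stoll`; it leaves out the octal forms and signs combined with prefixes, and CLI11's own conversion code is different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not CLI11 code): parse decimal, 0x-prefixed hex, or
// 0b-prefixed binary text after dropping the separators `_` and `'`.
long long parse_integer_sketch(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(), [](char c) { return c == '_' || c == '\''; }),
               text.end());
    int base = 10;
    if(text.size() > 2 && text[0] == '0') {
        if(text[1] == 'x' || text[1] == 'X') {
            base = 16;
            text.erase(0, 2);
        } else if(text[1] == 'b' || text[1] == 'B') {
            base = 2;
            text.erase(0, 2);
        }
    }
    std::size_t used = 0;
    const long long value = std::stoll(text, &used, base); // throws on bad input
    if(used != text.size()) {
        throw std::invalid_argument("trailing characters in number");
    }
    return value;
}

// Small usage check; call it from main() to exercise the helper.
void parse_integer_sketch_demo() {
    assert(parse_integer_sketch("0xFF") == 255);
    assert(parse_integer_sketch("123_456") == 123456);
}
```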
By default, CLI11 will assume that an option is optional, and one value is
expected if you do not use a vector. You can change this on a specific option

@ -129,10 +129,10 @@ class ConfigBase : public Config {
valueDelimiter = vSep;
return this;
}
/// Specify the quote characters used around strings and characters
ConfigBase *quoteCharacter(char qString, char qChar) {
/// Specify the quote characters used around strings and literal strings
ConfigBase *quoteCharacter(char qString, char literalChar) {
stringQuote = qString;
literalQuote = qChar;
literalQuote = literalChar;
return this;
}
/// Specify the maximum number of parents

@ -218,6 +218,11 @@ class IPV4Validator : public Validator {
IPV4Validator();
};
class EscapedStringTransformer : public Validator {
public:
EscapedStringTransformer();
};
} // namespace detail
// Static is not needed here, because global const implies static.
@ -237,6 +242,9 @@ const detail::NonexistentPathValidator NonexistentPath;
/// Check for an IP4 address
const detail::IPV4Validator ValidIPV4;
/// convert escaped characters into their associated values
const detail::EscapedStringTransformer EscapedString;
/// Validate the input as a particular type
template <typename DesiredType> class TypeValidator : public Validator {
public:

@ -9,8 +9,8 @@
// [CLI11:version_hpp:verbatim]
#define CLI11_VERSION_MAJOR 2
#define CLI11_VERSION_MINOR 3
#define CLI11_VERSION_PATCH 2
#define CLI11_VERSION "2.3.2"
#define CLI11_VERSION_MINOR 4
#define CLI11_VERSION_PATCH 0
#define CLI11_VERSION "2.4.0"
// [CLI11:version_hpp:end]

@ -339,7 +339,11 @@ inline std::vector<ConfigItem> ConfigBase::from_config(std::istream &input) cons
item.pop_back();
}
if(keyChar == '\"') {
item = detail::remove_escaped_characters(item);
try {
item = detail::remove_escaped_characters(item);
} catch(const std::invalid_argument &ia) {
throw CLI::ParseError(ia.what(), CLI::ExitCodes::InvalidError);
}
}
} else {
if(lineExtension) {

@ -229,10 +229,29 @@ CLI11_INLINE IPV4Validator::IPV4Validator() : Validator("IPV4") {
return std::string("Each IP number must be between 0 and 255 ") + var;
}
}
return std::string();
return std::string{};
};
}
CLI11_INLINE EscapedStringTransformer::EscapedStringTransformer() {
func_ = [](std::string &str) {
try {
if(str.size() > 1 && (str.front() == '\"' || str.front() == '\'' || str.front() == '`') &&
str.front() == str.back()) {
process_quoted_string(str);
} else if(str.find_first_of('\\') != std::string::npos) {
if(detail::is_binary_escaped_string(str)) {
str = detail::extract_binary_string(str);
} else {
str = remove_escaped_characters(str);
}
}
return std::string{};
} catch(const std::invalid_argument &ia) {
return std::string(ia.what());
}
};
}
} // namespace detail
CLI11_INLINE FileOnDefaultPath::FileOnDefaultPath(std::string default_path, bool enableErrorReturn)

@ -50,7 +50,7 @@ TEST_CASE("file_fail") {
CLI::FuzzApp fuzzdata;
auto app = fuzzdata.generateApp();
int index = GENERATE(range(1, 6));
int index = GENERATE(range(1, 7));
auto parseData = loadFailureFile("fuzz_file_fail", index);
std::stringstream out(parseData);
try {

@ -308,15 +308,6 @@ TEST_CASE("StringTools: binaryStrings", "[helpers]") {
CHECK(result == "\\XEM\\X7K");
}
/// these are provided for compatibility with the char8_t for C++20 that breaks stuff
std::string from_u8string(const std::string &s) { return s; }
std::string from_u8string(std::string &&s) { return std::move(s); }
#if defined(__cpp_lib_char8_t)
std::string from_u8string(const std::u8string &s) { return std::string(s.begin(), s.end()); }
#elif defined(__cpp_char8_t)
std::string from_u8string(const char8_t *s) { return std::string(reinterpret_cast<const char *>(s)); }
#endif
TEST_CASE("StringTools: escapeConversion", "[helpers]") {
CHECK(CLI::detail::remove_escaped_characters("test\\\"") == "test\"");
CHECK(CLI::detail::remove_escaped_characters("test\\\\") == "test\\");

@ -706,6 +706,53 @@ TEST_CASE_METHOD(TApp, "NumberWithUnitBadInput", "[transform]") {
CHECK_THROWS_AS(run(), CLI::ValidationError);
}
static const std::map<std::string, std::string> validValues = {
{"test\\u03C0\\u00e9", from_u8string(u8"test\u03C0\u00E9")},
{"test\\u03C0\\u00e9", from_u8string(u8"test\u73C0\u0057")},
{"test\\U0001F600\\u00E9", from_u8string(u8"test\U0001F600\u00E9")},
{R"("this\nis\na\nfour\tline test")", "this\nis\na\nfour\tline test"},
{"'B\"(\\x35\\xa7\\x46)\"'", std::string{0x35, static_cast<char>(0xa7), 0x46}},
{"B\"(\\x35\\xa7\\x46)\"", std::string{0x35, static_cast<char>(0xa7), 0x46}},
{"test\\ntest", "test\ntest"},
{"\"test\\ntest", "\"test\ntest"},
{R"('this\nis\na\nfour\tline test')", R"(this\nis\na\nfour\tline test)"},
{R"("this\nis\na\nfour\tline test")", "this\nis\na\nfour\tline test"},
{R"(`this\nis\na\nfour\tline test`)", R"(this\nis\na\nfour\tline test)"}};
TEST_CASE_METHOD(TApp, "StringEscapeValid", "[transform]") {
auto test_data = GENERATE(from_range(validValues));
std::string value{};
app.add_option("-n", value)->transform(CLI::EscapedString);
args = {"-n", test_data.first};
run();
CHECK(test_data.second == value);
}
static const std::vector<std::string> invalidValues = {"test\\U0001M600\\u00E9",
"test\\U0001E600\\u00M9",
"test\\U0001E600\\uD8E9",
"test\\U0001E600\\uD8",
"test\\U0001E60",
"test\\qbad"};
TEST_CASE_METHOD(TApp, "StringEscapeInvalid", "[transform]") {
auto test_data = GENERATE(from_range(invalidValues));
std::string value{};
app.add_option("-n", value)->transform(CLI::EscapedString);
args = {"-n", test_data};
CHECK_THROWS_AS(run(), CLI::ValidationError);
}
TEST_CASE_METHOD(TApp, "NumberWithUnitIntOverflow", "[transform]") {
std::map<std::string, int> mapping{{"a", 1000000}, {"b", 100}, {"c", 101}};

@ -71,6 +71,15 @@ inline void unset_env(std::string name) {
#endif
}
/// these are provided for compatibility with the char8_t for C++20 that breaks stuff
CLI11_INLINE std::string from_u8string(const std::string &s) { return s; }
CLI11_INLINE std::string from_u8string(std::string &&s) { return std::move(s); }
#if defined(__cpp_lib_char8_t)
CLI11_INLINE std::string from_u8string(const std::u8string &s) { return std::string(s.begin(), s.end()); }
#elif defined(__cpp_char8_t)
CLI11_INLINE std::string from_u8string(const char8_t *s) { return std::string(reinterpret_cast<const char *>(s)); }
#endif
CLI11_INLINE void check_identical_files(const char *path1, const char *path2) {
std::string err1 = CLI::ExistingFile(path1);
if(!err1.empty()) {

Binary file not shown.