
10 August 2020

How to Check String or String View Prefixes and Suffixes in C++20

Up to and including C++17, if you wanted to check the start or the end of a string, you had to use custom solutions, Boost, or other third-party libraries. Fortunately, this changes with C++20.

In this article, I’ll show you the new functionality and discuss a couple of examples.

Intro

Here’s the main proposal that was added into C++20:

std::string/std::string_view .starts_with() and .ends_with() P0457

In the new C++ Standard we’ll get the following member functions for std::string and std::string_view:

constexpr bool starts_with(string_view sv) const noexcept;
constexpr bool starts_with(CharT c) const noexcept;
constexpr bool starts_with(const CharT* s) const;

And also for suffix checking:

constexpr bool ends_with(string_view sv) const noexcept;
constexpr bool ends_with(CharT c) const noexcept;
constexpr bool ends_with(const CharT* s) const;

As you can see, they have three overloads: for a string_view, a single character, and a pointer to a null-terminated string (which covers string literals).

Simple example:

const std::string url { "https://isocpp.org" };

// string literals
if (url.starts_with("https") && url.ends_with(".org"))
    std::cout << "you're using the correct site!\n";

// a single char:
if (url.starts_with('h') && url.ends_with('g'))
    std::cout << "letters matched!\n";

You can play with this basic example @Wandbox

Token Processing Example

Below, you can find an example which takes a set of HTML tokens and extracts only the text that would be rendered on that page. It skips the HTML tags and leaves only the content and also tries to preserve the line endings.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> tokens { 
        "<header>",
        "<h1>",
        "Hello World",
        "</h1>",
        "<p>",
        "This is my super cool new web site.",
        "</p>",
        "<p>",
        "Have a look and try!",
        "</p>",
        "</header>"
    };

    const auto convertToEol = [](const std::string& s) {
        if (s.starts_with("</h") || s.starts_with("</p"))
            return std::string("\n");

        return s;
    };

    std::vector<std::string> tokensTemp;
    std::transform(tokens.cbegin(), tokens.cend(),            
                   std::back_inserter(tokensTemp),
                   convertToEol);

    const auto isHtmlToken = [](const std::string& s) {
        return s.starts_with('<') && s.ends_with('>');
    };

    std::erase_if(tokensTemp, isHtmlToken); // cpp20!

    for (const auto& str : tokensTemp)
        std::cout << str;

    return 0;
}

You can play with the code at @Wandbox

The most interesting parts:

  • there’s a lambda, convertToEol, which takes a string and returns either the same string or an EOL if it detects a closing tag that starts with </h or </p.
    • the lambda is then used in the std::transform call that converts the initial set of tokens into the temporary version.
  • later, the HTML tokens are removed from the temporary vector by using another predicate lambda. This time we have a simple test for an HTML token.
  • you can also see the use of std::erase_if, which works nicely on our vector. This functionality is also new in C++20, so there’s no need to use the remove/erase pattern.
  • at the end, we display the final tokens that are left.

Prefix and a (Sorted) Container

Let’s try another use case. For example, if you have a container of strings, then you might want to search for all elements that start with a prefix.

A simple example with an unsorted vector:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

int main() {
    const std::vector<std::string> names { "Edith", "Soraya", "Nenita",
        "Lanny", "Marina", "Clarine", "Cinda", "Mike", "Valentin",
        "Sylvester", "Lois", "Yoshie", "Trinidad", "Wilton", "Horace",
        "Willie", "Aleshia", "Erminia", "Maybelle", "Brittany", "Breanne",
        "Kerri", "Dakota", "Roseanna", "Edra", "Estell", "Fabian",
        "Arlen", "Madeleine", "Genia" }; // listofrandomnames.com

    const std::string_view prefix { "M" };
    const std::vector<std::string> foundNames = [&names, &prefix]{
        std::vector<std::string> tmp;
        std::copy_if(names.begin(), names.end(),
              std::back_inserter(tmp), [&prefix](const std::string& str){
                  return str.starts_with(prefix);
              });
        return tmp;
    }();

    std::cout << "Names starting with \"" << prefix << "\":\n";
    for (const auto& str : foundNames)
        std::cout << str << ", ";
}

Play with code @Wandbox

In the sample code, I’m computing the foundNames vector, which contains the entries from names that start with a given prefix. The code uses copy_if with a predicate that leverages the starts_with() function.

On the other hand, if you want better complexity for this kind of query, then it might be wiser to store those strings (or string views) in a sorted container. This is the case when you have a std::map, a std::set, or a container that you sort yourself. Then, we can use lower_bound to quickly (in logarithmic time) find the first element that could match the prefix and then perform a linear search over the neighbouring elements.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>
#include <set>

int main() {
    const std::set<std::string> names { "Edith", "Soraya", "Nenita",
        "Lanny", "Marina", "Clarine", "Cinda", "Mike", "Valentin",
        "Sylvester", "Lois", "Yoshie", "Trinidad", "Wilton", "Horace",
        "Willie", "Aleshia", "Erminia", "Maybelle", "Brittany", "Breanne",
        "Kerri", "Dakota", "Roseanna", "Edra", "Estell", "Fabian",
        "Arlen", "Madeleine", "Genia", "Mile", "Ala", "Edd" }; 
         // listofrandomnames.com

    const std::string prefix { "Ed" };
    const auto startIt = names.lower_bound(prefix);

    const std::vector<std::string> foundNames = [&names, &startIt, &prefix]{
        std::vector<std::string> tmp;
        for (auto it = startIt; it != names.end(); ++it)
            if ((*it).starts_with(prefix))
                tmp.emplace_back(*it);
            else
                break;

        return tmp;
    }();

    std::cout << "Names starting with \"" << prefix << "\":\n";
    for (const auto& str : foundNames)
        std::cout << str << ", ";
}

Play with the code @Wandbox

As a side note, you might also try a different approach, which should be even faster. Rather than checking elements one by one starting from the lower-bound iterator, you can modify the last letter of the pattern so that it sorts “later” in the order, and then call lower_bound again with that modified pattern. This gives you both ends of the matching range with two O(log n) searches. I’ll leave that experiment for you as “homework”.
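
If you’d like to compare notes after trying it yourself, here is one possible sketch of that idea. The helper name findWithPrefix is hypothetical, and the code assumes the default std::set ordering, where std::string compares characters as unsigned char. It also handles the corner case where the last character already has the maximum value and cannot be bumped.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns all elements of a sorted set that start
// with `prefix`, using two lower_bound calls instead of a linear scan.
std::vector<std::string> findWithPrefix(const std::set<std::string>& names,
                                        const std::string& prefix) {
    const auto first = names.lower_bound(prefix);

    // Build the smallest string that sorts after every string with this
    // prefix: drop trailing characters that are already at the maximum
    // value (0xFF), then increment the last remaining character.
    std::string upper = prefix;
    while (!upper.empty() && static_cast<unsigned char>(upper.back()) == 0xFF)
        upper.pop_back();

    auto last = names.end();
    if (!upper.empty()) {
        upper.back() = static_cast<char>(
            static_cast<unsigned char>(upper.back()) + 1);
        last = names.lower_bound(upper);
    }
    // If `upper` ended up empty, everything from `first` onwards matches.
    return std::vector<std::string>(first, last);
}
```

For the prefix "Ed" the second search uses "Ee", so the half-open range [first, last) covers exactly the names that begin with "Ed". Copying the results into a vector is still linear in the number of matches; only locating the range is logarithmic.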

Case (in)Sensitivity

All the examples that I’ve shown so far used regular std::string objects, and thus we could only compare strings case-sensitively. But what if you want a case-insensitive comparison?

For example, Boost provides separate functions that do the job.

In Qt, similar functions take an additional argument that selects the comparison technique (QString Class - starts_with).

With the Standard Library, we can take another route and write a custom trait for the string type.

As you may recall, std::string is just a specialisation of the following template:

template<class charT, 
         class traits = char_traits<charT>,
         class Allocator = allocator<charT>>
class basic_string;

The traits class is used for all core operations that you can perform on characters. You can implement a trait that compares strings case-insensitively.

You can find the examples in the following websites:

After implementing the trait you’ll end up with a string type that is different than std::string:

using istring = std::basic_string<char, case_insensitive_trait>;
// assuming case_insensitive_trait is a proper char trait

Is that a limitation? For example, you won’t be able to easily copy from std::string into your new istring. For some designs, it might be fine, but on the other hand, it can also be handy to have just a simple runtime parameter or a separate function that checks case-insensitively. What’s your opinion on that?

Another option is to “normalise” the string and the pattern, for example by making both lowercase. Unfortunately, this approach requires creating extra copies of the strings, so it might not be the best choice.
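
As a possible middle ground, a separate free function can lowercase characters on the fly while comparing, which avoids both a new string type and the extra copies. The names startsWithNoCase and endsWithNoCase below are hypothetical, and the sketch assumes plain single-byte text where std::tolower is good enough.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Compares two characters after lowercasing them. Taking unsigned char
// keeps the std::tolower calls well-defined for all char values.
inline bool equalNoCase(unsigned char a, unsigned char b) {
    return std::tolower(a) == std::tolower(b);
}

// Hypothetical helper: true if `text` begins with `prefix`, ignoring case.
bool startsWithNoCase(std::string_view text, std::string_view prefix) {
    if (prefix.size() > text.size())
        return false;
    return std::equal(prefix.begin(), prefix.end(), text.begin(), equalNoCase);
}

// Hypothetical helper: true if `text` ends with `suffix`, ignoring case.
bool endsWithNoCase(std::string_view text, std::string_view suffix) {
    if (suffix.size() > text.size())
        return false;
    return std::equal(suffix.begin(), suffix.end(),
                      text.end() - suffix.size(), equalNoCase);
}
```

Because both parameters are std::string_view, the functions accept std::string objects and string literals alike without allocating.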


Compiler Support

Recent versions of the major compilers already support the new functionality!

  • GCC: 9.0
  • Clang: 9
  • Visual Studio: VS 2019 16.1

Summary

In this article, you’ve seen how to leverage new functionality that we get with C++20: string prefix and suffix checking member functions.

You’ve seen a few examples, and we also discussed options if you want your comparisons to be case insensitive.

And you can read about other techniques of prefix and suffix checking in:


© 2017, Bartlomiej Filipek, Blogger platform