Table of Contents

If you have a map of strings, like std::map<std::string, int> m; and you want to find some element by m.find("abc"). Do you have to pay the price and construct a std::string object? Can you optimize it?

Let’s have a look at one feature enabled in C++14 that might help optimize such container access.

Intro  

Let’s expand the example mentioned earlier.

std::map<std::string, int> intMap { 
    { "Hello Super Long String", 1 }, 
    { "Another Longish String", 2 }, 
    { "This cannot fall into SSO buffer", 3 }
};

if (intMap.find("Hello Super Long String") != intMap.end())
    std::cout << "Found \n";
else
    std::cout << "Not found\n";

In the above code although “Hello Super Long String” is a string literal, it has to be converted into a regular std::string (so a memory allocation needed here), and then the search is performed.

The std::string supports comparing against const char*, so why we cannot use it here?

The reason: The definition of the comparator in the map (by default it’s std::less<Key>). It requires that you compare the same types. If you use std::string as a key, you can only compare with std::string, not even with something compatible.

Let’s have a look at a larger key for std::set. In that case the lookup cost might be even higher.

An Example of Larger Key  

How about a set container that stores products:

struct Product {
    std::string mName;
    std::string mDescription;
    double mPrice;
};

bool operator<(const Product& p1, const Product& p2) { 
    return p1.mName < p2.mName; 
}

std::set<Product> products {
    { "Car", "This is a super car that costs a lot", 100'000.0 },
    { "Ball", "A cheap but nice-looking ball to play", 100.0 },
    { "Orange", "Something to eat and refresh", 50.0 }
};

Products are compared by Name, which is a member variable.

If you want to find a “Car” then you need to create temporary Product and fill its name:

if (products.find({"Car", "", 0.0}) != products.end())
    std::cout << "Found\n"; 

But cannot we specify products.find("Car") and provide extra comparison options (comparing vs string_view for example)?

Side-note: Another reason for heterogeneous lookup might be when you have a set of movable only objects (one example is a set of unique_ptr). In that case, you cannot compare by creating temporary objects.

While it was not possible in C++11, we can do that by using heterogeneous lookup, available since C++14.

Heterogeneous Lookup, C++14  

Now, we can have a look at a possible improvement: heterogeneous lookup in ordered containers.

And surprisingly it’s straightforward to enable.

All you have to do is to use std::less<> (or some other functor, more on that later) and implement correct comparison functions!

For example for the first example with map of std::string:

std::map<std::string, int, std::less<>> intMap;

And now you can find by using const char* or string_view:

if (intMap.find("Hello Super Long String"))
    std::cout << "Found \n";
else
    std::cout << "Not found\n";

You can play with the code @Coliru.

Searching in std::set and Heterogeneous Lookup  

In the previous section, I showed implementation for a map of strings, now let’s cover the example with a set of Products. In this case, the key is much larger.

Let’s create an implementation that compares products via string_view.

bool operator<(const Product& prod, const std::string_view& sv) { 
    return prod.mName < sv; 
}
bool operator<(const std::string_view& sv, const Product& prod) { 
    return sv < prod.mName; 
}

And now we can search:

std::set<Product, std::less<>> products { ... };

if (products.find(std::string_view("Car")) != products.end())
    std::cout << "Found \n";
else
    std::cout << "Not found\n";

Great! We can search for products by their name without creating temporary objects

How is Heterogeneous Lookup Implemented?  

You know how to use such this new search pattern, but how is it implemented?

What’s the difference between those two lines:

std::map<std::string, int> myMap;
std::map<std::string, int, std::less<>> myOtherMap;

The first thing is that myMap declaration resolves to

std::map<std::string, int, std::less<std::string>> myMap; 
// allocator omitted above...

The full declaration is as follows:

template<class Key, class T,
    class Compare = std::less<Key>,
    class Allocator = std::allocator<std::pair<const Key, T> >
> class map;

Note: the text refers to std::less, but the rules apply to all standard functors like std::greater, std::plus, etc, etc. And your custom functors as well.

The design choice for heterogeneous lookup suggested using the existing syntax as much as possible, without the need to invent some new extra names (like Greater vs greater).

std::less has operator () defined as follows:

template <class _Ty = void>
struct less {
    constexpr bool operator()(const _Ty& _Left, const _Ty& _Right) const {
        return _Left < _Right;
    }
};

The type must be the same for _Left and _Right.

The solution was to specialize std::less for empty (void) and also enhance it with `is_transparent” property.

Now we can define a template method (rather than a type) that uses two different (but compatible) types:

template <>
struct less<void> { 
    using is_transparent = int;

    // simplified version...
    template <class _Ty1, class _Ty2>
    constexpr auto operator()(_Ty1&& _Left, _Ty2&& _Right) const
        return static_cast<_Ty1&&>(_Left) < static_cast<_Ty2&&>(_Right);
    }
};

Now _Left and _Right can be distinct types, but they need to be comparable.

The find method overload can be defined as:

template <class _Other, class _Mycomp = key_compare, 
          class = typename _Mycomp::is_transparent>
iterator find(const _Other& _Keyval) { ... }

In other words, if the comparator is transparent (through having is_transparent tag), then the implementation can leverage heterogeneous lookup.

You can also implement your custom functions that expose is_transparent. There was even a similar article on that at fluentcpp: is_transparent: How to search a C++ set with another type than its key - Fluent C++.

You can read more about the feature in the proposals that were accepted in C++14: Making Operator Functors greater<> N3421 and Adding heterogeneous comparison lookup to associative containers - N3657.

One catch - don’t search using a different key  

Ordered containers are implemented as balanced trees. The order is specified by the key you provide in the container declaration. If you try to search for another key, the search might fail.

For example, for our std::set<Product> case you might be tempted to search by the price:

You need to add comparison functions:

bool operator<(const Product& prod, const double& price) { 
    return prod.mPrice < price; 
}
bool operator<(const double& price, const Product& prod) { 
    return price < prod.mPrice; 
}

And then the code:

std::set<Product, std::less<>> products {
    { "Car", "This is a super car that costs a lot", 100'000.0 },
    { "Ball", "A cheap but nice-looking ball to play", 100.0 },
    { "Orange", "Something to eat and refresh", 50.0 }
};

std::cout << "Lookup by Price: \n";
if (products.find(50.0) != products.end())
    std::cout << "Found \n";
else
    std::cout << "Not found\n";

The output:

Not Found

There is an object that has the price of 50 units… so why the search failed?

The primary key that we use here is the name. The implementation might create the following tree structure:

       "Ball"
     /      \
   "Car"    "Orange" 

When comparing 50.0 with “Ball”, we compare the prices, and 50 is smaller than Ball’s price of 100.0. So we go into the left subtree. Then we see only “Car”, which has a different price than “50”.

Maybe that’s quite obvious, but be sure to look for keys that are also equal to the primary key that is used.

What’s Coming in C++20?  

In C++14 we got heterogeneous lookup for ordered containers (std::map, std::set, etc) and the natural extension was to have a similar approach for unordered containers (std::unorederd_map, std::unordered_set, etc).

If everything goes fine, we’ll have that in C++20 through the paper: P0919 by Mateusz Pusz. Right now, the paper was accepted for the C++20 draft.

You can also try your implementation and use the ideas from this video.
https://www.youtube.com/watch?v=0QFPKgvLhao

The Performance Gains with Heterogeneous Lookup  

One of the reasons we have heterogeneous lookup is to increase the performance of searching. But how much you can achieve?

The main gain will come from reducing the number of temp objects and extra memory allocations. So the less temp memory you need to allocate the better is the final boost.

We can draw some numbers from the paper P0919 where the author - Mateusz - presents several experiments for unordered containers (Github repo here: mpusz/unordered_v2):

  • 20% performance gain for short text (SSO used in std::string temporary).
  • 35% performance gain for long text (dynamic memory allocation in std::stringtemporary).

Can we get the same performance with ordered containers? I hope to cover that in my next article. So stay tuned. But if you have some results already, please share that in comments.

Summary  

With C++14 we got a new and flexible way to lookup in ordered containers. The main idea was to provide “transparent” functors that can compare two “compatible” objects that represent a key. For example, in a map of strings, you can search by string_view or const char*. That reduced the number of temp objects. This technique is also handy when your keys are large.

In C++20 we’ll probably get a similar pattern but for unordered containers. We need to wait for the final Standard.

Have you used heterogeneous lookup before? Do you think that might help in your projects? Let us know in comments.