C++17 In Detail

18 March 2019

Dark Corner of C++ Corner Cases

Darker C++

The C++17 standard is almost two thousand pages long. Two thousand pages describing every single aspect of the language. Some of those pages cover all kinds of details, exceptions, and things that you do not care about every day. We will look at a few such cases, which we hope never to see in production code.

This is a guest post from Wojciech Razik

Wojtek is a Senior C++ developer at Thaumatec, currently writing software for a robot. He enjoys reading the C++ Standard before bed, and he loves to hate JS from dawn to dusk.
If you know Polish, take a look at cpp-polska.pl where Wojtek is one of the co-authors.

Originally published in Polish at cpp-polska.pl

Unfortunate Backward Compatibility

That’s right, unfortunate! The C++ Committee doesn’t like changes that break backward compatibility. Actually, the community doesn't like them either. There are a few exceptions, such as removing the dangerous type std::auto_ptr or removing trigraphs. Unfortunately, things that date back to the beginnings of the language still exist in C++.

It’s hard to imagine a keyboard that doesn’t have a full set of characters. No # sign? No braces? In the past, not all keyboards had a full set of characters compliant with the ISO 646 standard. For developers stuck with such a keyboard, a creature called the digraph was created. Let’s look at the following code snippet:

int main() {
    int a[] = <%1%>;
    return a<:0:>;
}

At first glance, this is incorrect syntax. But we paste the code into Godbolt, and it turns out that the program is completely correct. Check it out: godbolt.org/z/S9L-RQ!

The code is correct because each of these character pairs is an alternative spelling of another token. Everything is described in the standard:

Alternative  Primary
<%           {
%>           }
<:           [
:>           ]
%:           #
%:%:         ##

The above code, after replacing the alternative tokens with the primary ones, looks like this:

int main() {
    int a[] = {1};
    return a[0];
}

You can now see that the program will return 1.
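As a slightly larger sketch of the same substitution, the snippet below spells a whole function with alternative tokens, including the %: form of the preprocessor #. The names first_element and FIRST_INDEX are made up for illustration, and the code assumes a compiler with digraphs enabled (GCC and Clang accept them by default).

```cpp
%:define FIRST_INDEX 0  // "%:" stands for "#", so this is #define FIRST_INDEX 0

// Hypothetical helper written entirely with digraphs.
// After substitution it reads:
//   int first_element() { int a[] = {1, 2, 3}; return a[FIRST_INDEX]; }
int first_element() <%
    int a<: :> = <%1, 2, 3%>;
    return a<:FIRST_INDEX:>;
%>
```

Calling first_element() yields 1, just like the original program. Digraphs are recognized only as tokens: inside a string literal, "<%" stays exactly as written.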

It’s Not What you Think

Digraphs are not the only thing that looks like a mistake at first glance. Let’s look at the following example:

#include <iostream>
int main() {
  std::cout << 1["ABC"];
}

And again: after all, it’s impossible to index an integer! Literals don’t have an overloaded [] operator either. It’s impossible for this code to compile.

And yet, once again, we paste the code into Coliru, and on the screen we see… B

No compilation errors, no warnings. We go to the standard, we read it from cover to cover and… Eureka!

(…) The expression E1[E2] is identical (by definition) to *((E1)+(E2)) (…)

The above expression is nothing else but:

*(1 + "ABC")

The addition operator is commutative, so we can write this expression as:

*("ABC" + 1)

"ABC" is a string literal of type const char[4], which decays to const char* here, so this is plain pointer arithmetic. Our expression is in fact:

"ABC"[1]

That’s why the program prints B.
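The equivalence can also be checked at compile time. The following is a minimal sketch; the three function names are hypothetical and exist only to put the three spellings side by side.

```cpp
// Three spellings of the same element access.
constexpr char via_swapped_index() { return 1["ABC"]; }     // *(1 + "ABC")
constexpr char via_pointer_sum()   { return *("ABC" + 1); } // explicit pointer arithmetic
constexpr char via_usual_index()   { return "ABC"[1]; }     // the everyday form

// The compiler itself confirms that all three name the same character.
static_assert(via_swapped_index() == 'B', "swapped index picks the second character");
static_assert(via_pointer_sum() == via_swapped_index(), "pointer sum agrees");
static_assert(via_usual_index() == via_swapped_index(), "usual index agrees");
```

This only works for the built-in subscript operator. A class with an overloaded operator[] (std::string, std::vector) does not accept the swapped form.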

Very Generic Code

Many things that look weird to us have their rationale. They are in the standard because someone proposed them and had a reason to do so. Let’s take a closer look at the destructor. Calling it like an ordinary method, without the delete keyword, looks… weird:

struct Foo {};

void clean(Foo* f)  { // bad design, but just for illustration
  f->~Foo();          // we don't want to free the memory
}

Usually, we don’t want to do something like that, but it’s possible.
Even weirder is calling such a destructor on a built-in type. If we wanted to call the destructor of int, we could try writing:

void clean(int* i) {
  i->~int(); // compilation error: expected identifier before `int`
}

The above code will not compile because it’s syntactically invalid. However, if we create an alias for the int type, the code becomes correct:

using MyInt = int;
void clean(MyInt* i) {
  i->~MyInt(); // OK
}

But why would we need it? It turns out that when we write our own container that manages its memory by hand (e.g. through a custom allocator), we can safely destroy its contents whatever the element type is:

template<typename T>
struct C {
    // ...
    ~C() {
        for(size_t i = 0; i < elements_; ++i)
            container_[i].~T();
    }
};

Even if someone instantiates our container with a simple type, we don’t have to put on the wizard’s hat with a big glowing SFINAE inscription. The code will compile and work as expected. And what will the destructor of a simple type do?

Nothing. And thank God! The standard calls this construct a pseudo-destructor call.
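To see the same expression serve both kinds of types, here is a minimal sketch. Tracked, destroy_in_place and demo_destroy_in_place are hypothetical names introduced for illustration only; they are not part of the container shown earlier. The sketch deliberately does not read the int after the call, to stay on the safe side.

```cpp
#include <new>  // placement new

// Hypothetical type that counts how many times its destructor runs.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Generic in-place destruction: the same expression for every T.
template <typename T>
void destroy_in_place(T* p) {
    p->~T();  // real destructor for class types, pseudo-destructor call otherwise
}

// Returns how many Tracked destructors ran during the demo.
int demo_destroy_in_place() {
    const int before = Tracked::destroyed;

    int plain = 42;
    destroy_in_place(&plain);  // T = int: compiles and does no work

    alignas(Tracked) unsigned char storage[sizeof(Tracked)];
    Tracked* t = new (storage) Tracked;  // construct in a local buffer
    destroy_in_place(t);                 // T = Tracked: runs ~Tracked()

    return Tracked::destroyed - before;
}
```

demo_destroy_in_place() reports exactly one destructor run: the one for Tracked. The call with int contributes nothing, and no SFINAE or type trait was needed to make it compile.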

The Code Works the Way it’s Supposed to Work.

We all know what the switch statement looks like and how it works. In the parentheses we put an expression of an integer, char or enumeration type. In the case blocks we specify what the code should do for particular values. But it turns out that, according to the standard, the body of a switch can be any statement, inside which case, default and break have a special meaning:

#include <iostream>

int main() {
    int n = 3;
    int i = 0;

    switch (n % 2) {
      case 0:
      do {
        ++i;
        case 1:
          ++i;
      } while (--n > 0);
    }
    std::cout << i;
}

The construct looks unusual, but of course it’s completely correct. It may look familiar to C programmers: there is a fairly popular optimization built on it, called Duff’s device.

After evaluating the condition, we jump into the middle of the do ... while loop, to the case 1 label. The first increment of i happens there. Then the whole loop body is executed twice, so the program outputs 5. If we had n = 5, the result would be 9 (i is incremented once after the jump to the case 1 label, then the whole loop body is executed four times).
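The walkthrough is easier to verify when the same control flow is wrapped in a function and tried with several inputs. The function name count_increments is hypothetical; the body is the switch from the example, unchanged.

```cpp
// Same control flow as the example above, parameterized on n.
int count_increments(int n) {
    int i = 0;

    switch (n % 2) {
      case 0:          // even n: enter the loop at the top
      do {
        ++i;
        case 1:        // odd n: jump straight into the middle of the loop body
          ++i;
      } while (--n > 0);
    }
    return i;
}
```

For n = 3 the function returns 5 and for n = 5 it returns 9, as described above. For an even value such as n = 4, execution enters at case 0, the full body runs four times, and the result is 8.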

More Practically

Apart from the oddities, there are also things that can bite us on a daily basis. Let’s look at a fairly simple example: initializing a constant reference with the conditional (ternary) operator:

int main() {
    int i = 1;
    int const& a = i > 0 ? i : 1;
    i = 2;
    return a;
}

At first glance, the condition is satisfied, so the variable a is a constant reference to i:

int const& a = i;

We then modify the variable that the reference supposedly points to, and… something is wrong here. The program returns 1. Godbolt cannot lie; Matt is not a guy who puts pranks in the codebase. Once again, we read the standard from cover to cover, and finally we find the appropriate section: §7.6.16 ([expr.cond]). It describes the conditional operator precisely. Our case does not match any of paragraphs 2-5 (it’s not void, it’s not a class, etc.). So we go to paragraph 6:

Otherwise, the result is a prvalue

What is a prvalue? For our purposes, it is nothing but a temporary value. So a will not be a reference to the variable i, but to a temporary. Why? Because the compiler takes both branches of the conditional expression into account. There is an lvalue on the left and a prvalue on the right, so the result of the whole expression is a prvalue as well.
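The difference shows up as soon as both branches of the conditional (ternary) operator are lvalues of the same type. The sketch below contrasts the two situations; the function names and the fallback variable are hypothetical, and decltype is used only to make the value category visible.

```cpp
#include <type_traits>

// Mixed branches (lvalue and literal): the result is a prvalue.
int bound_to_temporary() {
    int i = 1;
    static_assert(std::is_same<decltype(i > 0 ? i : 1), int>::value,
                  "lvalue ? prvalue gives a prvalue of type int");
    const int& a = i > 0 ? i : 1;  // binds to a temporary copy of i
    i = 2;                         // does not affect the temporary
    return a;
}

// Both branches are int lvalues: the result is an lvalue.
int bound_to_variable() {
    int i = 1;
    int fallback = 1;
    static_assert(std::is_same<decltype(i > 0 ? i : fallback), int&>::value,
                  "lvalue ? lvalue gives an lvalue of type int");
    const int& a = i > 0 ? i : fallback;  // binds directly to i
    i = 2;                                // visible through the reference
    return a;
}
```

bound_to_temporary() returns 1, matching the surprising result above, while bound_to_variable() returns 2 because the reference really aliases i.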

A similar thing happens when a type conversion is involved:

#include <iostream>

int main() {
    int a = '0';
    char const &b = a;
    std::cout << b;
    a++;
    std::cout << b;
}

As above, the reference was initialized with a temporary resulting from the conversion of int to char. Incrementing a does not change that temporary, so the program prints 00.
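For comparison, here is a short sketch that puts the converting reference next to a reference of the matching type. Both function names are hypothetical.

```cpp
// Reference of a different type: a conversion creates a temporary.
char through_converted_reference() {
    int a = '0';
    const char& b = a;  // binds to a temporary char converted from a
    ++a;                // changes a, not the temporary
    return b;
}

// Reference of the same type: no conversion, no temporary.
char through_same_type_reference() {
    int a = '0';
    const int& b = a;   // binds directly to a
    ++a;                // visible through b
    return static_cast<char>(b);
}
```

The first function returns '0' and the second returns '1'. The only difference between them is the type of the reference.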

UB or Not UB?

Finally, something completely useless, but again clearly defined by the standard. Let’s try to initialize a variable using itself:

#include <iostream>

int main() {
  void *p = &p;
  std::cout << bool(p);
}

Does the code compile? Yes, the standard allows that:

The point of declaration for a name is immediately after its complete declarator and before its initializer (if any)

Is the above code undefined behavior? Probably not, since it is in this article. Although we do not know what value &p will have, we know for sure that it won’t be null (the address of an object is never a null pointer). So the code will print 1 on standard output.
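A small sketch makes the two claims checkable: the pointer is non-null, and it really holds its own address. The function names are hypothetical. Note that the initializer only takes the address of p and never reads its (not yet set) value.

```cpp
// The name p is already declared when its initializer is evaluated,
// so &p refers to the very variable being initialized.
bool is_non_null() {
    void* p = &p;
    return static_cast<bool>(p);  // address of an object is never null
}

bool points_to_itself() {
    void* p = &p;
    return p == &p;               // void** converts to void* for the comparison
}
```

Both functions return true. Reading the variable's own value in its initializer (for example int x = x;) would be a different story, because that uses an indeterminate value.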

Why all This?

The above examples show that the C++ standard has many dark corners that we are not always aware of. Are they really unnecessary? No one should ask about them at a job interview. We certainly won’t use them regularly. Maybe we will never see most of them. But the moment will come: the compiler reports a strange error, or worse, we get a bug report from a client. One quick look at the error and we can smile. Because we already know:

This is the chapter “Lexical conventions”, §5.5.

That’s easy. Hold my coffee.

And you? Do you know any useless constructions that make C++ unnecessarily complicated?

© 2017, Bartlomiej Filipek, Blogger platform