
Last week’s article about smaller C++17 features mentioned the updated operator new() that handles non-standard alignment of objects. How does it work? Can you use it to ask for arbitrary alignments? Let’s try some code and have a closer look.

Last update: 9th September 2019

Why should you care about alignment?  

Let’s examine the first example:

#include <cassert>
#include <cstdint>
#include <iostream>
#include <malloc.h>
#include <new>

class alignas(32) Vec3d { 
    double x, y, z;
};

int main() {
    std::cout << "sizeof(Vec3d) is " << sizeof(Vec3d) << '\n';
    std::cout << "alignof(Vec3d) is " << alignof(Vec3d) << '\n';

    auto Vec = Vec3d{};
    auto pVec = new Vec3d[10];

    if(reinterpret_cast<uintptr_t>(&Vec) % alignof(Vec3d) == 0)
        std::cout << "Vec is aligned to alignof(Vec3d)!\n";
    else
        std::cout << "Vec is not aligned to alignof(Vec3d)!\n";

    if(reinterpret_cast<uintptr_t>(pVec) % alignof(Vec3d) == 0)
        std::cout << "pVec is aligned to alignof(Vec3d)!\n";
    else
        std::cout << "pVec is not aligned to alignof(Vec3d)!\n";

    delete[] pVec;
}

The code shows a structure, Vec3d, with three double fields; the type is also marked with alignas(32), which makes its objects aligned to 32 bytes.

Then the example creates two objects: one on the stack and one on the free store.

Do they both have the same alignment (32 bytes)?

And another question:

Should you care about the alignment of your memory allocations?

Let’s try to answer the second question first:

In general… in most of the cases… probably not :)

But you may need that for some CPU optimisations or general system requirements (for example some embedded environments, drivers, kernel code, or hardware-specific conditions).

In my experience, I used it for SIMD code that processed particles. I wanted my types to fit nicely in SSE2/AVX registers: Flexible Particle System - Code Optimisation.

For other uses of alignment, have a look at these questions/answers:

And please let me know in the comments if you have had to align your data in some non-standard way. I wonder how often programmers need to use this technique. Maybe it’s only 0.001% of C++ coders, or maybe 50%?

Returning to our code, let’s try to answer the first question about the alignment.

Let’s try C++11/14 with GCC 4.8.5: (See @Wandbox):

sizeof(Vec3d) is 32
alignof(Vec3d) is 32
Vec is aligned to alignof(Vec3d)!
pVec is not aligned to alignof(Vec3d)!

And how about C++17, for example GCC 9.1 (see @Wandbox)

sizeof(Vec3d) is 32
alignof(Vec3d) is 32
Vec is aligned to alignof(Vec3d)!
pVec is aligned to alignof(Vec3d)!

What happened here?

In both compiler results, the alignment of objects on the stack is 32, as expected.

But for dynamic allocation it’s different:

In C++11 and C++14, there was no guarantee that memory allocated for types that are over-aligned honours that specific alignment. In our case we want Vec3d allocations to return pointers that are 32-byte aligned… but GCC 4.8.5 allocates differently.

How about C++17?

In the newest standard, dynamic memory allocation has been updated, and we now have a guarantee that the memory will be aligned as requested.

As you see in GCC 9.1, the memory is now 32-byte aligned.

You can try other numbers, for example 64 bytes, 128, etc… but remember that the alignment must be a power of two.
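To experiment with other alignments, a pair of small checks can be handy. The sketch below is only an illustration: the helper names is_power_of_two and is_aligned, and the 64-byte Vec3d64 type, are assumptions made for this example and are not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when n is a non-zero power of two (1, 2, 4, 8, ...).
constexpr bool is_power_of_two(std::size_t n) noexcept {
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: true when ptr sits on an `alignment`-byte boundary.
// The mask trick is only valid for power-of-two alignments, hence the assert.
inline bool is_aligned(const void* ptr, std::size_t alignment) noexcept {
    assert(is_power_of_two(alignment));
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Illustrative type: like Vec3d, but asking for 64-byte alignment.
struct alignas(64) Vec3d64 {
    double x, y, z;
};

static_assert(is_power_of_two(alignof(Vec3d64)), "alignas takes powers of two");
static_assert(!is_power_of_two(48), "48 is not a valid alignment");
```

With a C++17 compiler, is_aligned(new Vec3d64[4], alignof(Vec3d64)) should report true, for the same reason pVec is aligned in the GCC 9.1 run above.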

OK, but how does it work?

New new Functions  

In C++17, we now have 14 global new() function overloads and 8 class-specific methods!

Plus corresponding delete functions.

C++17 added overloads that take a new parameter: std::align_val_t

It’s defined as follows:

enum class align_val_t : std::size_t {};

It uses a handy C++17 feature to enable initialisation of scoped enums with the underlying type. That’s why you can write:

std::align_val_t myAlignment { 32 }; // no need to cast to size_t!
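Going the other way, from std::align_val_t back to a number, still needs an explicit cast, because scoped enums don’t convert implicitly. Here’s a minimal sketch; kCacheLine and to_bytes are names invented for this example, and 64 is just an illustrative value.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Brace-initialising a scoped enum from its underlying type is allowed since C++17.
// Note that `std::align_val_t a = 64;` (copy-initialisation) would still not compile.
constexpr std::align_val_t kCacheLine{64};

// Hypothetical helper: get the numeric alignment back out of the enum.
constexpr std::size_t to_bytes(std::align_val_t al) noexcept {
    return static_cast<std::size_t>(al);
}

static_assert(to_bytes(kCacheLine) == 64, "round trip through align_val_t");
static_assert(to_bytes(std::align_val_t{alignof(std::max_align_t)})
                  == alignof(std::max_align_t),
              "works with alignof expressions as well");
```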

And we have new() operators as below:

void* operator new  ( std::size_t count, std::align_val_t al);

See all of them here @cppreference
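These overloads are ordinary functions, so you can also call them directly. The sketch below (the function name raw_aligned_round_trip is made up for this example) allocates raw storage with the aligned operator new, checks the address and frees it with the matching aligned operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Allocates `bytes` of raw storage on an `alignment`-byte boundary, checks the
// returned address, and releases the storage again.
// `alignment` is assumed to be a power of two.
// Returns true when the pointer was aligned as requested.
inline bool raw_aligned_round_trip(std::size_t bytes, std::size_t alignment) {
    const std::align_val_t al{alignment};
    void* p = ::operator new(bytes, al);   // throws std::bad_alloc on failure
    const bool ok = reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
    ::operator delete(p, al);              // pass the same alignment back
    return ok;
}
```

No objects are constructed here; this is only raw storage, so there are no destructors to run.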

How does it work?

What’s the difference when you type:

auto p = new int{};

and

auto pVec = new Vec3d{};

How does the compiler select the function overload? Does it always use overloads with alignment parameters?

Selecting new Functions  

By default, the popular compilers use 16-byte alignment. We can even check it because there’s now a new predefined macro (since C++17):

__STDCPP_DEFAULT_NEW_ALIGNMENT__

MSVC, GCC and Clang specify it as 16.

Now, when you ask for a memory allocation that requires alignment larger than this default value, the compiler will use overloads with the proper alignment parameter.
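The rule can be written down as a compile-time check. This is only a sketch of the selection criterion, not what the compiler literally executes; uses_aligned_new and Plain are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// True when `new T` is expected to go through the std::align_val_t overloads,
// that is, when T is over-aligned relative to the default new alignment.
template <typename T>
constexpr bool uses_aligned_new() noexcept {
    return alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__;
}

struct alignas(32) Vec3d { double x, y, z; };  // over-aligned when the default is 16
struct Plain { double x, y, z; };              // natural alignment of double

static_assert(!uses_aligned_new<int>(), "int fits the default alignment");
static_assert(!uses_aligned_new<Plain>(), "so does a plain struct of doubles");
```

On a toolchain where the macro is 16, uses_aligned_new<Vec3d>() evaluates to true.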

It’s not possible to change the default value in MSVC (see this discussion): Add compiler switch to change __STDCPP_DEFAULT_NEW_ALIGNMENT___.

But on Clang there’s a compiler option: -fnew-alignment.

Not sure about GCC though…

Custom Overloads  

As usual with operator new(), you can also provide a replacement implementation. For example:

void* operator new(std::size_t size, std::align_val_t align) {
#if defined(_WIN32) || defined(__CYGWIN__)
    auto ptr = _aligned_malloc(size, static_cast<std::size_t>(align));
#else
    auto ptr = aligned_alloc(static_cast<std::size_t>(align), size);
#endif

    if (!ptr)
        throw std::bad_alloc{};

    std::cout << "new: " << size << ", align: " 
              << static_cast<std::size_t>(align) 
              << ", ptr: " << ptr << '\n';

    return ptr;
}

void operator delete(void* ptr, std::size_t size, std::align_val_t align) noexcept {
    std::cout << "delete: " << size << ", align: " 
              << static_cast<std::size_t>(align) 
              << ", ptr : " << ptr << '\n';
#if defined(_WIN32) || defined(__CYGWIN__) 
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

void operator delete(void* ptr, std::align_val_t align) noexcept {
    std::cout << "delete: align: " 
              << static_cast<std::size_t>(align) 
              << ", ptr : " << ptr << '\n';
#if defined(_WIN32) || defined(__CYGWIN__)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

And here’s some test code:

class alignas(32) Vec3dAVX { 
    double x, y, z;
};

int main() {
    std::cout << "__STDCPP_DEFAULT_NEW_ALIGNMENT__ is " 
              << __STDCPP_DEFAULT_NEW_ALIGNMENT__ << std::endl;

    std::cout << "sizeof(Vec3dAVX) is " << sizeof(Vec3dAVX) << '\n';
    std::cout << "alignof(Vec3dAVX) is " << alignof(Vec3dAVX) << '\n';
    auto pVec = new Vec3dAVX[10];
    assert(reinterpret_cast<uintptr_t>(pVec) % alignof(Vec3dAVX) == 0);
    delete[] pVec;

    auto p2 = new int[10];
    delete[] p2;
}

The output:

__STDCPP_DEFAULT_NEW_ALIGNMENT__ is 16
sizeof(Vec3dAVX) is 32
alignof(Vec3dAVX) is 32
new: 320, align: 32, ptr: 0x2432e00
delete: align: 32, ptr : 0x2432e00

Play with the example @Wandbox

As you see the custom code was called for the allocation of Vec3dAVX, but not for int. This is because int used default alignment and it was smaller than __STDCPP_DEFAULT_NEW_ALIGNMENT__.

You can also try changing the alignment of Vec3dAVX from 32 to 16, and you’ll see that the custom code won’t be called.


Requesting different alignment  

So far I’ve shown you examples where types have their alignment specified with an alignas declaration. But in theory we can even ask for the alignment when calling placement new:

auto pAlignedInt = new(std::align_val_t{ 64 }) int[10];
delete[] pAlignedInt;

But now we get into trouble… at least on MSVC, where I got the following error:

error C2956:  sized deallocation function 'operator delete(void*, size_t)' 
              would be chosen as placement deallocation function.

See this note Using c++17 new (std::align_val_t(n)) syntax results in error C2956.

On GCC there’s no warning… but maybe it’s wrong and unsafe? Do you know which delete function needs to be called to release the memory properly?

While we have a placement new expression, there’s no placement delete expression. So to handle the deallocation properly, you need to call the correct delete operator explicitly:

::operator delete[](pAlignedInt, std::align_val_t{64}); // array form, matching new[]

What’s worse, now you also have to call the destructor for your objects! While the delete expression calls the destructor, that doesn’t happen with an explicit call to the delete function!

So for types that have constructor/destructors you need to call destructor first:

auto pAlignedType= new(std::align_val_t{ 32 }) MyType;
pAlignedType->~MyType();
::operator delete(pAlignedType, std::align_val_t{32});

As you can see, it’s not that nice: you need to remember the alignment used in the new expression and call the proper delete function. So maybe the error reported by MSVC is a good thing and can save you some bugs…
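One way to avoid remembering the alignment by hand is to store it in the type of a smart pointer. The following is a minimal sketch under my own assumptions: AlignedDeleter, AlignedPtr and make_aligned are hypothetical names, not standard facilities. It calls the allocation function directly instead of using the placement-new expression, so it never reaches the construct that MSVC rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

// Hypothetical deleter: runs the destructor, then releases the storage with
// the aligned operator delete that matches the alignment used at allocation.
template <typename T, std::size_t Align>
struct AlignedDeleter {
    void operator()(T* p) const noexcept {
        p->~T();
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <typename T, std::size_t Align>
using AlignedPtr = std::unique_ptr<T, AlignedDeleter<T, Align>>;

// Hypothetical factory: allocate raw aligned storage, then construct a T in it.
template <typename T, std::size_t Align, typename... Args>
AlignedPtr<T, Align> make_aligned(Args&&... args) {
    static_assert((Align & (Align - 1)) == 0, "alignment must be a power of two");
    static_assert(Align >= alignof(T), "alignment must not be weaker than alignof(T)");
    void* raw = ::operator new(sizeof(T), std::align_val_t{Align});
    try {
        return AlignedPtr<T, Align>{::new (raw) T(std::forward<Args>(args)...)};
    } catch (...) {
        ::operator delete(raw, std::align_val_t{Align});  // the constructor threw
        throw;
    }
}
```

For example, auto p = make_aligned<int, 64>(42); gives a 64-byte aligned int that is destroyed and freed with the right overload when p goes out of scope.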

Memory allocated with std::aligned_alloc() can normally be released with free(), but MSVC doesn’t support std::aligned_alloc(), so there you need to use _aligned_malloc() and _aligned_free(). On Windows there’s a separate allocation mechanism for objects that use non-default alignments.

How Can it Simplify the Code?  

Admittedly, the whole article is about quite low-level stuff. Maybe it’s not even typical for most daily tasks.

What’s more, Modern C++ states that we shouldn’t even touch raw new and delete and rely on the standard containers or smart pointers.

So what’s the deal here?

In fact, the new new() allows us to stick to that rule even better!

I didn’t write about that in the initial article, but one of the readers made a valuable comment:

I thought we were not supposed to use “new” anymore?

You can also see this suggestion in the core guidelines:

C++ Core Guidelines - R.11: Avoid calling new and delete explicitly

And there was also one comment at r/cpp where an example from the Eigen library was mentioned.

Eigen: Using STL Containers with Eigen

If you’re compiling in [c++17] mode only with a sufficiently recent compiler (e.g., GCC>=7, clang>=5, MSVC>=19.12), then everything is taken care by the compiler and you can stop reading.

All in all, if you previously needed to use _aligned_malloc or custom allocators explicitly, you can now clean up the code and rely on the compiler!

Consider the following code which uses our previous example with 32-byte aligned Vec3dAVX class:

std::cout << "std::vector\n";
std::vector<Vec3dAVX> vec;
vec.push_back({});
vec.push_back({});
vec.push_back({});
assert(reinterpret_cast<uintptr_t>(vec.data()) % alignof(Vec3dAVX) == 0);

Play @Wandbox

And here’s the output I got (using our custom new/delete replacements)

new: 32, align: 32, ptr: 0xf1ec60
new: 64, align: 32, ptr: 0xf1ece0
delete: 32, align: 32, ptr : 0xf1ec60
new: 128, align: 32, ptr: 0xf1ed80
delete: 64, align: 32, ptr : 0xf1ece0
delete: 128, align: 32, ptr : 0xf1ed80

The code above creates a vector of aligned objects, and the container reallocates three times to accommodate three elements. First it allocates only 32 bytes, then 64 bytes and then 128 bytes (so four elements could be stored).

As you can see, the code also checks if the memory allocated internally by the vector is still aligned correctly. And it seems to work fine :)
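Smart pointers should benefit in the same way, since std::make_unique uses regular new expressions internally. Here’s a small sketch, assuming a C++17 compiler and standard library; the helper names is_aligned_for_avx and smart_pointers_respect_alignment are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Same layout as the Vec3dAVX test type above, redeclared as a struct so the
// snippet is self-contained.
struct alignas(32) Vec3dAVX { double x, y, z; };

// Hypothetical helper: checks an address against alignof(Vec3dAVX).
inline bool is_aligned_for_avx(const void* p) noexcept {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Vec3dAVX) == 0;
}

// Creates one object and one array through std::make_unique and reports
// whether both allocations honour the 32-byte alignment.
inline bool smart_pointers_respect_alignment() {
    auto single = std::make_unique<Vec3dAVX>();
    auto array  = std::make_unique<Vec3dAVX[]>(8);
    return is_aligned_for_avx(single.get()) && is_aligned_for_avx(array.get());
}
```

There’s no raw new or delete in sight, and the unique_ptr releases the memory through the matching delete expression.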

Here are some other issues with “old” new():
* c++11 - Using STL vector with SIMD intrinsic data type - Stack Overflow
* c++ - Making std::vector allocate aligned memory - Stack Overflow

And now, in C++17, those problems are gone… for example, you can hold the specialized SIMD type __m256 in a vector:

std::vector<__m256> vec(10);
vec.push_back(_mm256_set_ps(0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f));
assert(reinterpret_cast<uintptr_t>(vec.data()) % alignof(__m256) == 0);

In fact, the whole point of the new functionality is that you can forget about the limitations of over-aligned data. It lets you write regular modern C++ code without worrying about specialized allocators or raw memory handling functions like std::aligned_alloc() or _aligned_malloc().

Summary  

This article described the basic idea behind the new operator new() that can guarantee alignment for types that are “over aligned”.

This technique might help with existing code that uses library functions like std::aligned_alloc or _aligned_malloc/_aligned_free() (for MSVC). Now, rather than handling memory on your own, you can rely on the new/delete operators and still benefit from the required alignment.

References:

The feature is available in GCC: 7.0, Clang: 4.0 and MSVC: 2017 15.5

Questions for you

  • Have you needed to work with non-standard memory alignment?
  • Can you spot all non-binary words in the logo image? :)