05 February 2018

Factory With Self-Registering Types

Writing a factory method might be simple:

unique_ptr<IType> create(const string& name) {
    if (name == "Abc") return make_unique<AbcType>();
    if (name == "Xyz") return make_unique<XyzType>();
    // ... one more `if` for every other supported type ...

    return nullptr;
}

Just one switch/if and then after a match you return a proper type.

But what if we don’t know all the types and names upfront? Or what if we’d like to make such a factory more generic?

Let’s see how classes can register themselves in a factory, and look at some examples where this technique is used.

Intro

The code shown as the example at the beginning of this text is not wrong when you have a relatively simple application. For example, in my experiments with pimpl, my first version of the code contained:

static unique_ptr<ICompressionMethod> 
Create(const string& fileName)
{
    auto extension = GetExtension(fileName);
    if (extension == "zip")
        return make_unique<ZipCompression>();
    else if (extension == "bz")
        return make_unique<BZCompression>();

    return nullptr;
}

In the above code, I wanted to create ZipCompression or BZCompression based on the extension of the filename.

That straightforward solution worked for me for a while. Still, if you want to go further with the evolution of the application you might struggle with the following issues:

  • Each time you write a new class and want to include it in the factory, you have to add another if in the Create() method. That’s easy to forget in a complex system.
  • All the types must be known to the factory.
  • In Create() we arbitrarily used strings to represent types. Such a representation is only visible in that single method. What if you’d like to use it somewhere else? Strings are easily misspelt, especially if you compare them in several places.

So all in all, we get a strong dependency between the factory and the classes.

But what if classes could register themselves? Would that help?

  • The factory would just do its job: create new objects based on some matching.
  • If you write a new class, there’s no need to change parts of the factory class. Such a class would register automatically.

It sounds like an excellent idea.

A practical example

To give you more motivation I’d like to show one real-life example:

Google Test

When you use the Google Test library and write:

TEST(MyModule, InitTest)
{
}

Behind this single TEST macro a lot of things happen!

For starters, your test is expanded into a separate class - so each test is a new class.

But then there’s a problem: you have all the tests, so how does the test runner know about them?

It’s the same problem we’re trying to solve in this post. The classes need to be registered.

Have a look at this code from googletest/…/gtest-internal.h:

// (some parts of the code cut out)
#define GTEST_TEST_(test_case_name, test_name, parent_class, parent_id)\
class GTEST_TEST_CLASS_NAME_(test_case_name, test_name) \
: public parent_class \
{\
  virtual void TestBody();\
  static ::testing::TestInfo* const test_info_ GTEST_ATTRIBUTE_UNUSED_;\
};\
\
::testing::TestInfo* const GTEST_TEST_CLASS_NAME_(test_case_name, test_name)\
  ::test_info_ =\
    ::testing::internal::MakeAndRegisterTestInfo(\
        #test_case_name, #test_name, NULL, NULL, \
        new ::testing::internal::TestFactoryImpl<\
            GTEST_TEST_CLASS_NAME_(test_case_name, test_name)>);\
void GTEST_TEST_CLASS_NAME_(test_case_name, test_name)::TestBody()

I cut some parts of the code to make it shorter, but basically GTEST_TEST_ is used in the TEST macro, and it expands to a new class. In the lower section, you can see the name MakeAndRegisterTestInfo. That’s the place where the class registers!

After the registration, the runner knows all the existing tests and can invoke them.

When I was implementing a custom testing framework for one of my projects, I went for a similar approach. After my test classes were registered, I could filter them, show their info and, of course, execute the test suites.

I believe other testing frameworks might use a similar technique.
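To make the idea concrete, here is a minimal sketch of a self-registering test macro. It is not Google Test's real implementation: MINI_TEST, TestCase, Registry, RegisterTest and RunAllTests are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mini test registry; NOT Google Test's real implementation.
struct TestCase {
    std::string name;
    void (*body)();
};

// Function-local static: the vector is constructed on first use, so it is
// ready no matter which registration runs first.
std::vector<TestCase>& Registry() {
    static std::vector<TestCase> tests;
    return tests;
}

bool RegisterTest(const char* name, void (*body)()) {
    assert(body != nullptr);
    Registry().push_back(TestCase{name, body});
    return true;
}

// Declares the test body, registers it through a static bool,
// and then opens the definition of the body.
#define MINI_TEST(suite, name)                                  \
    static void suite##_##name##_Body();                        \
    static const bool suite##_##name##_registered =             \
        RegisterTest(#suite "." #name, &suite##_##name##_Body); \
    static void suite##_##name##_Body()

static int g_counter = 0; // lets the demo observe that the bodies really ran

MINI_TEST(MyModule, InitTest) { g_counter += 1; }
MINI_TEST(MyModule, OtherTest) { g_counter += 10; }

// The "runner": it only knows the registry, not the individual tests.
int RunAllTests() {
    for (const TestCase& test : Registry()) test.body();
    return static_cast<int>(Registry().size());
}
```

Calling RunAllTests() from main() executes both bodies, even though the runner never mentions MyModule_InitTest_Body or MyModule_OtherTest_Body by name.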

Flexibility

My previous example was related to unknown types: for tests, you know them at compile time, but it would be hard to list them all in a single create method.

Still, such self-registration is useful for flexibility and scalability, even for my two classes: BZCompression and ZipCompression.

Now when I’d like to add a third compression method, I just have to write a new class, and the factory will know about it - without much intervention in the factory code.

Ok, ok… we’ve discussed some examples, but you probably want to see the details!

So let’s move to the actual implementation.

Self-registration

What do we need?

  • Some Interface - we’d like to create classes that are derived from one interface. It’s the same requirement as a “normal” factory method.
  • Factory class that also holds a map of available types.
  • A proxy that will be used to create a given class. The factory doesn’t know how to create a given type now, so we have to provide some proxy class to do it.

For the interface we can use ICompressionMethod:

class ICompressionMethod
{
public:
    ICompressionMethod() = default;
    virtual ~ICompressionMethod() = default;

    virtual void Compress() = 0;
};

And then the factory:

class CompressionMethodFactory
{
public:
    using TCreateMethod = unique_ptr<ICompressionMethod>(*)();

public:
    CompressionMethodFactory() = delete;

    static bool Register(const string& name, TCreateMethod funcCreate);

    static unique_ptr<ICompressionMethod> Create(const string& name);

private:
    static map<string, TCreateMethod> s_methods;
};

The factory holds the map of registered types. The main point here is that the factory now uses a function pointer (TCreateMethod) to create the desired type (this is our proxy). The name of a type and its creation method are supplied from a different place.

The implementation of such a factory:

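// outside the class, TCreateMethod has to be qualified
map<string, CompressionMethodFactory::TCreateMethod> 
    CompressionMethodFactory::s_methods;

// the signature must match the declaration in the class
bool CompressionMethodFactory::Register(const string& name, 
                                        TCreateMethod funcCreate)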
{
    if (auto it = s_methods.find(name); it == s_methods.end())
    { // C++17 init-if ^^
        s_methods[name] = funcCreate;
        return true;
    }
    return false;
}

unique_ptr<ICompressionMethod> 
CompressionMethodFactory::Create(const string& name)
{
    if (auto it = s_methods.find(name); it != s_methods.end()) 
        return it->second(); // call the createFunc

    return nullptr;
}

Now we can implement a derived class from ICompressionMethod that will register in the factory:

class ZipCompression : public ICompressionMethod
{
public:
    virtual void Compress() override;

    static unique_ptr<ICompressionMethod> CreateMethod() { 
        return make_unique<ZipCompression>();
    }
    static std::string GetFactoryName() { return "ZIP"; }

private:
    static bool s_registered;
};

The downside of self-registration is that there’s a bit more work for a class. As you can see, we have to define a static CreateMethod.

To register such a class, all we have to do is define s_registered:

bool ZipCompression::s_registered =
  CompressionMethodFactory::Register(ZipCompression::GetFactoryName(),   
                                     ZipCompression::CreateMethod);

The basic idea for this mechanism is that we rely on static variables. They will be initialized before main() is called.
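Putting the pieces together, the following is a compact single-file sketch of the mechanism. It makes a few assumptions for the sake of the demo: Compress() returns a string so the result can be observed, Register uses map::emplace instead of the find-then-insert shown above, and a second class (BZCompression) is added to show that the factory code stays untouched.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    // Assumption for the demo: returns a tag so the result is observable.
    virtual std::string Compress() = 0;
};

class CompressionMethodFactory {
public:
    using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();

    static bool Register(const std::string& name, TCreateMethod funcCreate) {
        assert(funcCreate != nullptr);
        // emplace leaves the map unchanged and reports false if the name is taken
        return s_methods.emplace(name, funcCreate).second;
    }

    static std::unique_ptr<ICompressionMethod> Create(const std::string& name) {
        auto it = s_methods.find(name);
        if (it != s_methods.end()) return it->second(); // call the create function
        return nullptr;
    }

private:
    static std::map<std::string, TCreateMethod> s_methods;
};

// Defined before the registrations below, so within this single translation
// unit the map is constructed first.
std::map<std::string, CompressionMethodFactory::TCreateMethod>
    CompressionMethodFactory::s_methods;

class ZipCompression : public ICompressionMethod {
public:
    std::string Compress() override { return "zip"; }
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
    static std::string GetFactoryName() { return "ZIP"; }
private:
    static bool s_registered;
};
bool ZipCompression::s_registered = CompressionMethodFactory::Register(
    ZipCompression::GetFactoryName(), ZipCompression::CreateMethod);

// A second type: nothing in the factory changes.
class BZCompression : public ICompressionMethod {
public:
    std::string Compress() override { return "bz"; }
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<BZCompression>();
    }
    static std::string GetFactoryName() { return "BZ"; }
private:
    static bool s_registered;
};
bool BZCompression::s_registered = CompressionMethodFactory::Register(
    BZCompression::GetFactoryName(), BZCompression::CreateMethod);
```

In main(), CompressionMethodFactory::Create("ZIP") then returns a ZipCompression, and an unknown name such as "RAR" yields nullptr.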

But can we be sure that all of the code is executed, and all the classes are registered? s_registered is not used anywhere later, so maybe it could be optimized and removed? And what about the order of initialization?

Static var initialization

We might run into two problems:

Order of static variables initialization:

It’s called “static initialization order fiasco” - it’s a problem where one static variable depends on another static variable. Like static int a = b + 1 (where b is also static). You cannot be sure b will be initialized before a. Bear in mind that such variables might be in a different compilation unit.

Fortunately, for us, it doesn’t matter. We might end up with a different order of elements in the factory container, but each name/type is not dependent on other already registered types.

But what about the first insertion? Can we be sure that the map is created and ready for use?

To be certain I’ve even asked a question at SO:
C++ static initialization order: adding into a map - Stack Overflow

Our map is defined as follows:

map<string, CompressionMethodFactory::TCreateMethod> CompressionMethodFactory::s_methods;

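Every variable with static storage duration is zero-initialized first. Dynamic initialization happens later: that’s when the map’s constructor runs and when all s_registered variables are initialized. Within a single translation unit, dynamic initialization follows the order of definitions, so the map is ready as long as it’s defined before the registrations. Across translation units, the order is not specified.

So it seems we’re safe here, provided the map is defined before the registrations in the same translation unit. With registrations spread over several translation units, more care is needed.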

You can read more about it at isocpp FAQ and at cppreference - Initialization.
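When registrations live in other translation units, a common way to make sure the map exists at the first insertion is the "construct on first use" idiom: keep the map as a function-local static. The sketch below is one possible variant, not the code from my app; Methods() and Count() are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};

class CompressionMethodFactory {
public:
    using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();

    static bool Register(const std::string& name, TCreateMethod funcCreate) {
        assert(funcCreate != nullptr);
        return Methods().emplace(name, funcCreate).second;
    }

    static std::unique_ptr<ICompressionMethod> Create(const std::string& name) {
        auto it = Methods().find(name);
        if (it != Methods().end()) return it->second();
        return nullptr;
    }

    // Hypothetical helper, only here so the demo can inspect the registry.
    static std::size_t Count() { return Methods().size(); }

private:
    // The map is constructed the first time Methods() is called, so it is
    // valid even if a registration runs before any other static initializer.
    static std::map<std::string, TCreateMethod>& Methods() {
        static std::map<std::string, TCreateMethod> s_methods;
        return s_methods;
    }
};

class ZipCompression : public ICompressionMethod {
public:
    void Compress() override {}
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
private:
    static bool s_registered;
};

bool ZipCompression::s_registered =
    CompressionMethodFactory::Register("ZIP", ZipCompression::CreateMethod);
```

The trade-off is a function call on every access, which is rarely a concern for a factory lookup.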

Can s_registered be eliminated?

Fortunately, we’re also on the safe side:

From the latest draft of C++: n4713.pdf [basic.stc.static], point 2:

If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused.

So the compiler won’t optimize away such a variable.
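A tiny experiment illustrates the rule. The names g_initCalls, RecordInit and s_neverRead are made up for this sketch; the variable is never read, yet its initializer has a side effect and therefore still runs.

```cpp
#include <cassert>

// Constant-initialized, so it is ready before any dynamic initialization.
static int g_initCalls = 0;

// The initializer function has a visible side effect.
static bool RecordInit() {
    ++g_initCalls;
    return true;
}

// Never read anywhere in the program, but it must still be initialized
// because the initialization has a side effect.
static bool s_neverRead = RecordInit();

// Hypothetical accessor so other code can check the counter.
int InitCalls() {
    assert(g_initCalls >= 0);
    return g_initCalls;
}
```

Checking InitCalls() in main() of the same file should give 1, even with optimizations enabled.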

Although this might happen when we use some templated version… but more on that later.

Update: read what can happen when your symbols come from a static library in my newer post: Static Variables Initialization in a Static Library, Example

Extensions

All in all, it seems that our code should work! :)

For now, I’ve only shown a basic version, and we can think about some updates:

Proxy classes

In our example, I’ve used only a map that holds <name, TCreateMethod> - this works because all we need is a way to create the object.

We can extend this and use a “full” proxy class that will serve as a “meta” object for the target type.

In my final app code I have the following type:

struct CompressionMethodInfo
{
    using TCreateMethod = std::unique_ptr<ICompressionMethod>(*)();
    TCreateMethod m_CreateFunc;
    string m_Description;
};

Besides the creation function, I’ve added m_Description. This addition makes it possible to store a useful description of the compression method. I can then show all that information to the user without the need to create real compression methods.

The factory class now uses:

static map<string, CompressionMethodInfo> s_methods;

And when registering the class, I need to pass the info object, not just the creation method:

bool ZipCompression::s_registered =
  CompressionMethodFactory::Register(
      ZipCompression::GetFactoryName(), 
      { ZipCompression::CreateMethod, 
        "Zip compression using deflate approach" 
      });
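As a rough sketch of how such an info object could be used, the factory below can list the descriptions of all registered methods without creating any of them. Describe(), Methods() and the s_instances counter are hypothetical additions for this illustration, and the map is kept as a function-local static to sidestep initialization-order questions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};

struct CompressionMethodInfo {
    using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();
    TCreateMethod m_CreateFunc;
    std::string m_Description;
};

class CompressionMethodFactory {
public:
    static bool Register(const std::string& name, CompressionMethodInfo info) {
        assert(info.m_CreateFunc != nullptr);
        return Methods().emplace(name, std::move(info)).second;
    }

    static std::unique_ptr<ICompressionMethod> Create(const std::string& name) {
        auto it = Methods().find(name);
        if (it != Methods().end()) return it->second.m_CreateFunc();
        return nullptr;
    }

    // Hypothetical helper: one "name: description" line per registered type.
    static std::vector<std::string> Describe() {
        std::vector<std::string> lines;
        for (const auto& entry : Methods())
            lines.push_back(entry.first + ": " + entry.second.m_Description);
        return lines;
    }

private:
    static std::map<std::string, CompressionMethodInfo>& Methods() {
        static std::map<std::string, CompressionMethodInfo> s_methods;
        return s_methods;
    }
};

class ZipCompression : public ICompressionMethod {
public:
    ZipCompression() { ++s_instances; }
    void Compress() override {}
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
    static std::string GetFactoryName() { return "ZIP"; }
    static int s_instances; // demo only: counts constructed objects
private:
    static bool s_registered;
};

int ZipCompression::s_instances = 0;

bool ZipCompression::s_registered = CompressionMethodFactory::Register(
    ZipCompression::GetFactoryName(),
    { ZipCompression::CreateMethod, "Zip compression using deflate approach" });
```

Calling Describe() returns the text for "ZIP" while ZipCompression::s_instances stays at zero, which is the point of keeping the metadata in the proxy rather than in the objects.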

Templates

As I mentioned, the downside of self-registration is that each class needs some additional code. Maybe we can pack it into some RegisterHelper<T> template?

Here’s some code (with just creation method, not with the full info proxy class):

template <typename T>
class RegisteredInFactory
{
protected:
    static bool s_bRegistered;
};

template <typename T>
bool RegisteredInFactory<T>::s_bRegistered = 
CompressionMethodFactory::Register(T::GetFactoryName(), T::CreateMethod);

The helper template class wraps the s_bRegistered static variable and registers the type in the factory. So now, a class you want to register just has to provide T::GetFactoryName and T::CreateMethod:

class ZipCompression : public ICompressionMethod, 
                       public RegisteredInFactory<ZipCompression>
{
public:
    virtual void Compress() override { /*s_bRegistered;*/ }

    static unique_ptr<ICompressionMethod> CreateMethod() { ... }
    static std::string GetFactoryName() { return "ZIP"; }
};

Looks good… right?

But when you run it, the class is not registered!

Have a look at this code @coliru.

But if you uncomment /*s_bRegistered*/ in void Compress(), the registration works fine.

Why is that?

It seems that although s_bRegistered is also a static variable, it’s inside a template. And members of a class template are instantiated only when they are used (see odr-use @stackoverflow). If the variable is not used anywhere, its definition is never instantiated, so the registration code never runs…

Another topic that’s worth a separate discussion.

So all in all, we have to be smarter with the templated helper. I’ll have to leave it for now.
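One commonly used workaround, shown here only as a sketch and not as a final answer, is to touch s_bRegistered in the helper's constructor. Every derived class that can be constructed then uses the variable, which forces its instantiation. The factory below uses a function-local static map (Methods() is a hypothetical helper), because the initialization order of template static members relative to other statics is not something I'd rely on. It's worth verifying the behaviour with your own toolchain.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};

class CompressionMethodFactory {
public:
    using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();

    static bool Register(const std::string& name, TCreateMethod funcCreate) {
        assert(funcCreate != nullptr);
        return Methods().emplace(name, funcCreate).second;
    }

    static std::unique_ptr<ICompressionMethod> Create(const std::string& name) {
        auto it = Methods().find(name);
        if (it != Methods().end()) return it->second();
        return nullptr;
    }

private:
    static std::map<std::string, TCreateMethod>& Methods() {
        static std::map<std::string, TCreateMethod> s_methods;
        return s_methods;
    }
};

template <typename T>
class RegisteredInFactory {
protected:
    static bool s_bRegistered;

    // Mentioning the static member here counts as a use, so the definition
    // below gets instantiated for every T whose constructor is needed.
    RegisteredInFactory() { (void)s_bRegistered; }
};

template <typename T>
bool RegisteredInFactory<T>::s_bRegistered =
    CompressionMethodFactory::Register(T::GetFactoryName(), T::CreateMethod);

class ZipCompression : public ICompressionMethod,
                       public RegisteredInFactory<ZipCompression> {
public:
    void Compress() override {} // no need to mention s_bRegistered here

    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        // Constructing the object is what pulls in the helper's constructor.
        return std::make_unique<ZipCompression>();
    }
    static std::string GetFactoryName() { return "ZIP"; }
};
```

With this variant, CompressionMethodFactory::Create("ZIP") in main() should return a valid object without any per-class reference to s_bRegistered.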

Not using strings as names

I am not happy that we’re still using strings to match the classes.

Still, if used with care strings will work great. Maybe they won’t be super fast to match, but it depends on your performance needs. Ideally, we could think about unique ids like ints, hashes or GUIDs.
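As an illustration of the "hashes" option, the sketch below derives an integer id from the name at compile time with the 32-bit FNV-1a hash and uses it as the map key. TypeId, HashName, kZipId and the id-keyed factory are assumptions made for this example. Keep in mind that hashes can collide; here Register simply returns false if an id is already taken.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

using TypeId = std::uint32_t;

// 32-bit FNV-1a hash, written recursively so it also works as C++11 constexpr.
constexpr TypeId HashName(const char* s, TypeId hash = 2166136261u) {
    return *s == '\0'
        ? hash
        : HashName(s + 1, (hash ^ static_cast<unsigned char>(*s)) * 16777619u);
}

// The hash of an empty string is just the FNV offset basis.
static_assert(HashName("") == 2166136261u, "unexpected FNV-1a offset basis");

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};

class CompressionMethodFactory {
public:
    using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();

    // Returns false when the id is already registered (e.g. a hash collision).
    static bool Register(TypeId id, TCreateMethod funcCreate) {
        assert(funcCreate != nullptr);
        return Methods().emplace(id, funcCreate).second;
    }

    static std::unique_ptr<ICompressionMethod> Create(TypeId id) {
        auto it = Methods().find(id);
        if (it != Methods().end()) return it->second();
        return nullptr;
    }

private:
    static std::map<TypeId, TCreateMethod>& Methods() {
        static std::map<TypeId, TCreateMethod> s_methods;
        return s_methods;
    }
};

// Computed at compile time; the string never reaches the map.
constexpr TypeId kZipId = HashName("ZIP");

class ZipCompression : public ICompressionMethod {
public:
    void Compress() override {}
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
private:
    static bool s_registered;
};

bool ZipCompression::s_registered =
    CompressionMethodFactory::Register(kZipId, ZipCompression::CreateMethod);
```

Lookups then compare integers instead of strings, and a shared constant such as kZipId gives the compiler a chance to catch misspelt names.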

Summary

In this post, I’ve covered a type of factory where types register themselves. It’s the opposite of simple factories, where all the types are declared upfront.

Such an approach gives more flexibility and removes the factory’s dependency on the exact list of supported classes.

The downside is that the classes that want to register need to ask for it and thus they need a bit more code.

Let me know what you think about self-registration. Do you use it in your projects? Or maybe you have some better ways?

© 2017, Bartlomiej Filipek, Blogger platform