Table of Contents

When you’re doing a code profiling session it’s great to have advanced and easy to use tools. But what if we want to do some simple test/benchmark? Maybe a custom code would do the job?

Let’s have a look at simple performance timer for C++ apps.

Intro  

A task might sound simple: detect what part of the code in the ABC module takes most of the time to execute. Or another case: compare execution time between the Xyz algorithm and Zyx.

Sometimes, instead of using and setting up advanced profiling tools I just use my custom profiling code. Most of the time I only need a good timer and a method to print something on the screen/output. That’s all. Usually it can be enough for most of cases… or at least a good start for more deep and advanced profiling session.

Little Spec  

What do we want?

  • I’d like to measure execution time of any function in my code and even part of a routine.
  • The profiling code that needs to be added to routines must be very simple, ideally just one line of additional code.
  • There should be a flag that will disable/enable profiling globally

Timer  

Good timer is the core of our mechanism.

Here is a brief summary of available options:

  • RDTSC instruction - it returns number of CPU cycles since the reset, 64 bit variable. Using this instruction is very low-level, but probably this is not what we need. Cpu cycles aren’t steady time events: power saving, context switching… See an interesting read from RandomAscii: rdtsc in the Age of Sandybridge.
  • High performance timer on Windows - see Acquiring high-resolution time stamps. It gives highest possible level of precision (<1us).
  • GetTickCount - 10 to 16 milliseconds of resolution
  • timeGetTime - uses system clock (so the same resolution as GetTickCount), but resultion can be increased up to even 1ms (via timeBeginPeriod). See full comparision between GetTickCount vs timeGetTime at RandomASCII blog
  • std::chrono - finally, there are timers from STL library!
    • system_clock - system time
    • steady_clock - monotonic clock, see the diff between system_clock at this SO question
    • high_resolution_clock - highest possible resolution, multiplatform! Warning: it might be alias for system or steady clock… depending on the system capabilities.

Obviously we should generally use std::high_resolution_clock, unfortunately it’s not working as expected in VS2013 (where I developed the original solution).
This is fixed in VS 2015: see this blog post from vs team.

In general, if you’re using the latest compilers/libraries then std::chrono will work as expected. If you have some older tools, then it’s better to double-check.

Output  

Where do we want to get the results? In simple scenarios we might just use printf/cout. Other option is to log directly to some log file or use Debug View.

Performance cost  

Measuring some effect can alter the results. How much elapsed time is affected by our profiling code? If it takes proportionally quite a long time (against the code that we measure) we might need to defer the process somehow.

For example if I want to measure execution time for just a small method that runs in a few microseconds, writing output to a file (each time the method is called) might be longer then the whole function!

So we can measure just the elapsed time (assuming that it’s very fast) and defer the process of writing the data to output.

Solution  

Simple as it is:

void longFunction()
{
    SIMPLEPERF_FUNCSTART;

    SIMPLEPERF_START("loop ");
    for (int i = 0; i < 10; ++i)
    {
        SIMPLEPERF_SCOPED("inside loop ");
        //::Sleep(10);
        internalCall();
    }
    SIMPLEPERF_END;
}

which shows at the end of the program:

main : 14837.797000
  longFunction : 0.120000
    loop  : 0.109000
      inside loop  : 0.018000
        internalCall : 0.008000
      inside loop  : 0.011000
        internalCall : 0.009000
      ...
      inside loop  : 0.005000
        internalCall : 0.002000
  shortMethod : 15.226000
    loop  : 15.222000

We have 3 basic macros that can be used:
* SIMPLEPERF_FUNCSTART - just put it at the beginning of the function/method. It will show the name of the function and print how long it took to execute
* SIMPLEPERF_SCOPED(str) - place it at the beginning of a scope
* SIMPLEPERF_START(str) - place it inside a function, as a custom marker, where you don’t have a scope opened.
* SIMPLEPERF_END - need to close SIMPLEPERF_START
* Plus:
* add #include "SimplePerfTimer.h
* enable it by setting #define ENABLE_SIMPLEPERF (also in SimplePerfTimer.h for simplicity)

Additionally the code supports two modes:

  • Immediate: will print just after the elapsed time is obtained. Printing might affect some performance.
  • Retained: will collect the data so that it can be shown at the end of the program.

In retained mode we can call:

  • SIMPLEPERF_REPORTALL - show the current data
  • SIMPLEPERF_REPORTALL_ATEXIT - will show the data but after main() is done. Can be called any time in the program actually.

The flag #define SIMPLEPERF_SHOWIMMEDIATE true need to be set to true to use retained mode.

Problems

The whole timer might not work in multicore, multithreaded code since it does not use any critical sections to protected shared data, or doesn’t care about thread that the code is running. If you need more advanced timer then you will be interested in article at Preshing on Programming: A C++ Profiling Module for Multithreaded APIs.

Implementation details  

github repo: github.com/fenbf/SimplePerfTimer

The core idea for the timer is to use destructor to gather the data. This way when some timer object goes out of the scope we’ll get the data. This is handy especially for whole functions/explicit scopes.

{ // scope start
   my_perf_timer t;
}

In a basic immediate form the timer just saves time (using QueryPerformanceCounter) in the constructor and then in the destructor measures end time and prints it to the output.

In the retained mode we also need to store that data for the future use. I simply create a static vector that adds a new entry in the constructor and then fills the final time in the destructor. I also take care about indents, so that the output looks nicely.

In the repo there is also a header only version (a bit simplified, using only immediate mode): see SimplePerfTimerHeaderOnly.h

Here is a picture showing timer results in Debug view:

Todo

  • Add file/line information when printing the data?
  • Use std::chrono for VS2015/GCC version

Summary  

This post described a handy performance timer. If you just need to check execution time of some code/system just include a header (+and add related .cpp file) and use SIMPLEPERF_FUNCSTART or SIMPLEPERF_START(str)/END in analysed places. The final output should help you find hotspots… all without using advanced tools/machinery.

Once again the repo: github.com/fenbf/SimplePerfTimer

Resources