C++17 In Detail

12 May 2012

Float vs Double


What is better: double or float? For a long time, I simply used floats: I thought they were faster and smaller than doubles, and they are also the obvious choice in graphics programming.

But what about doubles? Are they that bad? It seems the answer is not that obvious!

The test

I've written some code:

  • Allocate ARR_SIZE numbers
  • Initialize elements with a simple pattern
  • Compute some values using a few different operations (multiplication, addition, square root)
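// test float:
#include <math.h>   // sqrtf
#include <stdlib.h> // malloc, free
float *floatArray = (float *)malloc(ARR_SIZE * sizeof(float));
for (int i = 0; i < ARR_SIZE; ++i)
    floatArray[i] = (float)(i*i)/100.0f;   // i*i overflows int if ARR_SIZE > 46340
for (int i = 0; i < ARR_SIZE; ++i)
{
    float temp = 0.0f;
    for (int j = 0; j < NUM_ITER; ++j)     // NUM_ITER must not exceed ARR_SIZE
        temp += floatArray[j]*2.0f;
    temp = sqrtf(temp);
    floatArray[i] = temp;
}
free(floatArray);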



And the double code:

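// test double:
#include <math.h>   // sqrt
#include <stdlib.h> // malloc, free
double *doubleArray = (double *)malloc(ARR_SIZE * sizeof(double));
for (int i = 0; i < ARR_SIZE; ++i)
    doubleArray[i] = (double)(i*i)/100.0;  // i*i overflows int if ARR_SIZE > 46340
for (int i = 0; i < ARR_SIZE; ++i)
{
    double temp = 0.0;
    for (int j = 0; j < NUM_ITER; ++j)     // NUM_ITER must not exceed ARR_SIZE
        temp += doubleArray[j]*2.0;
    temp = sqrt(temp);
    doubleArray[i] = temp;
}
free(doubleArray);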
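If you want to rerun the test yourself, below is a sketch of a self-contained harness that runs the same computation for both types through one template. It is not my original benchmark code: the values of ARR_SIZE and NUM_ITER are assumptions (I did not list them above), the names run_test and TestResult are made up for illustration, it uses std::vector instead of malloc, and it times only the compute loop with std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Assumed sizes (hypothetical): the original values are not given in the post.
constexpr int ARR_SIZE = 10000;
constexpr int NUM_ITER = 1000;   // must not exceed ARR_SIZE, since j indexes the array

struct TestResult {
    double millis;     // time spent in the compute loop only
    double lastValue;  // returned so the optimizer cannot discard the work
};

// The float/double snippets written once for any floating-point type T.
template <typename T>
TestResult run_test() {
    std::vector<T> data(ARR_SIZE);
    for (int i = 0; i < ARR_SIZE; ++i)
        // long long avoids the int overflow of i*i for larger ARR_SIZE values
        data[i] = static_cast<T>(static_cast<long long>(i) * i) / T(100);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < ARR_SIZE; ++i) {
        T temp = T(0);
        for (int j = 0; j < NUM_ITER; ++j)
            temp += data[j] * T(2);   // T(2) keeps the constant in type T
        data[i] = std::sqrt(temp);    // overload resolution picks the float or double version
    }
    const auto stop = std::chrono::steady_clock::now();

    return { std::chrono::duration<double, std::milli>(stop - start).count(),
             static_cast<double>(data[ARR_SIZE - 1]) };
}
```

Call run_test<float>() and run_test<double>() from main and compare the millis fields. Expect different numbers from the ones I measured: compilers, default flags and CPUs have changed a lot since Visual Studio 2008.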



The Results

Core 2 Duo T7300 @ 2.0 GHz
Visual Studio 2008, Release, /Ox, /fp:precise

processing float: 308 msec 
processing double: 92 msec 

Release, /Ox, /fp:precise, /arch:SSE2

processing float: 307 msec 
processing double: 96 msec 

Release, /Ox, /fp:fast, /arch:SSE2

processing float: 111 msec 
processing double: 93 msec

Wow... what a huge difference for float between /fp:precise and /fp:fast! Adding /arch:SSE2 alone changed almost nothing, but with /fp:precise the double version runs more than three times faster than the single-precision one (308 ms vs 92 ms). Worth considering... and worth more thorough testing!

The Reason

The main problem: conversions

Below is the assembly generated by Visual Studio (Release, /Ox, /fp:precise, /arch:SSE2):

// for float
; 35 : for (int j = 0; j < NUM_ITER; ++j)
; 36 : {
; 37 :     temp += floatArray[j]*2.0f;
movss    xmm3, DWORD PTR [eax-8]
cvtps2pd xmm3, xmm3
cvtss2sd xmm1, xmm1
mulsd    xmm3, xmm0
addsd    xmm3, xmm1
xorps    xmm1, xmm1
cvtpd2ps xmm1, xmm3
movss    xmm3, DWORD PTR [eax-4]
...

// for double
; 59 : for (int j = 0; j < NUM_ITER; ++j)
; 60 : {
; 61 :     temp += doubleArray[j]*2.0;
movsd xmm3, QWORD PTR [eax-16]
mulsd xmm3, xmm0
addsd xmm3, xmm1
movsd xmm1, QWORD PTR [eax-8]
...

The float listing is longer because of the cvtps2pd and cvtss2sd instructions, which convert single-precision values to double precision, and cvtpd2ps, which converts the result back to single precision.
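To make the listing easier to follow, here is one iteration of each inner loop rewritten in C++ with every step spelled out. This is my reading of the assembly above, not code the compiler actually produces, and the function names are made up for illustration:

```cpp
// One iteration of the float loop as the /fp:precise listing evaluates it.
inline float float_step_as_listed(float temp, float x) {
    double wideX    = static_cast<double>(x);     // cvtps2pd: widen the array element
    double wideTemp = static_cast<double>(temp);  // cvtss2sd: widen the accumulator
    double sum      = wideX * 2.0 + wideTemp;     // mulsd, addsd: the math happens in double
    return static_cast<float>(sum);               // cvtpd2ps: narrow the result back to float
}

// One iteration of the double loop: no conversions, just mulsd and addsd.
inline double double_step(double temp, double x) {
    return x * 2.0 + temp;
}
```

So per element the float version pays for three conversions on top of the same multiply and add that the double version does.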

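In this build, the compiler evaluates the float expression in double precision (the older x87 FPU can even keep intermediate results at 80-bit precision), so every float operand has to be converted first. The difference shows up only in the Release build; in Debug I got 317 ms for float and 315 ms for double.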
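If you want to check how your own compiler evaluates floating-point expressions, the standard FLT_EVAL_METHOD macro from <cfloat> reports it, and std::numeric_limits gives the precision of each type. A small sketch, assuming a C++11 (or newer) compiler; the helper names are my own:

```cpp
#include <cfloat>
#include <limits>

// FLT_EVAL_METHOD tells how intermediate float/double expressions are evaluated:
//    0 -> in the range and precision of the operand type
//    1 -> float and double operations are evaluated as double
//    2 -> everything is evaluated as long double (typical of x87 code)
//   -1 -> indeterminable
constexpr int eval_method() { return FLT_EVAL_METHOD; }

// Significand bits (including the implicit leading bit) of each type.
constexpr int float_bits       = std::numeric_limits<float>::digits;        // 24 for IEEE single
constexpr int double_bits      = std::numeric_limits<double>::digits;       // 53 for IEEE double
constexpr int long_double_bits = std::numeric_limits<long double>::digits;  // 64 for x87 80-bit, 53 where long double == double
```

On x86-64 GCC and Clang, which use SSE for scalar math, eval_method() typically returns 0; a value of 2 points to x87-style evaluation. A compiler as old as Visual Studio 2008 may not define FLT_EVAL_METHOD at all.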


© 2017, Bartlomiej Filipek, Blogger platform