Problem: float +/- result depends on computation order, Win32/x64, compiler options

Pieneer

Posted: Visual C++ Language, Problem: float +/- result depend on computation order, Win32/x64, compiler options
Hi, I have this small test program test.cpp:
```cpp
#include <iostream>
using namespace std;

int main()
{
    cout.precision(8);
    float v1 = 1.9513026f, v2 = 0.31476471f, v3 = 3.1415927f;
    cout << "-v1-v2+v3= " << -v1-v2+v3 << endl;
    cout << "-v1+v3-v2= " << -v1+v3-v2 << endl;
    return 0;
}
```
When I compile it (using Visual Studio 2005 SP1) with cl /EHsc test.cpp as a Win64 application (both on Windows XP x64 and Vista x64) and run it, I get this result:
-v1-v2+v3= 0.87552547
-v1+v3-v2= 0.87552536
So, depending on the order of computations, I get different results. I know that float has only 6-7 significant digits, so some inaccuracy is to be expected (using double instead of float gives 0.87552539), but my problem is that I have to compare the results of the program to the results of the same program compiled as a Win32 application, which gives:
-v1-v2+v3= 0.87552536
-v1+v3-v2= 0.87552536
If I compile with cl /clr test.cpp, I get 0.87552536 for both expressions on both Win64 and Win32, so this raises the question of whether the different result (0.87552547) is a compiler bug. Could someone please verify this on another processor? I have an AMD Athlon 64 X2 4200+.
Unfortunately, I can't use /clr or double in this case; I have to use /EHsc, /MT and float in order to speed up the computation and save memory (the real application handles lots of data). Is there any way of getting the same results for the Win32 and Win64 applications under these restrictions? Any help is much appreciated.
Visual C++8





oflebbe

Hi,
this behaviour of + and - is by design. In numerical mathematics this is called "cancellation": subtracting numbers of comparable size produces results with large relative errors.
In 32-bit mode, all floating-point operations (regardless of float or double) are done in the FPU registers by default, which have an internal precision of 80 bits. Rounding to float (32 bits) or double (64 bits) happens only when a value has to be written back to memory. This 80-bit mode was an Intel design, which I loved, but it was very controversial in the numerical mathematics world, because it can lead to strange numerical artefacts if one tries to apply numerical tricks.
In 64-bit mode, floating-point arithmetic is done in the SSE registers, which have both a double-precision and a single-precision mode. Every computation is done in the respective precision, yielding predictable, but larger, rounding errors.
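A hand-worked sketch of the per-operation rounding for the values in the test program, writing each float as an integer mantissa times a power of two. The mantissas are the nearest single-precision values to the literals 1.9513026f, 0.31476471f and 3.1415927f. In the first order the intermediate (about 2.27) lies in [2, 4), where floats are spaced 2^-22 apart. In the second order the intermediate is exact and only the final result is rounded.

```latex
\begin{aligned}
v_1 &= 16368713\cdot 2^{-23},\quad v_2 = 10561751\cdot 2^{-25},\quad v_3 = 13176795\cdot 2^{-22}\\[4pt]
\text{order } {-v_1}-v_2+v_3:\quad v_1+v_2 &= 76036603\cdot 2^{-25} = 9504575.375\cdot 2^{-22}
  \;\xrightarrow{\text{round}}\; 9504575\cdot 2^{-22}\\
v_3 - 9504575\cdot 2^{-22} &= 3672220\cdot 2^{-22} \approx 0.87552547\\[4pt]
\text{order } {-v_1}+v_3-v_2:\quad v_3-v_1 &= 9984877\cdot 2^{-23}\quad(\text{exact})\\
9984877\cdot 2^{-23} - v_2 &= 29377757\cdot 2^{-25} = 14688878.5\cdot 2^{-24}
  \;\xrightarrow{\text{ties to even}}\; 14688878\cdot 2^{-24} \approx 0.87552536
\end{aligned}
```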
Your code will behave the same way on Linux/x86 and Linux/amd64, btw.
The workaround is
1) Never compare floating-point results with ==; use std::abs(a - b) < error instead. This is the textbook solution.
2) Use double for calculations.
Further reading: What Every Computer Scientist Should Know About Floating-Point Arithmetic http://docs-pdf.sun.com/800-7895/800-7895.pdf
Regards, Olaf Flebbe





Pieneer

Hi,
Thank you very much for the quick and informative reply. I was afraid it was by design.
However, as I mentioned, compiling with cl /clr test.cpp on x64 gives:
-v1-v2+v3= 0.87552536
-v1+v3-v2= 0.87552536
Does /clr imply that the FPU is used instead of SSE? Or what else is the reason that this build works the way I would like it to, i.e. gives identical results regardless of Win32 or Win64?
Regarding the workarounds:
1) The small differences discussed here eventually grow into big differences when the computation is applied many times to big datasets. The comparisons are actually done between the results of the 64-bit application and the 32-bit application, and these differences make them much harder.
2) Unfortunately, double takes twice the memory and approximately twice the time, so I have to use float.





oflebbe





