Bizarre .NET performance benchmarks
Victor Lobanov

Posted: Common Language Runtime, Bizarre .NET performance benchmarks
While attempting to achieve maximum performance in .NET code, I have stumbled on a very peculiar effect. If you compile and run the following C# code, you get better performance in the Debug configuration than in the Release configuration. Moreover, if you comment out any one of the unused (!) case blocks in the Distance method, you get a 5-fold (!) increase in performance in the Release configuration. I would appreciate any explanation that anyone can offer, and it would be great to hear comments from the .NET development team, if they are following this forum.
Thanks,
Victor
P.S. I am using VS 2005 and Framework v 2.0.50727.42
using System;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                MetricMatrix m = new MetricMatrix(5000, 20);
                double dmin = double.MaxValue;
                for (int i = 0, k = 0; i < m.Rows; ++i)
                {
                    for (int j = 0; j < m.Cols; ++j)
                    {
                        m[i, j] = k++;
                    }
                }
                Stopwatch timer = Stopwatch.StartNew();
                for (int i = 0; i < m.Rows; ++i)
                {
                    for (int j = i + 1; j < m.Rows; ++j)
                    {
                        double d = m.Distance(i, j);
                        if (d < dmin) dmin = d;
                    }
                }
                timer.Stop();
                Console.WriteLine("Matrix, {0} sec, d = {1}", timer.ElapsedMilliseconds / 1000.0, dmin);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }
    }

    public class MetricMatrix
    {
        public enum Metrics
        {
            Manhattan,
            Euclidean,
            EuclideanSquared,
            Ultrametric
        }

        public int Rows = 0;
        public int Cols = 0;
        private float[] data;
        private Metrics metric = Metrics.Euclidean;

        public MetricMatrix(int rows, int cols)
        {
            data = new float[rows * cols];
            this.Rows = rows;
            this.Cols = cols;
        }

        public float this[int row, int col]
        {
            get { return data[row * this.Cols + col]; }
            set { data[row * this.Cols + col] = value; }
        }

        public double Distance(int i, int j)
        {
            switch (metric)
            {
                case Metrics.Euclidean:
                {
                    return Euclidean(i, j);
                }
                case Metrics.Manhattan:
                {
                    return Manhattan(i, j);
                }
                case Metrics.EuclideanSquared:
                {
                    return EuclideanSquared(i, j);
                }
                case Metrics.Ultrametric:
                {
                    return Ultrametric(i, j);
                }
                default:
                {
                    throw new Exception();
                }
            }
        }

        public unsafe double Manhattan(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = row1[k] - row2[k];
                    d += Math.Abs(x);
                }
            }
            return d;
        }

        public unsafe double Euclidean(int i, int j)
        {
            return Math.Sqrt(EuclideanSquared(i, j));
        }

        public unsafe double EuclideanSquared(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = row1[k] - row2[k];
                    d += x * x;
                }
            }
            return d;
        }

        public unsafe double Ultrametric(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = Math.Abs(row1[k] - row2[k]);
                    if (x > d) d = x;
                }
            }
            return d;
        }
    }
}
Alois

Hi Victor,
I ran your test on my machine (P4, .NET 2.0) and see no such effect; release mode is not slower than debug mode:

C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,826 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,661 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,663 sec, d = 89,4427190999916

C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Release>MSDNPerfQuestion.exe
Matrix, 1,048 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Release>MSDNPerfQuestion.exe
Matrix, 1,047 sec, d = 89,4427190999916

Are you sure you did not turn off optimizations for the Release configuration? Or did you define the DEBUG macro inside the release build?
Yours, Alois Kraus
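To rule out a project-configuration mix-up entirely, both builds can be reproduced from the command line, bypassing the IDE settings altogether. A sketch of the two invocations (the file name Program.cs is my assumption; csc.exe is the one from the v2.0 framework directory):

```shell
# Normal Release build: IL optimization on, no debug info
csc /optimize+ /debug- /unsafe Program.cs
Program.exe

# Same source with IL optimization off, for comparison
csc /optimize- /debug- /unsafe Program.cs
Program.exe
```

If the timings from these two builds match the IDE's Release and Debug results, the effect is in the compiler/JIT rather than in the project settings.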
MVP User

...
Same here:

Compiled with: /o+ /debug-
Matrix, 3,369 sec, d = 89,4427190999916

Compiled with: /o- /debug-
Matrix, 1,105 sec, d = 89,4427190999916

In some rare cases (most notably when using "unsafe" code), it looks like the 'optimized' IL is not handled quite as efficiently by the JIT optimizer.
It seems that JIT64 doesn't have this problem, as illustrated by the following results:

Compiled with: /o+ /debug-
Matrix, 1,097 sec, d = 89,4427190999916

Compiled with: /o- /debug-
Matrix, 1,26 sec, d = 89,4427190999916

Willy.
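Since the two JITs behave differently here, it is worth confirming which one actually compiled the process before comparing numbers across machines. On .NET 2.0 the simplest indicator is the pointer size; a minimal sketch:

```csharp
using System;

class JitCheck
{
    static void Main()
    {
        // A 64-bit process (compiled by JIT64, using SSE2 code) has
        // 8-byte pointers; a 32-bit process (JIT32, x87 FPU code) has
        // 4-byte pointers.
        Console.WriteLine("{0}-bit process", IntPtr.Size * 8);
    }
}
```

Note that an AnyCPU build runs as 64-bit on a 64-bit OS, so the same exe can be compiled by different JITs on different machines.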
MVP User

...
You get more consistent (and faster) results on a 32-bit OS (using the X87 FPU) when using doubles instead of floats. Here are my results for optimized and non-optimized builds:

Matrix, 0,972 sec, d = 89,4427190999916
Matrix, 1,182 sec, d = 89,4427190999916

Willy.
Victor Lobanov

Thank you everybody for checking this problem out. I saw consistent behavior on several different computers, but your results made me think that the problem must somehow be connected to our system configuration. I work for a company where all machines are standardized from the OS-image point of view. I tried it on my home computer and did not observe this bizarre effect! Release code was faster than Debug, as it should be. While I still have no idea as to the cause of this effect, I now have more confidence in the compiler :)
Victor
MVP User

...
I don't believe so; I'm seeing the same (what you call bizarre) behavior on different systems running the code you posted, using XP SP2, W2K3 and Vista with v2.0.50727 of the framework. As I said before, this is not the first time I have seen such behavior when running micro-benchmarks involving floating-point arithmetic and unsafe constructs: the JIT produces less optimized code from seemingly better optimized IL.
Note also that using a mix of doubles and floats in managed code incurs a small performance hit on x86 processors, as a result of possible widening/narrowing conversions; use only doubles if you need top performance.
Willy.
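The doubles-only variant Willy describes can be sketched against the posted MetricMatrix (the class name and layout below are mine; only EuclideanSquared is shown, and no unsafe code is needed once the float narrowing is gone):

```csharp
using System;

// Hypothetical double-only variant of the posted MetricMatrix.
// Storing double[] keeps every operand at double precision, so the
// x87 FPU never has to narrow intermediates to float.
class DoubleMatrix
{
    private readonly double[] data;   // row-major, as in the original
    public readonly int Rows, Cols;

    public DoubleMatrix(int rows, int cols)
    {
        data = new double[rows * cols];
        Rows = rows;
        Cols = cols;
    }

    public double this[int row, int col]
    {
        get { return data[row * Cols + col]; }
        set { data[row * Cols + col] = value; }
    }

    public double EuclideanSquared(int i, int j)
    {
        double d = 0;
        int r1 = i * Cols, r2 = j * Cols;
        for (int k = 0; k < Cols; ++k)
        {
            double x = data[r1 + k] - data[r2 + k];  // no float conversion
            d += x * x;
        }
        return d;
    }
}
```

The trade-off is twice the memory per element, which matters for cache behavior on large matrices, so this is worth measuring rather than assuming.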
Victor Lobanov

My concern here is not that the JIT can occasionally produce less optimized code; I can understand that. What I can't understand is why making a minor and irrelevant change to the code (e.g. commenting out the Ultrametric case block from the Distance method; it is not used anyway) changes the performance of this little example 5-fold! If you see the bizarre effect, try commenting out the Ultrametric case block and see how much better the performance gets.
P.S. Somebody I know suggested that this bizarre behavior could be due to the L1 cache. Because of differences in L1 cache size and logic, some CPUs have a lot of cache misses, which results in slower execution times. This would explain why small changes to the code, or switching to another computer, affects the performance benchmarks.
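One way to test whether the switch statement itself (rather than cache effects) is implicated would be to resolve the metric to a delegate once, so the per-pair call compiles without any unused case blocks in it. A hypothetical sketch (the names below are mine, not from the posted code):

```csharp
using System;

class MetricExperiment
{
    public delegate double Metric(double[] a, double[] b);

    // The switch runs once, at setup time; the hot loop then calls the
    // delegate directly, so the JIT compiles the chosen metric without
    // the unused case blocks in the same method body.
    public static Metric Bind(string name)
    {
        switch (name)
        {
            case "euclidean-squared": return EuclideanSquared;
            case "manhattan": return Manhattan;
            default: throw new ArgumentException(name);
        }
    }

    static double EuclideanSquared(double[] a, double[] b)
    {
        double d = 0;
        for (int k = 0; k < a.Length; ++k)
        {
            double x = a[k] - b[k];
            d += x * x;
        }
        return d;
    }

    static double Manhattan(double[] a, double[] b)
    {
        double d = 0;
        for (int k = 0; k < a.Length; ++k)
            d += Math.Abs(a[k] - b[k]);
        return d;
    }
}
```

Delegate invocation has its own small cost, so this is a diagnostic experiment, not a guaranteed optimization; if Release timings improve on the affected machines, that would point at the switch or method size rather than the FPU code.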