Bizarre .NET performance benchmarks  
Author Message
Victor Lobanov





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

While attempting to achieve maximum performance in .NET code, I have stumbled on a very peculiar effect. If you compile and run the following C# code, you get better performance in the Debug configuration than in the Release configuration. Moreover, if you comment out any one of the unused (!) case blocks in the Distance method, you get a 5-fold (!) increase in performance in the Release configuration. I would appreciate any explanation that anyone can offer, and it would be great to hear comments from the .NET development team if they are following this forum.

Thanks,

Victor

P.S. I am using VS 2005 and Framework v 2.0.50727.42

using System;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                MetricMatrix m = new MetricMatrix(5000, 20);
                double dmin = double.MaxValue;
                for (int i = 0, k = 0; i < m.Rows; ++i)
                {
                    for (int j = 0; j < m.Cols; ++j)
                    {
                        m[i, j] = k++;
                    }
                }
                Stopwatch timer = Stopwatch.StartNew();
                for (int i = 0; i < m.Rows; ++i)
                {
                    for (int j = i + 1; j < m.Rows; ++j)
                    {
                        double d = m.Distance(i, j);
                        if (d < dmin) dmin = d;
                    }
                }
                timer.Stop();
                Console.WriteLine("Matrix, {0} sec, d = {1}", timer.ElapsedMilliseconds / 1000.0, dmin);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }
    }

    public class MetricMatrix
    {
        public enum Metrics
        {
            Manhattan,
            Euclidean,
            EuclideanSquared,
            Ultrametric
        }

        public int Rows = 0;
        public int Cols = 0;
        private float[] data;
        private Metrics metric = Metrics.Euclidean;

        public MetricMatrix(int rows, int cols)
        {
            data = new float[rows * cols];
            this.Rows = rows;
            this.Cols = cols;
        }

        public float this[int row, int col]
        {
            get { return data[row * this.Cols + col]; }
            set { data[row * this.Cols + col] = value; }
        }

        public double Distance(int i, int j)
        {
            switch (metric)
            {
                case Metrics.Euclidean:
                {
                    return Euclidean(i, j);
                }
                case Metrics.Manhattan:
                {
                    return Manhattan(i, j);
                }
                case Metrics.EuclideanSquared:
                {
                    return EuclideanSquared(i, j);
                }
                case Metrics.Ultrametric:
                {
                    return Ultrametric(i, j);
                }
                default:
                {
                    throw new Exception();
                }
            }
        }

        public unsafe double Manhattan(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = row1[k] - row2[k];
                    d += Math.Abs(x);
                }
            }
            return d;
        }

        public unsafe double Euclidean(int i, int j)
        {
            return Math.Sqrt(EuclideanSquared(i, j));
        }

        public unsafe double EuclideanSquared(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = row1[k] - row2[k];
                    d += x * x;
                }
            }
            return d;
        }

        public unsafe double Ultrametric(int i, int j)
        {
            double d = 0;
            fixed (float* p = &data[0])
            {
                float* row1 = p + i * Cols;
                float* row2 = p + j * Cols;
                for (int k = 0; k < Cols; ++k)
                {
                    double x = Math.Abs(row1[k] - row2[k]);
                    if (x > d) d = x;
                }
            }
            return d;
        }
    }
}



 
 
Alois





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

Hi Victor,

I ran your test on my machine (P4, .NET 2.0) and saw no such effect; Release mode was not slower than Debug mode:

C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,826 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,661 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Debug>MSDNPerfQuestion.exe
Matrix, 3,663 sec, d = 89,4427190999916

C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Release>MSDNPerfQuestion.exe
Matrix, 1,048 sec, d = 89,4427190999916
C:\Source\MSDNPerfQuestion\MSDNPerfQuestion\bin\Release>MSDNPerfQuestion.exe
Matrix, 1,047 sec, d = 89,4427190999916

Are you sure you did not turn off optimizations for the Release configuration? Or did you define the DEBUG macro inside a Release build?

Yours,
Alois Kraus



 
 
MVP User





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

...

While attempting to achieve maximum performance in .NET code, I have stumbled on a very peculiar effect. [...]


Same here:
Compiled with: /o+ /debug-
Matrix, 3,369 sec, d = 89,4427190999916
Compiled with: /o- /debug-
Matrix, 1,105 sec, d = 89,4427190999916
In some rare cases (most notably when using "unsafe" code), it looks like the 'optimized' IL is not handled quite as efficiently by the JIT optimizer.
It seems that the x64 JIT (JIT64) doesn't have this problem, as illustrated by the following results:
Compiled with: /o+ /debug-
Matrix, 1,097 sec, d = 89,4427190999916
Compiled with: /o- /debug-
Matrix, 1,26 sec, d = 89,4427190999916
 
 
Willy.
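
[Editor's note: since the slowdown is reported only for the unsafe/fixed path, a safe rewrite is one thing worth benchmarking alongside Victor's version. The sketch below is a hypothetical safe variant of EuclideanSquared using plain array indexing; the standalone method and its parameters are illustrative, not from the original code.]

```csharp
// Safe (no unsafe/fixed) sketch of EuclideanSquared: plain array indexing
// lets the JIT use its own bounds-check elimination instead of going
// through the pointer-pinning path that appears to trip it up here.
using System;

class SafeDistance
{
    static double EuclideanSquared(float[] data, int cols, int i, int j)
    {
        double d = 0;
        int row1 = i * cols, row2 = j * cols; // flat row offsets, as in MetricMatrix
        for (int k = 0; k < cols; ++k)
        {
            double x = data[row1 + k] - data[row2 + k];
            d += x * x;
        }
        return d;
    }

    static void Main()
    {
        float[] data = { 0, 1, 2, 3, 4, 5 }; // 2 rows x 3 cols
        // each of the 3 slots differs by 3, so 3 * 9 = 27
        Console.WriteLine(EuclideanSquared(data, 3, 0, 1));
    }
}
```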
 
 
 

 
 
MVP User





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

...

In some rare cases (most notably when using "unsafe" code), it looks like the 'optimized' IL is not handled quite as efficiently by the JIT optimizer. [...]

 

 
 

You get more consistent (and faster) results on a 32-bit OS (using the x87 FPU) when using doubles instead of floats.
Here are my results for optimized and non-optimized builds:
Matrix, 0,972 sec, d = 89,4427190999916
Matrix, 1,182 sec, d = 89,4427190999916
 
 
Willy.
 
 

 
 
Victor Lobanov





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

Thank you everybody for checking this problem out. I saw consistent behavior on several different computers, but your results made me think that the problem must somehow be connected to our system configuration. I work for a company where all machines are standardized from the OS-image point of view. I tried it on my home computer and did not observe this bizarre effect! Release code was faster than Debug, as it should be. While I still have no idea as to the cause of this effect, I do now have more confidence in the compiler :)

Victor


 
 
MVP User





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

...

Thank you everybody for checking this problem out. [...]



I don't believe so; I'm seeing the same (what you call bizarre) behavior on different systems running the code you posted, using XP SP2, W2K3 and Vista with v2.0.50727 of the framework. As I said before, this is not the first time I have seen such behavior when running micro-benchmarks involving floating point arithmetic and unsafe constructs: the JIT produces less optimized code from seemingly better optimized IL.
Note also that mixing doubles and floats in managed code incurs a small performance hit on x86 processors as a result of possible widening/narrowing conversions; use doubles exclusively if you need top performance.
 
Willy.
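
[Editor's note: Willy's doubles-only advice can be sketched as below. This is a hypothetical rewrite of one metric with the matrix stored as double[] rather than float[]; the method and its parameters are illustrative, not from the original code.]

```csharp
// Double-only sketch of Manhattan: with the data already stored as double,
// the inner loop does no float->double widening on x86, which is the
// conversion cost Willy describes.
using System;

class DoubleOnly
{
    static double Manhattan(double[] data, int cols, int i, int j)
    {
        double d = 0;
        int row1 = i * cols, row2 = j * cols; // flat row offsets
        for (int k = 0; k < cols; ++k)
            d += Math.Abs(data[row1 + k] - data[row2 + k]); // all-double arithmetic
        return d;
    }

    static void Main()
    {
        double[] data = { 0, 1, 2, 3, 4, 5 }; // 2 rows x 3 cols
        // |0-3| + |1-4| + |2-5| = 9
        Console.WriteLine(Manhattan(data, 3, 0, 1));
    }
}
```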
 
 
 

 
 
Victor Lobanov





PostPosted: Common Language Runtime, Bizarre .NET performance benchmarks

My concern here is not that the JIT can occasionally produce less optimized code; I can understand that. What I can't understand is why making a minor and irrelevant change to the code (e.g. commenting out the Ultrametric case block from the Distance method; it is not used anyway) changes the performance of this little example 5-fold! If you see the bizarre effect, try commenting out the Ultrametric case block and see how much better the performance gets.
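
[Editor's note: one way to take the unused case blocks out of the picture entirely is to select the metric once, up front, through a delegate, so the hot path never executes the switch. The sketch below is a hypothetical illustration of that dispatch pattern; the class, delegate, and stand-in metrics are not from the original code.]

```csharp
// Delegate-dispatch sketch: the metric is chosen once (e.g. in the
// constructor), so Distance's hot path is a single indirect call and its
// JIT-compiled size no longer depends on how many case blocks exist.
using System;

class DelegateDispatch
{
    delegate double Metric(int i, int j);

    static double Sum(int i, int j) { return i + j; }     // stand-in metric
    static double Product(int i, int j) { return i * j; } // stand-in metric

    static void Main()
    {
        Metric distance = Sum;             // selected once, outside the loop
        Console.WriteLine(distance(2, 3)); // calls Sum: 2 + 3 = 5
    }
}
```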

P.S. Somebody I know suggested that this bizarre behavior could be due to the L1 cache. Due to differences in L1 cache size and logic, some CPUs suffer many cache misses, which results in slower execution times. This would explain why small changes to the code, or switching to another computer, can affect performance benchmarks.