Walking the call stack as fast as possible  
Iulian Radu





PostPosted: Visual C++ General, Walking the call stack as fast as possible Top

Hello !

I have dedicated some time to developing a memory allocator that replaces the default CRT allocator. It is almost finished, but I recently added a leak trace that saves the call stack for every memory allocation. (The stack is saved in the process's own memory, so there is no interprocess communication overhead.) That adds a very significant overhead to the allocator, and I was wondering whether there is an efficient way of getting the call stack. At the moment I'm using StackWalk64, and the profiler I'm using (CodeAnalyst) reports that 99+% of the time is spent in dbghelp.dll, ntoskrnl.dll and msvcrxx.dll (strangely, the functions used from the latter are wcslen and its family, even though I don't use them in my code). Even stranger, only 5-6% of the time spent in dbghelp.dll is used by StackWalk64; the bulk of it is used by MiniDumpWriteDump.
That being said, is there any faster way of getting the call stack? I don't expect the code to be as fast as without the trace, but I think a 20-50x slowdown would be better than this. Right now the speed of the allocator has decreased by 3 orders of magnitude. (By the way, I'm not capturing the full stack, just the first 10 frames.)



 
 
Ayman Shoukry






Have you tried using the DIA tools to walk the stack (http://msdn2.microsoft.com/en-us/library/108e9y6d.aspx)?

Thanks,
Ayman Shoukry
VC++ Team


 
 
Iulian Radu







I just found out about the DIA SDK a couple of hours ago. Is it faster?

 
 
Ayman Shoukry






To be honest, I don't have much experience myself with DIA (it is probably worth a try) but I will forward the issue to one of the folks on our team who should know more than myself :-)

Thanks,
Ayman Shoukry
VC++ Team


 
 
Milis






It depends, actually. If the function frames are stored neatly on the stack and are easy to find, then yes, it is pretty fast. But we all know that is not always the case, especially when the application is an optimized one. Still, it should give you better performance than StackWalk64.

Unfortunately, there is no good example in the product yet of how to do stack walking with the DIA SDK. I was hoping to get one in before shipping, but I was too late. The documentation should be enough, though, given that you already know a lot from working with the dbghelp APIs, which are very similar to how the DIA SDK stack walker works.

Thanks,

Milis

VC++ team



 
 
Iulian Radu






Thank you very much. I'll give it a try.

 
 
KasperWessing






Hi Iulian,

Did you manage to create a working stack walk with DIA? We would like to do the same thing as you, but I find it a bit unclear how to implement IDiaStackWalkHelper and how to finally create the IDiaStackWalkHelper object. If you or anyone else (e.g. the VC++ team at Microsoft) has already done this, I would like to use it as sample code.

Thx in advance,
Kasper


 
 
Iulian Radu








Hello !

I haven't tried DIA, since I didn't expect a magical increase in speed. Instead, I've fiddled with the /Gh and /GH switches and added _penter and _pexit functions that keep track of the stack. Actually, I've only been using /Gh, since VC++ 2005 had a bug that kept _pexit from being placed properly in all cases. That was fixed in the patch, and I recommend you use the /GH switch too, since the trick I used to emulate the call to my _pexit handler made the stack untraversable during debugging.

The _penter/_pexit solution I've implemented only slows the program by 5-10%, I think (versus 1-2000% with the WinDBG). Of course, different calling patterns carry different performance penalties, but it's much better than the all-purpose solution provided by DIA or WinDBG. The main limitation is that it requires recompilation, so you can't use a plugin system and you can only trace your own executables.

I hope you can use _penter/_pexit to solve your problem.

Sincerely,
Iulian Radu
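
For readers wondering what the bookkeeping behind such hooks might look like, here is a minimal, portable sketch. It is not the code from the post above. All names (ShadowStack, on_enter, on_exit, snapshot, the two constants) are hypothetical, and the sketch assumes the real _penter and _pexit handlers forward to on_enter and on_exit. Those handlers themselves have to save and restore registers and are compiler- and architecture-specific, so they are left out here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; nothing here is taken from the original post.
constexpr std::size_t kMaxDepth = 256;    // deepest call chain that is recorded
constexpr std::size_t kSavedFrames = 10;  // frames an allocator might store per block

struct ShadowStack {
    const void* frames[kMaxDepth];
    std::size_t depth = 0;
};

// One shadow stack per thread, so the hooks need no locking.
inline ShadowStack& shadow() {
    thread_local ShadowStack s;
    return s;
}

// Assumed to be called from the _penter handler with the entered function's address.
inline void on_enter(const void* fn) {
    ShadowStack& s = shadow();
    if (s.depth < kMaxDepth) s.frames[s.depth] = fn;
    ++s.depth;  // keep counting past the cap so enter/exit stay balanced
}

// Assumed to be called from the _pexit handler.
inline void on_exit() {
    ShadowStack& s = shadow();
    assert(s.depth > 0);
    --s.depth;
}

// Called by the allocator: copies the most recently recorded frames, newest first.
// Returns the number of frames written to out.
inline std::size_t snapshot(const void** out, std::size_t max_frames) {
    const ShadowStack& s = shadow();
    const std::size_t top = s.depth < kMaxDepth ? s.depth : kMaxDepth;
    const std::size_t n = top < max_frames ? top : max_frames;
    for (std::size_t i = 0; i < n; ++i) out[i] = s.frames[top - 1 - i];
    return n;
}
```

With this layout, saving a trace at allocation time is just a copy of at most kSavedFrames pointers, with no symbol lookup or stack unwinding, which would be consistent with the small slowdown reported above.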

 
 
KasperWessing






Hi Iulian,

It works like a charm.

Thx, Kasper


 
 
dosler






Hi there, Iulian!

You could benefit from the undocumented (at least for WinXP) ntdll!RtlCaptureStackBackTrace.

There's some info on the Windows Server 2003 implementation in MSDN, but I found it equally applicable to WinXP. It is a low-overhead stack capture routine that works with two restrictions: it does not walk FPO-optimized frames (by design) and it does not use symbols (also by design, which is where the speed benefit comes from). Once the stack traces have been taken, the raw addresses can be resolved to symbols at any convenient point later on (using dbghelp or DIA). This is how the debugging page heap functionality is implemented in the system itself (those snapshots must be fast!).

Hope it is still helpful!

dmitri.
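
A minimal sketch of the capture-now, symbolize-later idea follows. It uses CaptureStackBackTrace, the name under which the Windows SDK headers expose this routine. RawTrace, capture_raw_trace and kMaxFrames are hypothetical names, and the limit of 10 frames is taken from the first post. The non-Windows branch is only there so the sketch compiles elsewhere; it yields an empty trace.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

constexpr std::size_t kMaxFrames = 10;  // hypothetical: the 10 frames mentioned in the first post

// Raw return addresses only; no symbol information is touched at capture time.
struct RawTrace {
    void* frames[kMaxFrames] = {};
    std::size_t count = 0;
    unsigned long hash = 0;  // hash of the captured addresses, filled in by the OS
};

inline RawTrace capture_raw_trace() {
    RawTrace t;
#ifdef _WIN32
    ULONG hash = 0;
    // Skip one frame (this function itself). If the compiler inlines this
    // function, the skip count may need adjusting.
    USHORT n = CaptureStackBackTrace(1, static_cast<ULONG>(kMaxFrames),
                                     t.frames, &hash);
    t.count = n;
    t.hash = hash;
#endif
    assert(t.count <= kMaxFrames);
    return t;
}
```

An allocator could store one RawTrace per live block and resolve the addresses with dbghelp or DIA only when a leak report is actually produced. As noted above, frames compiled with frame pointer omission may be missing from such a trace.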


 
 
YCY






Has anyone ever used DIA to walk the stack of a module built with FPO (frame pointer omission)? Does it work?