High performance parsing of string messages  
bpeikes





PostPosted: Visual C# General, High performance parsing of string messages Top

In C or C++ I could parse parts of a string without having to copy the parts out of the string buffer. Let's say I have a string buffer where I know that one number is the first four characters and another number is the last three. I could parse it like this (the StringParse and IntegerParse functions are not shown, to keep the example short):

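const char *c = "1234567";          /* both fields must be digits for an integer parse */
const char *numberPart = c + 4;     /* points into the same buffer, no copy */
int a = IntegerParse(c, 4);
int b = IntegerParse(numberPart, 3);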

In C# it would look like this:
string c = "1234567";
int a = int.Parse(c.Substring(0, 4));
int b = int.Parse(c.Substring(4, 3));

The problem in the C# code is that c.Substring is going to make a copy of a part of the string. This is not that big a deal, except when the original string is a large buffer read off a socket and you are parsing hundreds to thousands of messages per second. In that case, you allocate memory for the substrings each time you parse a message.
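
The cost described above comes from Substring returning a new string on every call. On runtimes newer than this thread (.NET Core 2.1 and later, which is an assumption about the reader's environment), a minimal sketch of the same parse without intermediate strings might look like the following. The class name, method name, and sample buffer are hypothetical.

```csharp
using System;

static class SpanParseSketch
{
    // Parses two fixed-width integer fields by slicing a span over the
    // original string, so no substring objects are created.
    public static (int First, int Second) ParseFields(string buffer)
    {
        ReadOnlySpan<char> span = buffer.AsSpan();
        int first = int.Parse(span.Slice(0, 4));   // first four characters
        int second = int.Parse(span.Slice(4, 3));  // next three characters
        return (first, second);
    }

    static void Main()
    {
        var (a, b) = ParseFields("1234567"); // hypothetical message
        Console.WriteLine($"{a} {b}");       // prints "1234 567"
    }
}
```

The slices are views over the original buffer, which is close in spirit to the pointer arithmetic in the C version, but with bounds checking.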


 
 
James Curran






Well, two things to note. First, IntegerParse isn't part of the Standard C library; it's something you had to write yourself (although most of the work would be done by standard functions). Second, if you were to write IntegerParse(c + 10, 100), bad things would happen, which is exactly the type of bad thing that C#/.NET managed code is intended to prevent.

Now, if you wanted to write your own IntegerParse in C#, it would look something like this:


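int IntegerParse(string str, int start, int len)
{
    if (start < 0 || len < 0 || str.Length < start + len)
        throw new ArgumentException("start and len must lie within str");

    int total = 0;
    // This might want to be 16 or a parameter ("base" is a reserved word in C#).
    // Char.GetNumericValue only understands decimal digits, so 16 would need its own digit lookup.
    int radix = 10;
    for (int i = start; i < start + len; ++i)
    {
        // GetNumericValue returns a double, or -1 for a non-numeric character.
        total = total * radix + (int) Char.GetNumericValue(str, i);
    }

    return total;
}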
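
To show how a function like the one above might be called on a message buffer, here is a self-contained sketch. It is not the exact code from this post: it swaps Char.GetNumericValue for plain character arithmetic, rejects non-digits, and includes the return statement needed to compile. The class name and the sample buffer are hypothetical.

```csharp
using System;

static class IntegerParseDemo
{
    // Decimal-only variant: reads len digits of str beginning at index start,
    // without creating a substring.
    public static int IntegerParse(string str, int start, int len)
    {
        if (start < 0 || len < 0 || str.Length < start + len)
            throw new ArgumentException("start and len must lie within str");

        int total = 0;
        for (int i = start; i < start + len; ++i)
        {
            int digit = str[i] - '0';
            if (digit < 0 || digit > 9)
                throw new FormatException("non-digit at index " + i);
            total = total * 10 + digit;
        }
        return total;
    }

    static void Main()
    {
        string buffer = "1234567"; // hypothetical message
        Console.WriteLine(IntegerParse(buffer, 0, 4)); // prints 1234
        Console.WriteLine(IntegerParse(buffer, 4, 3)); // prints 567
    }
}
```

An out-of-range request such as IntegerParse(buffer, 10, 100) throws an ArgumentException instead of reading past the end of the buffer.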



 
 
OmegaMan






What is your final goal?

Why manually parse the string when you could use regular expressions, which are designed for high-performance parsing and are available in almost any language? Let the parser worry about the memory constraints; there is a whole cadre of MS developers worried about that product, I am sure. <g>

For example one could create a parser to extract out your example items and place them in groups:


Regex parse = new

MatchCollection mc = parse.Matches("Omega1256");

if (mc.Count > 0)
    Console.WriteLine("Text ({0}) Value ({1})", mc[0].Groups["Text"].Value, mc[0].Groups["Value"].Value);



Console Output:
Text (Omega) Value (1256)
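
The pattern passed to the Regex constructor in the post above did not survive. The sketch below assumes a pattern with two named groups, letters as "Text" and digits as "Value", which is a guess that happens to fit the sample input and output, not the original author's pattern. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexGroupsSketch
{
    // Assumed pattern: a run of letters captured as "Text",
    // followed by a run of digits captured as "Value".
    static readonly Regex Parse = new Regex(@"(?<Text>[A-Za-z]+)(?<Value>\d+)");

    public static (string Text, string Value) Split(string input)
    {
        Match m = Parse.Match(input);
        if (!m.Success)
            throw new FormatException("input does not match the expected layout");
        return (m.Groups["Text"].Value, m.Groups["Value"].Value);
    }

    static void Main()
    {
        var (text, value) = Split("Omega1256");
        // prints "Text (Omega) Value (1256)"
        Console.WriteLine("Text ({0}) Value ({1})", text, value);
    }
}
```

Reading a group's Value produces a string for the captured text, so this approach is convenient but probably does not remove the per-message allocations the original question is concerned about.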