Fast searching of a byte array  
Author Message
drinkwater





PostPosted: Visual C# Language, Fast searching of a byte array Top

Can anyone provide guidance on what would be the fastest method of searching a large byte array for some particular sequence of bytes I know I could convert it to a string and do an IndexOf but I would like to avoid that. Any ideas

Visual C#16  
 
 
Peter Ritchie





PostPosted: Visual C# Language, Fast searching of a byte array Top

If it's un-ordered data you're pretty much stuck checking each byte--which is what IndexOf would do...

 
 
Dah_cn





PostPosted: Visual C# Language, Fast searching of a byte array Top

For big byte array (20 more items maybe), you can copy it and use quick-sort to sort it then use binary-search...

But it depends..



 
 
Peter Ritchie





PostPosted: Visual C# Language, Fast searching of a byte array Top

if you were looking for the bytes 03 02 01 in the following array: 01 01 02 02 03 03 03 02 01 01 01 02 02 02 03 03, if you sorted that array looking for (also sorted) 01 02 03 you'd now not find any matches in: 01 01 01 01 01 02 02 02 02 02 02 03 03 03 03 03

 
 
Dah_cn





PostPosted: Visual C# Language, Fast searching of a byte array Top

Oh, I made a mistake, sorting a byte array seems not reasonable...

 
 
aahkam





PostPosted: Visual C# Language, Fast searching of a byte array Top

You can opt to come out the code of KMP pattern matching algorithm. That will allows you to match almost anything you want. KMP => Knuth Morris Pratt algorithm.


 
 
SteveDrake





PostPosted: Visual C# Language, Fast searching of a byte array Top

Hello,

Peter Ritchie comments are spot on, you need to check each byte.

 

I just knocked up this :

 

using System;
using System.Collections.Generic;
using System.Text;

namespace FindByteArrayinByteArray
{
    class Program
    {
        static int indexOf(byte[] array, byte[] value)
        {
            int found = 0;
            for (int i = 0; i < array.Length; i++)
            {
                if (array[ i ] == value[found])
                {
                    if (++found == value.Length)
                    {
                        return i - found + 1;
                    }
                }
                else
                {
                    found = 0;
                }
                        }
            return -1;
        }

        public static void Main()
        {
            System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 1, 2, 3 }));
            System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 3, 2, 3 }));
            System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 2, 3, 4 }));
            System.Console.WriteLine(indexOf(new byte[] { 1, 2, 3, 4 }, new byte[] { 4 }));

            System.Console.ReadKey();
        }
    }
}

 

Thanks

Steve


 


 
 
James Curran





PostPosted: Visual C# Language, Fast searching of a byte array Top

For big byte array (20 more items maybe), you can copy it and use quick-sort to sort it then use binary-search...

Even if you were just looking for a single byte, I'd guess that the array would have to be much large (several thousand, I'd say), before sorting becomes faster than a linear search. (actually, as the search would be O(N) and the sort O(N log N), unless you can get several searches out of one sort, it would never be faster)



 
 
drinkwater





PostPosted: Visual C# Language, Fast searching of a byte array Top

Thanks SteveDrake - you have been a good help...

I was wondering if you how to handle searching a byte array within a byte array which my span several blocks. For example : I receive a buffer (byte array) which I need to search for a sequence of bytes (works great with your example) but the problem is that the I can recieve multiple blocks of byte array's and the sequence of bytes which I am searching might span one or more blocks - which makes it difficult to search without somehow concatenating the blocks and then searching - that would incur a performance overhead which I would like to avoid. Any thoughts