regex help  
sakarmo





PostPosted: Regular Expressions, regex help Top

I am trying to write this regex, but I can't get it to work. Is it possible?

Help me!

string: x*[2+[(2+y)*z]*(4+y)]-3/[(3+y)*z]
regex:
matches:
2+[(2+y)*z]*(4+y)
(3+y)*z

(Moderator: Thread moved to the Regular Expression Forum and Title tweaked for quicker thread understanding during a search)


 
 
OmegaMan






This will get you closer. The idea is that regex has look-ahead and look-behind constructs. They simply bookend what you really want; think of them as anchor points, so to speak.

So we specify the look-behind to be "Attendees:" and the look-ahead to be two or more consecutive line breaks (a blank line). The look-behind and look-ahead are not captured, so what you end up with is the attendees. You will then need to sort out the names you need from the match.

Note that the RegexOptions are important, so keep them. Also, the code below was generated with a tool I use (I am not associated with it); its author makes it free to use.


// using System.Text.RegularExpressions;

/// <summary>
/// Regular expression built for C# on: Sat, Jan 13, 2007, 02:36:31 PM
/// Using Expresso Version: 3.0.2559, http://www.ultrapico.com
///
/// Find the group of attendees names for later processing.
///
/// A description of the regular expression:
///
/// Match a prefix but exclude it from the capture. [Attendees:\r\n]
/// Attendees:\r\n
/// Attendees:
/// Carriage return
/// New line
/// [1]: A numbered capture group. [.*]
/// Any character, any number of repetitions
/// Match a suffix but exclude it from the capture. [(?:\r\n){2,}]
/// Match expression but don't capture it. [\r\n], at least 2 repetitions
/// \r\n
/// Carriage return
/// New line
///
///
/// </summary>
public Regex Attendees = new Regex(
@"(?<=Attendees:\r\n)(.*)(?=(?:\r\n){2,})",
RegexOptions.Singleline
| RegexOptions.ExplicitCapture
| RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
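To see the bookending in action, here is a minimal sketch that runs the same pattern and options (minus Compiled, which only affects speed) against a small sample. The class name `AttendeesBlock` and the sample text are hypothetical. It also shows a subtlety: because of ExplicitCapture, the unnamed `(.*)` is not a numbered group at run time, despite the "[1]" in the Expresso description, so the block is read from `m.Value`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; pattern and options mirror the Attendees field above.
public static class AttendeesBlock
{
    public static readonly Regex Pattern = new Regex(
        @"(?<=Attendees:\r\n)(.*)(?=(?:\r\n){2,})",
        RegexOptions.Singleline
        | RegexOptions.ExplicitCapture
        | RegexOptions.CultureInvariant);

    // Returns the text between the two bookends, or null if there is no block.
    // The look-behind and look-ahead are zero-width, so m.Value contains neither
    // "Attendees:\r\n" nor the blank line that follows the names.
    // With ExplicitCapture, (.*) is not a numbered group, so Groups[1] is absent.
    public static string Extract(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

`Pattern.GetGroupNumbers()` returns only group 0 here, which is why the sketch reads `m.Value` rather than `m.Groups[1]`.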




 
 
Frank Boyne






It appears you are trying to find the string bounded by the outermost pair of square brackets, allowing nested square brackets within the string.

Try the expression described in the following blog entry: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx. The example uses angle brackets as the thing being matched, but you should be able to convert the expression to look for square brackets fairly easily.

Remember that you need to use \[ and \] to represent the actual '[' and ']' characters, and be careful to distinguish between the angle brackets that are part of the text being matched and the angle brackets that are part of the regex syntax (?<name>subexpression).
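Frank's suggestion can be sketched in C# as below. This is one possible square-bracket conversion of the balancing-group technique, written for illustration rather than copied from the blog entry; the class name `BracketMatcher` and the group names `inner` and `d` are hypothetical. Each nested `[` pushes a capture onto `d`, each nested `]` pops one, and `(?(d)(?!))` rejects a candidate while any push is still unmatched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketMatcher
{
    // \[               opening bracket of an outermost pair
    // (?<inner> ... )  the text between the outermost brackets
    //   [^\[\]]+       a run of non-bracket characters
    //   \[(?<d>)       nested '[': push an empty capture onto stack "d"
    //   \](?<-d>)      nested ']': pop "d" (fails if the stack is empty)
    // (?(d)(?!))       fail if a nested '[' was never closed
    // \]               closing bracket of the outermost pair
    static readonly Regex Outermost = new Regex(
        @"\[(?<inner>(?>[^\[\]]+|\[(?<d>)|\](?<-d>))*(?(d)(?!)))\]");

    // Returns the contents of every outermost [...] pair, in order.
    public static List<string> Contents(string input)
    {
        var results = new List<string>();
        foreach (Match m in Outermost.Matches(input))
            results.Add(m.Groups["inner"].Value);
        return results;
    }
}
```

On the original string, `BracketMatcher.Contents("x*[2+[(2+y)*z]*(4+y)]-3/[(3+y)*z]")` yields the two desired matches, `2+[(2+y)*z]*(4+y)` and `(3+y)*z`.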


 
 
MuscleHead






Thanks Omegaman.

With the singleline option turned on, the regex in your post captures everything after Attendees: as one big match, as opposed to multiple matches (unless I screwed it up)...

I did try to tweak it: (?<=Attendees:\r\n)((?<=(?:Mr.|Mrs.))\s(?<lastname>\w+)\s(?<firstname>\w+)(?=\r\n))

I was hoping that forcing each match to start with Mr. or Mrs., and preventing each match from containing carriage returns in the middle, would solve the problem, but the above didn't match at all. If I just take out the part that enforces Attendees:

((?<=(?:Mr.|Mrs.))\s(?<lastname>\w+)\s(?<firstname>\w+)(?=\r\n))

Then it does match, but it also matches names before Attendees:, which is no good... I just can't seem to get the first anchor point to work.

Thanks!
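The combined attempt fails because both look-behinds are evaluated at the same position: a position just after "Attendees:\r\n" ends in "\r\n", so it can never also end in "Mr." or "Mrs.". The sketch below demonstrates that, then shows one single-regex alternative that this thread does not use: the `\G` anchor, which in .NET only matches where the previous match ended. The look-behind pins the first name, and each later name must directly follow the previous match. The class name `AttendeeChain` is hypothetical, and the sketch assumes \r\n line endings with one "Mr./Mrs. Last First" entry per line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttendeeChain
{
    // The attempted pattern: two look-behinds at one position can never both hold.
    public static readonly Regex Combined = new Regex(
        @"(?<=Attendees:\r\n)((?<=(?:Mr.|Mrs.))\s(?<lastname>\w+)\s(?<firstname>\w+)(?=\r\n))");

    // First entry: anchored by the look-behind.
    // Later entries: \G requires the match to start where the previous one ended,
    // so names before "Attendees:" or after the list are never reached.
    public static readonly Regex Chained = new Regex(
        @"(?:(?<=Attendees:\r\n)|\G\r\n)Mrs?\.\s(?<lastname>\w+)\s(?<firstname>\w+)");

    public static List<string> LastNames(string text)
    {
        var names = new List<string>();
        foreach (Match m in Chained.Matches(text))
            names.Add(m.Groups["lastname"].Value);
        return names;
    }
}
```

The chain stops at the first line that is not a "Mr./Mrs." entry, such as the blank line after the list.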



 
 
OmegaMan






What you have found is exactly right. The regex I provided only acquires the block of names; it does not return the names themselves. I was remiss in not mentioning that... my bad. <g> The thought was that, once that was done, another regex would extract the names.

As you found when trying to have the regex do all the work at once, it clouds the issue by matching outside the attendees block... which is why I am now suggesting a two-part process.

Here is an example. Note that I changed the original regex to place the capture into a group named AttendeesRaw, as denoted by the regex grammar "(?<GroupNameHere> ... )":


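// using System;
// using System.Collections.Generic;
// using System.Text.RegularExpressions;

string data = @"
Attendees:
Mr. Jones Alfred
Mrs. Smith John
Mr. Black Peter
Mr. White Arnold

blah blah blah";

// The original Attendees regex, with the block captured into AttendeesRaw.
string attendeesPattern = @"(?<=Attendees:\r\n)(?<AttendeesRaw>.*)(?=(?:\r\n){2,})";

Match m = Regex.Match(data, attendeesPattern, RegexOptions.Singleline
    | RegexOptions.ExplicitCapture
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled);

// Regex.Match never returns null; a failed match has Success == false.
if (m.Success)
{
    List<string> names = new List<string>();

    // Assumed second-pass pattern (the original was lost): the last word on each line.
    string pattern = @"(?<Last>\w+)(?=\r\n|$)";
    Regex LastNames = new Regex(pattern,
        RegexOptions.Compiled |
        RegexOptions.CultureInvariant |
        RegexOptions.ExplicitCapture);

    MatchCollection mts = LastNames.Matches(m.Groups["AttendeesRaw"].Value);

    foreach (Match m2 in mts)
        names.Add(m2.Groups["Last"].Value);

    foreach (string name in names)
        Console.WriteLine(name);
}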



Note that the code above will fail if the attendees section isn't followed by two line breaks (\r\n\r\n) or falls at the end of the document.

Console Output:
Alfred
John
Peter
Arnold
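One way to address the two failure cases mentioned in the note is sketched below. This is an assumption, not OmegaMan's code: the lazy `.*?` stops at the first blank line instead of the last one, the `\s*\z` alternative lets the block run to the end of the document, and `\r?\n` accepts both Windows and Unix line endings. The class name `TolerantAttendees` and the second-pass `Entry` pattern are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TolerantAttendees
{
    // First pass: the block after "Attendees:", ending at the first blank line
    // or at the end of the text (trailing whitespace excluded).
    static readonly Regex Block = new Regex(
        @"(?<=Attendees:\r?\n)(?<AttendeesRaw>.*?)(?=(?:\r?\n){2,}|\s*\z)",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture);

    // Second pass: one "Mr./Mrs. Last First" entry at the start of each line.
    static readonly Regex Entry = new Regex(
        @"^Mrs?\.\s+(?<Last>\w+)\s+(?<First>\w+)",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    public static List<string> LastNames(string text)
    {
        var names = new List<string>();
        Match block = Block.Match(text);
        if (!block.Success)
            return names;
        foreach (Match m in Entry.Matches(block.Groups["AttendeesRaw"].Value))
            names.Add(m.Groups["Last"].Value);
        return names;
    }
}
```

The lazy quantifier is the key design choice: with the original greedy `.*`, a second blank line later in the document would pull unrelated text into the block.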


 
 
MuscleHead






Thanks. Doing it in two parts could work.

Is it really true, though, that a single regex couldn't do it?



 
 
OmegaMan






I couldn't find a way...but that doesn't mean a holy grail regex is not out there. <g>


 
 
MuscleHead






Thanks Omega Man.

Does anyone know if that 'Holy Grail' regex exists? :)



 
 
Sergei Z






You can try this:

(?<!Attendees)\s*?(?:mrs?.*?\w+\s+(?<LAST_NAME>\w+)(?:\s+|\Z))+

Options: Singleline on, IgnoreCase on

against the input:

Attendees:
Mr. Jones Alfred
Mr. John Calder

Last names will be captured in one step in group LAST_NAME. RegexBuddy reports:

Match offset: 10
Match length: 35
Group "LAST_NAME": Calder
Group "LAST_NAME" offset: 39
Group "LAST_NAME" length: 6


 
 
Sergei Z






I only now realized that the last name comes before the first name in the input:

Attendees:
Mr. Jones Alfred
Mrs. Smith John
Mr. Black Peter
Mr. White Arnold

so the regex that captures every LAST_NAME needs to be changed to:

(?<=Attendees:)\s*?(?:mrs?.*?(?<LAST_NAME>\w+)\s+\w+(?:\s+|\Z))+

Options: Singleline on, IgnoreCase on

Match offset: 10
Match length: 74
Group "LAST_NAME": White
Group "LAST_NAME" offset: 68
Group "LAST_NAME" length: 5
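A minimal sketch of why a tester showing group values reports only "White": in .NET, `Group.Value` is the last capture of a repeated group, while `Group.Captures` keeps all of them. The pattern is the second regex above; the class name `RepeatedGroup` and the sample text (built from the input above, without trailing whitespace) are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedGroup
{
    static readonly Regex Pattern = new Regex(
        @"(?<=Attendees:)\s*?(?:mrs?.*?(?<LAST_NAME>\w+)\s+\w+(?:\s+|\Z))+",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Group.Value: only the capture from the final repetition of (?:...)+
    public static string LastValue(string text)
    {
        return Pattern.Match(text).Groups["LAST_NAME"].Value;
    }

    // Group.Captures: one entry per repetition, in order.
    public static List<string> AllCaptures(string text)
    {
        var names = new List<string>();
        foreach (Capture c in Pattern.Match(text).Groups["LAST_NAME"].Captures)
            names.Add(c.Value);
        return names;
    }
}
```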


 
 
OmegaMan






Hi Sergei Z,

Welcome to the MSDN forums and thanks for dropping by.

I was a little confused by the use of (?<! in your first post, but your second post switched to the positive look-behind (?<=, which made more sense and worked! Thanks.


 
 
MuscleHead






Thanks for the help, Sergei. But did it really work for you? I am using RegexBuddy with both "dot matches newline" and "case insensitive" checked (but "^$ match at line breaks" unchecked), and all that gets selected is the last name in the list ("White")...

 
 
Sergei Z






No, it works OK. Group LAST_NAME actually captures all 4 last names. The problem is that RegexBuddy deceives you by showing only the last one. To get all the last names, you have to access the CaptureCollection of the group LAST_NAME. The code below shows how to access the members of the CaptureCollection of a capturing group. There will be 4 captures in the LAST_NAME group:

Jones

Smith

Black

White

You can use Expresso to see them all; it shows every capture of a group.

<C# code>

//how group/capture collection works:
using System;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void RunTest()
    {
        int counter;
        Match m;
        CaptureCollection cc;
        GroupCollection gc;
        // declare Regex, Regex pattern (the LAST_NAME pattern from the post above)
        Regex r = new Regex(
            @"(?<=Attendees:)\s*?(?:mrs?.*?(?<LAST_NAME>\w+)\s+\w+(?:\s+|\Z))+",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // Define the string to analyze, find matches
        string text = "Attendees:\r\nMr. Jones Alfred\r\nMrs. Smith John\r\n"
                    + "Mr. Black Peter\r\nMr. White Arnold";
        m = r.Match(text);

        //get a group collection
        gc = m.Groups;
        // Print the number of groups.
        Console.WriteLine("Captured groups = " + gc.Count.ToString());
        // Loop through each group.
        for (int i = 0; i < gc.Count; i++)
        {
            //get a collection of captures
            cc = gc[i].Captures;
            counter = cc.Count;
            // Print number of captures in this group.
            Console.WriteLine("Captures count = " + counter.ToString());
            // Loop through each capture in group. Print all the captures
            for (int ii = 0; ii < counter; ii++)
            {
                // Print capture
                Console.WriteLine(cc[ii]);
            }
        }
    }

    public static void Main()
    {
        RunTest();
        Console.WriteLine("\nPress Enter to Exit");
        Console.ReadLine();
    }
}



 
 
Sergei Z






Omegaman,

Thanks for the kind words. It's good to be here. Hopefully I can interact with more .NET people here; I got tired of the PHP/Perl folks on regexadvice. :=)

See you around.


 
 
OmegaMan






It is subtle, but yes, they are all in the capture collection of the single match. I can confirm what Sergei states.