Validating XML against XSD with patterns in .NET 2.0  
David A.





PostPosted: XML and the .NET Framework, Validating XML against XSD with patterns in .NET 2.0 Top

Hello,

I give more details below, but the basic problem is this: I have XML data that I want to validate against an XML Schema definition (XSD).
The XSD specifies an attribute whose value has to match the pattern defined by this regular expression:
<xsd:pattern value="((\w:)?\w(=\w)?){1,}">

Valid values matching this pattern could be, for instance, something like this:
"ns1:Inwatera_1m=A, ns2:CoastL_1M=B"

However, .NET 2.0 does not seem to agree: when I validate an XML document containing the value "ns1:Inwatera_1m=A" for an attribute that has to match this pattern, I get the following validation error:

"The 'typeName' attribute is invalid - The value 'ns1:Inwatera_1m=A' is invalid according to its datatype ' http://www.hide-link.com/ :TypeNameListType' - The Pattern constraint failed."

I've tried changing the value and it seems that it only accepts single words without ":" or "=" characters.

So, my questions are:
- Why is .NET 2.0 not matching the value "ns1:Inwatera_1m=A" to the pattern "((\w:)?\w(=\w)?){1,}"? Is it a bug, or is it something I'm doing wrong?
- Do you know how .NET 2.0 matches values to patterns when validating XML contents against an XSD?

Any help/suggestion will be very much appreciated.

Now the details: the C# source code, the XSD, and the XML test data I use.

The code I use to validate my XML contents against the XSD file is the following (C# on .NET 2.0):

private bool ValidateWFSQuery(string toValidateData)
{
    bool isValid = true;

    System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings();
    settings.IgnoreWhitespace = false;
    // The XML Schema file I use is wfs.xsd, downloaded from http://www.hide-link.com/
    string schemaFilePath = Server.MapPath("~/wfs/SCHEMAS_OPENGIS_NET/wfs/1.1.0/wfs.xsd");
    settings.Schemas.Add(null, schemaFilePath);

    settings.ValidationType = System.Xml.ValidationType.Schema;
    // Schema violations are reported through this event, not thrown as exceptions,
    // so record the failure here before handing it on to ValidationCallback.
    settings.ValidationEventHandler += delegate(object sender, System.Xml.Schema.ValidationEventArgs e)
    {
        isValid = false;
        ValidationCallback(sender, e);
    };

    try
    {
        // toValidateData contains the XML data to validate.
        // It is wrapped in a memory stream here (XmlReader.Create also accepts a TextReader).
        // Encoding.UTF8 is used because Encoding.Default is the system ANSI code page, not UTF-8.
        using (MemoryStream toValidateDataStream = new MemoryStream(Encoding.UTF8.GetBytes(toValidateData)))
        using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(toValidateDataStream, settings))
        {
            while (reader.Read()) { } // Here is where the validation of the XML file takes place
        }
    }
    catch (System.Xml.XmlException e)
    {
        // Only well-formedness problems end up here
        string errorMsg = "Error: XML Format error in file: " + e.Message;
        System.Console.WriteLine(errorMsg);
        isValid = false;
    }

    return isValid;
}

This code is based on an example I found online that shows how to validate XML files against XSD schemas without using the deprecated XmlValidatingReader.
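For readers who want to try the same approach without wfs.xsd, here is a minimal self-contained sketch. The inline schema, the element name q, and the class and method names are hypothetical and made up for illustration; only the XmlReaderSettings usage mirrors the code above. It collects the validation messages in a list instead of relying on a separate callback.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class PatternValidationSketch
{
    // Hypothetical stand-in schema: element "q" whose typeName attribute must match \w+.
    const string Schema = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='q'>
    <xs:complexType>
      <xs:attribute name='typeName'>
        <xs:simpleType>
          <xs:restriction base='xs:string'>
            <xs:pattern value='\w+'/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the validation messages raised while reading the document.
    public static List<string> Validate(string xml)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Schema)))
        {
            settings.Schemas.Add(null, schemaReader);
        }
        // Schema errors are delivered to this handler; they are not thrown.
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            errors.Add(e.Message);
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens while reading
        }
        return errors;
    }

    public static void Main()
    {
        foreach (string value in new string[] { "ab", "a_b", "a b" })
        {
            List<string> errors = Validate("<q typeName='" + value + "'/>");
            Console.WriteLine("{0}: {1} validation message(s)", value, errors.Count);
        }
    }
}
```

Trying a value with "_" next to one without it is a quick way to see how the validator treats \w in a pattern facet.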

The XSD document I use (wfs.xsd) is the XML schema document for the OpenGIS Web Feature Service (WFS), v 1.1.0, which can be downloaded from http://www.hide-link.com/ .

The XML data I am using for the test is the following:
<wfs:GetFeature service="WFS" version="1.1.0"
outputFormat="GML2"
xmlns:topp=" http://www.hide-link.com/ "
xmlns:wfs=" http://www.hide-link.com/ "
xmlns:ogc=" http://www.hide-link.com/ "
xmlns:gml=" http://www.hide-link.com/ "
xmlns:xsi=" http://www.hide-link.com/ "
xsi:schemaLocation=" http://www.hide-link.com/
http://www.hide-link.com/ ">

<!-- HERE I GET THE VALIDATION ERROR!!!! -->
<wfs:Query typeName="ns1:Inwatera_1m=A">

<ogc:Filter>
<ogc:BBOX>
<ogc:PropertyName>the_geom</ogc:PropertyName>
<gml:Envelope srsName=" http://www.hide-link.com/ #4326">
<gml:lowerCorner>-73.99312376470733 40.76203427979042</gml:lowerCorner>
<gml:upperCorner>-73.9239210030026 40.80129519821393</gml:upperCorner>
</gml:Envelope>
</ogc:BBOX>
</ogc:Filter>
</wfs:Query>
</wfs:GetFeature>

Thanks in advance for your help.

David



 
 
Martin Honnen






I have tried to validate your XML with Visual Studio 2005, with MSXML 5 and with Xerces Java; all of them complain that the string does not match the pattern. Which XML parser/editor are you using that says the XML is valid?



 
 
David A.






Hello Martin,

First of all, thanks for taking the time to validate this XML.
I am not using any XML parser/editor that says it is valid; I just took the example from the documentation provided in the wfs.xsd schema itself (http://schemas.opengis.net/wfs/1.1.0/wfs.xsd, line 823), where they give examples of "typeName" values. I assumed these values had to be valid.

In any case, to verify it I've now used the PSPad editor to check that the value "ns1:Inwatera_1m=A" matches the regular expression "((\w:)?\w(=\w)?){1,}", and it does, according to PSPad.
Then I've also used XMLSpy to validate the XML against wfs.xsd, and it failed; its error message gives a bit more information than the .NET one, though:

Validating this:
<wfs:Query typeName="ns1:Inwatera_1m=A"/>

I get this error message:
<error_message>
Value 'ns1:Inwatera_1m=A' is not allowed for attribute 'typeName'.
Reason: it does not satisfy any of the defined patterns (see below)
'((\w:)?\w(=\w)?){1,}'

Annotations of containing element 'wfs:Query'
The Query element is used to describe a single query.
</error_message>

This clearly says that the value does not satisfy the pattern. I then removed the "_", which apparently is not accepted as "\w", and now I get a different error:

Validating this:
<wfs:Query typeName="ns1:Inwatera1m=A"/>

File Untitled2.xml is not valid.
<error_message>
Value 'ns1:Inwatera1m=A' is not allowed for attribute 'typeName'.

Details
cvc-datatype-valid.1.2.1: For type definition 'xs:QName' the string 'ns1:Inwatera1m=A' does not match a literal in the lexical space of built-in type definition 'xs:QName'.
cvc-datatype-valid.1.2.2: For type definition 'wfs:TypeNameListType' the string 'ns1:Inwatera1m=A' should be a sequence of space-separated tokens, each of which matches a literal in the lexical space of item type definition 'xs:QName'.
cvc-simple-type.1: For type definition 'wfs:TypeNameListType' the string 'ns1:Inwatera1m=A' is not valid.
cvc-attribute.3: Value 'ns1:Inwatera1m=A' of attribute 'typeName' does not match simple type definition 'wfs:TypeNameListType'.
cvc-complex-type.3.1: The attribute 'typeName' of complex type 'wfs:QueryType' is not valid.Value 'ns1:Inwatera1m=A' is not allowed for attribute 'typeName'.
cvc-elt.5.2.1: The element <wfs:Query> is not valid with respect to the actual type definition 'wfs:QueryType'.
</error_message>

Browsing these details, it says that this value "does not match a literal in the lexical space of built-in type definition 'xs:QName'". I've checked the lexical space of xsd:QName, and it says:

"The lexical space of QName is the set of strings that match the [ QName] production of [XMLNS]" (http://www.stylusstudio.com/w3c/schema2/built-in-primitive-datatypes.htm#QName).

The QName production of XMLNS is
QName ::= (Prefix ':')? LocalPart

I would say "ns1:Inwatera1m" matches this production. But then I've also read that:

"NOTE:
The mapping between literals in the lexical space and values in the value space of QName requires a namespace declaration to be in scope for the context in which QName is used."

So maybe the problem is the namespace declaration mentioned in the note, which I do not have (I have not declared the "ns1" namespace anywhere, I just wanted to test some data). Could that be the reason?

I'm a bit new to XML/XSD with .NET and I may be missing some important point. In any case, any help will be much appreciated.

Thanks!
David

 
 
Martin Honnen






Why that example string does not match the regular expression pattern ((\w:)?\w(=\w)?){1,} does indeed seem to depend on how \w is interpreted.

For instance with current JavaScript implementations following the ECMAScript edition 3 specification \w stands for [a-zA-Z0-9_] so the character low line "_" with the decimal Unicode 95 and the hexadecimal Unicode 5F is contained in the class denoted by \w as far as the regular expression language in JavaScript is concerned.

The regular expression language implemented by the Regex class in .NET 2.0 defines the \w class as [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}] which stands for lower case letters Ll, upper case letters Lu, title case letters Lt, other letters Lo, decimal digits Nd, connector punctuations Pc and modifier letters Lm. The character "_" belongs to the Pc class of connector punctuations so in terms of the regular expression language in the .NET 2.0 framework \w includes the character "_".

The regular expression language defined for pattern facets in the W3C XSD schema language however defines the \w class as [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters) which means the low line "_" as a punctuation character is not contained in \w.

That explains why the schema validators we used reject ns1:Inwatera_1m=A with the "_" character, while some other regular expression languages accept it.
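The difference can be sketched with the Regex class itself. The example below is only an approximation: it emulates the XSD definition of \w with the negated class [^\p{P}\p{Z}\p{C}] and anchors both expressions, because a pattern facet has to match the whole value. The class and field names are made up for illustration, and the pattern is written with its optional groups, ((\w:)?\w(=\w)?){1,}.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassSketch
{
    // {0} is replaced by the word-character class; {{1,}} becomes the literal quantifier {1,}.
    const string Template = "^(?:(({0}:)?{0}(={0})?){{1,}})$";

    // \w as the .NET Regex class interprets it (includes connector punctuation such as "_").
    public static readonly Regex DotNetStyle =
        new Regex(string.Format(Template, @"\w"));

    // Approximation of the XSD \w: everything except punctuation, separators and "other".
    public static readonly Regex XsdStyle =
        new Regex(string.Format(Template, @"[^\p{P}\p{Z}\p{C}]"));

    public static void Main()
    {
        string[] samples = { "ns1:Inwatera_1m=A", "ns1:Inwatera1m=A" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}  .NET-style: {1}  XSD-style: {2}",
                s, DotNetStyle.IsMatch(s), XsdStyle.IsMatch(s));
        }
    }
}
```

With these definitions the value containing "_" matches only the .NET-style expression, while the value without "_" matches both.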

I am afraid the author of the schema you are using overlooked that difference between the regular expression languages when giving examples like "ns1:Inwatera_1m=A". Furthermore, as far as I understand it, the whole idea of using QName (or a list of QName) as the base type and then trying to allow a QName=Letter construct through a pattern can't work, since a pattern facet can only restrict the lexical space of the base type, not extend it. What should work is e.g. ns1:Inwatera1m, provided the XML instance declares e.g. xmlns:ns1="http://example.com/whatever", which binds the prefix ns1 to a namespace URI.
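The two QName requirements (lexical form and a prefix bound to a namespace) can be checked separately. The following is a rough sketch, not what the schema validator does internally: the helper IsLexicalQName and the class name are hypothetical, and the namespace URI is the example one from above. XmlConvert.VerifyNCName and XmlNamespaceManager are the framework pieces it relies on.

```csharp
using System;
using System.Xml;

public static class QNameSketch
{
    // True if value has the form NCName or NCName:NCName.
    public static bool IsLexicalQName(string value)
    {
        string[] parts = value.Split(':');
        if (parts.Length > 2) return false;
        try
        {
            foreach (string part in parts)
            {
                XmlConvert.VerifyNCName(part); // throws if part is not a valid NCName
            }
            return true;
        }
        catch (XmlException) { return false; }      // invalid character such as "="
        catch (ArgumentException) { return false; } // empty prefix or local part
    }

    public static void Main()
    {
        // Stand-in for the in-scope declaration xmlns:ns1="http://example.com/whatever"
        XmlNamespaceManager ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("ns1", "http://example.com/whatever");

        foreach (string value in new string[] { "ns1:Inwatera1m", "ns1:Inwatera1m=A", "ns2:CoastL1M" })
        {
            bool lexical = IsLexicalQName(value);
            string prefix = value.Contains(":") ? value.Substring(0, value.IndexOf(':')) : "";
            bool bound = prefix.Length == 0 || ns.LookupNamespace(prefix) != null;
            Console.WriteLine("{0}: lexical QName = {1}, prefix bound = {2}", value, lexical, bound);
        }
    }
}
```

Here "ns1:Inwatera1m=A" fails the lexical check because "=" is not allowed in an NCName, and "ns2:CoastL1M" is lexically fine but its prefix is not declared.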