XML to plain text

I have a simple XML file that contains, in part, content that is in HTML. I am wrapping that content in <![CDATA[ ]]> sections. This works fine.

However, my application needs to output the XML file (from a strongly typed dataset) as plain text. I am doing

theText = myDataset.GetXml()

which works, but it doesn't "remember" the portions that were in CDATA sections, so that content gets escaped, turning every HTML tag into &lt;b&gt;, etc.

Is there a simple way to output that data without escaping it, forcing certain nodes to use a CDATA section? The GetXml function accepts no parameters.

Thanks!

MCD


-
 

Re:XML to plain text

Since I believe this is something which can't get fixed in five seconds, try:

theText = Replace(theText, "&lt;", "<")
theText = Replace(theText, "&gt;", ">")

-

Re:XML to plain text

Hi BigD,

Did you mean this?

\\\
' Start by building a sample dataset
Dim ds As New DataSet
Dim dt As New DataTable("parameters")
For c As Integer = 1 To 10
    Dim dc As New DataColumn("elem" & c.ToString)
    dt.Columns.Add(dc)
Next
For r As Integer = 1 To 10
    Dim dr As DataRow = dt.NewRow
    For c As Integer = 1 To 10
        ' dr(c - 1) would also work; the column name is used to show the idea
        dr("elem" & c.ToString) = r.ToString & c.ToString
    Next
    dt.Rows.Add(dr) ' the row can also be added before filling it, but I find this nicer
Next
ds.Tables.Add(dt)
' End of building the sample dataset

' Serialize the dataset into a MemoryStream and read it back as a string
Dim ser As XmlSerializer = New XmlSerializer(GetType(DataSet))
Dim ms As New IO.MemoryStream
Dim sw As IO.TextWriter = New IO.StreamWriter(ms)
ser.Serialize(sw, ds)
sw.Flush() ' make sure everything has reached the MemoryStream
Dim b As Long = ms.Length ' length of the serialized data
ms.Position = 0
Dim sr As IO.TextReader = New IO.StreamReader(ms)
Dim xmlstring As String = sr.ReadToEnd
sw.Close()
sr.Close()
ms.Close()
///

I hope this helps a little bit?

Cor





-

Re:XML to plain text



Hi Big D,



I have reviewed your issue. I will spend some time to do some research on this issue.



I will reply to you ASAP. Thanks for your understanding.



Best regards,

Jeffrey Tan

Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security

This posting is provided "as is" with no warranties and confers no rights.



-

Re:XML to plain text

Hi Jeffrey.




There are already two answers here, and it has not yet been answered whether they fit.

Does MOPS treat the people who ask questions in this newsgroup differently?

If this is not an accident, I think it is embarrassing.



Cor





-

Re:XML to plain text

Richard,

Thanks for the reply.

Yes, obviously I could do that. However, for one, I'm not confident that "<" and ">" are the only characters getting escaped. Secondly, even if I could do a find and replace on all the escaped characters, that content would still not be inside a <![CDATA[ ]]> section, so it wouldn't be a valid XML document. (The GetXml() function just places the escaped data between the tags without "knowing" that it was previously in a CDATA section.)
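
A minimal sketch of the second point, assuming a hypothetical <note> element whose HTML content was escaped the way GetXml() does it. The module and function names are made up for the illustration. Note that "&" is escaped as well, and that undoing only the bracket escapes leaves an unclosed <br> in the markup, so the result no longer loads as XML.

```vbnet
Imports System
Imports System.Xml

Module NaiveUnescapeDemo
    ' Returns True when the text parses as a well-formed XML document.
    Function IsWellFormed(xmlText As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(xmlText)
            Return True
        Catch ex As XmlException
            Return False
        End Try
    End Function

    ' The find-and-replace approach: turn the escaped brackets back into markup.
    Function NaiveUnescape(xmlText As String) As String
        Return xmlText.Replace("&lt;", "<").Replace("&gt;", ">")
    End Function

    Sub Main()
        ' Hypothetical GetXml()-style output for an element holding an HTML fragment.
        Dim escaped As String = "<note>&lt;p&gt;Fish &amp; chips&lt;br&gt;&lt;/p&gt;</note>"

        Console.WriteLine(IsWellFormed(escaped))                 ' prints True
        Console.WriteLine(IsWellFormed(NaiveUnescape(escaped)))  ' prints False: <br> is never closed
    End Sub
End Module
```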



Is this a part of the schema that I need to adjust to tell it to expect these characters? If so, how?

Thanks for the input!

-MCD

"Richard T. Edwards@pwpsquared.net" <redwar@pwpsquared.net> wrote in message

Quote
Since I believe this is something which can't get fixed in five seconds, try:

theText = Replace(theText, "&lt;", "<")
theText = Replace(theText, "&gt;", ">")

-

Re:XML to plain text

Hey Cor,

Thanks for the reply. I haven't tried the code yet, but it doesn't seem like what I need. First off, it appears that you are building the dataset programmatically, not from the schema, and building it from the schema is a necessity for my design. The cool part of how I have it working is that since it's a strongly typed dataset, it's super easy to work with: I don't have to know everything about the schema in order to operate on parts of it, and the GetXml() function does EXACTLY what I want, EXCEPT of course that it is escaping the "<" characters and such.

To me it seems like a schema issue. Previously I have just manually entered the CDATA section into fields where I knew that there would be HTML. It seems like VS should be able to know from a setting in the xsd that the element contains illegal characters. That way, when GetXml reads the schema to output the data in the dataset, it would know what to do.

Maybe I'm dreaming.

;-)

Thanks!

MCD

-

Re:XML to plain text



Hi Big D,

Sorry to have kept you waiting so long.

After consulting the product team, I know the cause of the problem.

Actually, this behavior is by design.

This is the way XML is supposed to be serialized to a string. The "<" character is not allowed to occur in text or attribute content because it marks the beginning of markup, therefore we escape it as &lt;. We also escape ">" for compatibility reasons. If you look at the dataset content though, you should see "<" and ">" in the value unescaped.

XML spec section 2.4:

The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

So as a workaround, you may follow Richard's suggestion and parse the string yourself.
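
One possible way to do that post-processing, sketched under assumptions: instead of replacing characters, load the GetXml() output into an XmlDocument and put the text of chosen elements back into CDATA sections. WrapInCData and the element name "body" are hypothetical names for the illustration, and the sketch assumes the chosen elements hold only text, no child elements.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module CDataPostProcess
    ' Re-wraps the text of every element named elementName in a CDATA section.
    Function WrapInCData(xmlText As String, elementName As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)

        ' Copy the matches first so the tree is not changed while it is enumerated.
        Dim targets As New List(Of XmlNode)()
        For Each node As XmlNode In doc.GetElementsByTagName(elementName)
            targets.Add(node)
        Next

        For Each node As XmlNode In targets
            ' InnerText is already unescaped by the parser ("&lt;" is "<" again).
            Dim raw As String = node.InnerText
            While node.HasChildNodes
                node.RemoveChild(node.FirstChild)
            End While
            node.AppendChild(doc.CreateCDataSection(raw))
        Next

        Return doc.OuterXml
    End Function

    Sub Main()
        ' Hypothetical GetXml()-style output with an escaped HTML fragment.
        Dim escaped As String = "<doc><item><body>&lt;b&gt;hi&lt;/b&gt; &amp; bye</body></item></doc>"

        ' Prints <doc><item><body><![CDATA[<b>hi</b> & bye]]></body></item></doc>
        Console.WriteLine(WrapInCData(escaped, "body"))
    End Sub
End Module
```

With the dataset from the original post this would be called as theText = WrapInCData(myDataset.GetXml(), "body"). A value that itself contains "]]>" cannot be held in a single CDATA section, so such values would need separate handling.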



Best regards,

Jeffrey Tan




-

Re:XML to plain text

Hi Jeffrey,



Now I am curious: can you tell me why I do not see that by-design behaviour with the sample I sent?



Cor






-

Re:XML to plain text

Hi,



Looking over my sample, I saw that I forgot to mention that it needs an import of System.Xml.Serialization, or else you have to write that namespace in front of XmlSerializer.



Cor





-

Re:XML to plain text



Hi Cor,

Oh, sorry, I cannot see how your solution succeeds.

In your solution, you build a dataset yourself, which contains no CDATA section. Your self-produced dataset also contains no "special" characters (such as "<" or ">").

I have tested your solution properly in C#, but it does not work either, like this:

    // requires: using System.Data; using System.IO; using System.Xml.Serialization;
    private void button1_Click(object sender, System.EventArgs e)
    {
        // load a file that contains a CDATA section
        DataSet ds = new DataSet();
        ds.ReadXml(@"D:\newtest.xml");

        // serialize the dataset into a MemoryStream
        XmlSerializer ser = new XmlSerializer(typeof(DataSet));
        MemoryStream ms = new MemoryStream();
        TextWriter sw = new StreamWriter(ms);
        ser.Serialize(sw, ds);
        sw.Flush();

        long b = ms.Length;
        ms.Position = 0;

        // read the serialized XML back as a string
        TextReader sr = new StreamReader(ms);
        string xmlstring = sr.ReadToEnd();
        sw.Close();
        sr.Close();
        ms.Close();
    }

Then, in the debugger, you will see that the CDATA section in my "D:\newtest.xml" is also escaped (that is, "<" becomes "&lt;").





Best regards,

Jeffrey Tan




-

Re:XML to plain text

Hi Jeffrey,

Thank you for your message. It cleared up my confusion completely.

I was only looking at getting the dataset into a string, while the problem is about a portion of HTML text saved as a string in a dataset.

Thinking it over, the answer for Big D is of course very simple.

To get the portions into the dataset, read the file with dataset.ReadXml(path), and then just write the items as needed to disk with a StreamWriter, or use them directly.

(The answer can be "by design", but I think it should then be added that when the data is written this way by ds.WriteXml, it is read back in the proper form by ds.ReadXml.)
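
A small sketch of that round trip, assuming a hypothetical document with an item table and a body column (all names here are invented for the illustration). It reads XML that uses a CDATA section, shows that the value inside the dataset is the plain HTML, and checks that the escaped text written by GetXml() reads back to the same value.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module RoundTripDemo
    ' Reads an XML string into a fresh DataSet (schema is inferred).
    Function Load(xmlText As String) As DataSet
        Dim ds As New DataSet()
        Using reader As New StringReader(xmlText)
            ds.ReadXml(reader)
        End Using
        Return ds
    End Function

    ' Returns the body column of the first item row.
    Function BodyOf(ds As DataSet) As String
        Return CStr(ds.Tables("item").Rows(0)("body"))
    End Function

    Sub Main()
        ' Hypothetical source file content with the HTML kept in a CDATA section.
        Dim source As String = "<doc><item><body><![CDATA[<b>bold</b> & more]]></body></item></doc>"

        Dim ds As DataSet = Load(source)
        Dim html As String = BodyOf(ds)   ' the value in the dataset is unescaped
        Dim xmlOut As String = ds.GetXml() ' the serialized form is escaped, not CDATA

        Console.WriteLine(html)
        Console.WriteLine(xmlOut.Contains("&lt;b&gt;"))
        Console.WriteLine(BodyOf(Load(xmlOut)) = html)
    End Sub
End Module
```

So the individual items can be taken straight from the dataset rows and written out as they are, without going through GetXml() at all.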



Just my thoughts



Cor





-