Parsing HTML

Hi,

I have a requirement to parse an HTML page and extract all of its tags, both the standard HTML tags and our own custom tags.

I'm using MSHTML and VB.NET. One approach I thought of was to use regular expressions. What is the best way to parse the page and fetch all the tags? If anyone has already worked on this, any pointers would help so that I don't need to reinvent the wheel.

Thanks in advance.

Rgds,
--
Sindbaad


-
 

Re:Parsing HTML

Sindbaad,

There is an SgmlReader on www.gotdotnet.com that you might be able to use:

http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC



Hope this helps

Jay




-

Re:Parsing HTML

"Sindbaad" <sindbaad@gmail.com> wrote:

> I have a requirement to parse an HTML page and extract all of its tags,
> both the standard HTML tags and our own custom tags.
>
> I'm using MSHTML and VB.NET.

What doesn't work with MSHTML?

> One approach I thought of was to use regular expressions. What is the best
> way to parse the page and fetch all the tags? If anyone has already worked
> on this, any pointers would help so that I don't need to reinvent the wheel.

.NET Html Agility Pack: How to use malformed HTML just like it was well-formed XML...
<URL:blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:
<URL:www.codefluent.com/smourier/download/htmlagilitypack.zip>

--
M S Herfried K. Wagner
M V P <URL:dotnet.mvps.org/>
V B <URL:dotnet.mvps.org/dotnet/faqs/>



-

Re:Parsing HTML

Thanks for the update.

There are no problems with MSHTML; I just haven't tried much with it yet, so I wanted more information on how to proceed.

The requirement is as follows: from the HTML page, I would like to fetch all the custom tags, in the order in which they appear in the page.

E.g.:

<TR>
<TD class=smallText style="LINE-HEIGHT: 1.5" vAlign=top>
<FIRSTNAME>fstName</FIRSTNAME>
<LASTNAME>lstname</LASTNAME><BR>
<ADDRESS1>addr</ADDRESS1><BR>
</TD></TR>

From the above, I need to fetch only FIRSTNAME, LASTNAME, and ADDRESS1.

Any help is really appreciated.



Rgds,

Sindbaad
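
For anyone following the thread, here is a minimal sketch of the requirement above: walk the page with a tolerant HTML parser and record only the custom tags, in document order. It is written in Python with the standard library's html.parser, not in VB.NET with MSHTML or the Html Agility Pack, so treat it as an illustration of the technique only. The names CustomTagCollector and extract_custom_tags are hypothetical, and it assumes the list of custom tag names is known up front and that custom tags are not nested inside each other.

```python
from html.parser import HTMLParser


class CustomTagCollector(HTMLParser):
    """Collects [tag, text] pairs for the requested tags, in document order."""

    def __init__(self, wanted):
        super().__init__()
        # HTMLParser reports tag names in lower case, so normalise the input.
        self.wanted = {name.lower() for name in wanted}
        self.found = []      # [tag, text] entries in the order they appear
        self._open = None    # index in self.found of the wanted tag now open

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.found.append([tag, ""])
            self._open = len(self.found) - 1

    def handle_data(self, data):
        # Only keep text that sits inside a wanted tag.
        if self._open is not None:
            self.found[self._open][1] += data

    def handle_endtag(self, tag):
        if self._open is not None and self.found[self._open][0] == tag:
            self._open = None


def extract_custom_tags(html, wanted):
    """Return (tag, text) tuples for the wanted tags, in document order."""
    parser = CustomTagCollector(wanted)
    parser.feed(html)
    parser.close()
    return [(tag, text.strip()) for tag, text in parser.found]


# The fragment from the post above, unquoted attributes included.
SAMPLE = """<TR>
<TD class=smallText style="LINE-HEIGHT: 1.5" vAlign=top>
<FIRSTNAME>fstName</FIRSTNAME>
<LASTNAME>lstname</LASTNAME><BR>
<ADDRESS1>addr</ADDRESS1><BR>
</TD></TR>"""

print(extract_custom_tags(SAMPLE, ["FIRSTNAME", "LASTNAME", "ADDRESS1"]))
# prints [('firstname', 'fstName'), ('lastname', 'lstname'), ('address1', 'addr')]
```

An event-driven parser is used here instead of a regex because it copes with unquoted attributes and unclosed tags such as <BR> without special cases. The same idea should carry over to whichever parser you end up using from VB.NET.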








-