Can I Use C++ To Process HTML?  
vicarious





PostPosted: Visual C++ General, Can I Use C++ To Process HTML? Top

I really don't know any of the commands that deal with HTML in C++. Can anyone tell me if this is even possible?

OK, say I know the URL of an HTML file from a website. I'll put this URL in the pointer URL.
Now, is there a way that C++ can get the HTML file that contains the code for a web page from its URL? For example, if the URL was microsoft.com, could it grab the HTML document that creates that page? If that is possible, could it search through the HTML file for a specific string?

If this is possible, can someone please show me an example?

thanks, Mark


 
 
Lord Zoltan






You can do it a number of ways.

At the lowest level, you can manually open a socket with Winsock to the web server on port 80 and issue HTTP GET and POST requests; the response from the server will contain the resulting HTML.
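To give a rough idea of what goes over that socket, here is a minimal sketch of the text side of the exchange only. It assumes a plain HTTP/1.1 GET with "Connection: close" and a response that is not chunked or compressed. The function names build_get_request and extract_body are made up for illustration, and the Winsock calls that would actually send and receive the bytes are left out.

```cpp
#include <cassert>
#include <string>

// Build a minimal HTTP/1.1 GET request for the given host and path.
// "Connection: close" asks the server to close the socket after replying,
// so a client can simply read until the connection ends.
// (Hypothetical helper, not part of Winsock or any library.)
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!host.empty() && "host must not be empty");
    const std::string target = path.empty() ? "/" : path;
    return "GET " + target + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Return the body of a raw HTTP response, i.e. everything after the first
// blank line that ends the status line and headers.
// Returns an empty string if no header/body separator is found.
// Assumption: the body is not chunked or compressed.
std::string extract_body(const std::string& response) {
    const std::string separator = "\r\n\r\n";
    const std::size_t pos = response.find(separator);
    if (pos == std::string::npos) {
        return std::string();
    }
    return response.substr(pos + separator.size());
}
```

The string from build_get_request("microsoft.com", "/") is what you would pass to send(), and extract_body would be applied to everything read back with recv() once the server closes the connection.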

However, that's far too low-level for what you want. Looking through, it appears that CInternetSession and CInternetSession::OpenURL might be the best things for you to use, provided you are happy to use MFC.

Look up the CInternetSession class in the MSDN help; it has a full explanation of the class and examples of how to use its various functions.

Hope this helps!


 
 
Sheng Jiang






If you just need to download a file from the Internet, you can use URLDownloadToFile. After that, you can use regular expressions to parse the file.
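For the "search the file" half, here is a small sketch using only the standard library. It assumes the page has already been saved to disk (for example by URLDownloadToFile, which is not shown here). The helper names read_file, contains_text and find_links are made up for illustration, and the regular expression only handles double-quoted href attributes, so treat it as a quick scan rather than a real HTML parser.

```cpp
#include <cassert>
#include <fstream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Read a whole file into a string. Returns an empty string if the file
// cannot be opened. (Hypothetical helper.)
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Case-sensitive check for a specific string anywhere in the HTML text.
bool contains_text(const std::string& html, const std::string& needle) {
    assert(!needle.empty() && "searching for an empty string is meaningless");
    return html.find(needle) != std::string::npos;
}

// Collect the values of double-quoted href attributes, in document order.
// Assumption: attributes look like href="..."; single-quoted or unquoted
// values are not matched.
std::vector<std::string> find_links(const std::string& html) {
    static const std::regex href_pattern("href\\s*=\\s*\"([^\"]*)\"",
                                         std::regex::icase);
    std::vector<std::string> links;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), href_pattern);
         it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}
```

With a saved page, contains_text(read_file("page.html"), "Windows") answers the original question, where "page.html" is whatever file name you chose for the download.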

 
 
Simple Samples






As said previously, there are multiple solutions; I forget what most of them are. You should ask this question in an IE programming newsgroup, since it is outside the scope of this forum. You are better off asking where people familiar with these questions can help you.

There are relevant samples provided with VC and/or the Platform SDK.

If you want the raw HTML source exactly as it is, you need to use one of the less common solutions; I forget which one. The most common solution gives you things such as the images in the page rather than the HTML that references them. After downloading the file, you can use various functions to process the HTML. Again, ask in a newsgroup or on a website where this question is relevant; anyone familiar with the extensive capabilities available for processing HTML is unlikely to suggest parsing it with regular expressions.