RegExp pattern  
pdonoho





Posted: Thu Sep 16 09:10:21 CDT 2004

VB Scripts >> RegExp pattern

Hello,

I'm digging deeper into the RegExp world to find a solution to the
following problem, but so far I can't solve it :-/ :

In an HTML string like:

<html>this <i>is my<b>beautiful</b> dog</i><crazy_tag> <li> but it can't
<help> <meto> solve my prob :( </html>

I would like to remove all HTML tags BUT keep <html> and </html> ones.
Finally, the goal is to define a list of "authorized tags" in a HTML
content and remove all the others. I can't even manage to pass the 1st
step, only keep the html one :-(

Your help is welcome !
Thx a lot !

Vince, who keeps on searching...

 
 
Christoph





Posted: Thu Sep 16 09:10:21 CDT 2004

On 16.09.2004 13:36, Vince wrote:

> I'm digging deeper into the RegExp world to find a solution to the
> following problem, but so far I can't solve it :-/ :
>
> In an HTML string like:
>
> <html>this <i>is my<b>beautiful</b> dog</i><crazy_tag> <li> but it can't
> <help> <meto> solve my prob :( </html>

Using JScript, try this:

// \b stops the lookahead from also sparing tags that merely start with "html"
var r = /<\/?(?!html\b)\w+>/gi;

var str = "<html>this <i>is my<b>beautiful</b> dog</i><crazy_tag>"
+ "<li> but it can't <help> <meto> solve my prob :( </html>" ;

var stripped = str.replace(r,"");
WScript.Echo (stripped);

> I would like to remove all HTML tags BUT keep <html> and </html> ones.
> Finally, the goal is to define a list of "authorized tags" in a HTML
> content and remove all the others.

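var r = /<\/?(?!(?:html|head|meta|horse)\b)\w+>/gi;

eliminates all but <html>, <head>, <meta>, <horse>
plus their closing counterparts. The \b makes the lookahead match whole
names only, so "head" does not also protect a tag like <header>. Note that
\w+> only matches bare tags; a tag with attributes is left in place.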
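
Since the stated goal is a list of "authorized tags", here is a minimal sketch of how the pattern could be built from an array instead of being typed by hand. The helper name makeTagStripper and its parameter allowedTags are made up for illustration and are not part of the original answer. Two assumptions differ from the literal pattern above: a \b is placed after the alternation so only whole tag names are protected, and [^>]* is added so tags carrying attributes are removed as well.

```javascript
// Hypothetical helper (not from the thread): returns a function that
// strips every tag whose name is NOT listed in allowedTags.
function makeTagStripper(allowedTags) {
  var escaped = [];
  for (var i = 0; i < allowedTags.length; i++) {
    // Escape regex metacharacters in case a name contains any.
    escaped.push(allowedTags[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  }

  // With an empty list there is nothing to protect, so skip the lookahead.
  var guard = escaped.length
    ? "(?!(?:" + escaped.join("|") + ")\\b)"
    : "";

  // <  optional /  (not an allowed name)  tag name  optional attributes  >
  var re = new RegExp("<\\/?" + guard + "\\w+[^>]*>", "gi");

  return function (html) {
    return html.replace(re, "");
  };
}

// Usage: keep only <html>, <head> and <meta> (and their closing tags).
var strip = makeTagStripper(["html", "head", "meta"]);
var cleaned = strip("<html>this <i>is my<b>beautiful</b> dog</i></html>");
```

Under Windows Script Host the result can be shown with WScript.Echo(cleaned), as in the snippet above. This is still a regex-level cleanup: it does not handle comments, script blocks, or a ">" inside an attribute value, so it should not be treated as a sanitizer for untrusted HTML.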

--
Gruesse, Christoph

Rio Riay Riayo - Gordon Sumner, 1979
 
 
Vince





Posted: Thu Sep 16 10:01:15 CDT 2004

For those interested, here is the link to the answer:

http://www.developersdex.com/asp/message.asp?r=4542061&p=1825

Vince.