pdonoho

Posted: Thu Sep 16 09:10:21 CDT 2004
VB Scripts >> RegExp pattern
Hello,
I'm digging deeper into the RegExp world to find a solution to the
following problem, but so far I can't solve it :-/
Given an HTML string like:
<html>this <i>is my<b>beautiful</b> dog</i><crazy_tag> <li> but it can't
<help> <meto> solve my prob :( </html>
I would like to remove all HTML tags BUT keep the <html> and </html> ones.
Ultimately, the goal is to define a list of "authorized tags" for HTML
content and remove all the others. I can't even get past the first
step, keeping only the html one :-(
Your help is welcome!
Thanks a lot!
Vince, who keeps on searching...
Christoph

Posted: Thu Sep 16 09:10:21 CDT 2004
On 16.09.2004 13:36, Vince wrote:
> I'm looking deeper into RegExp world in order to try a solution to the
> following problem, no way to solve it :-/ :
>
> in a HTML string like :
>
> <html>this <i>is my<b>beautiful</b> dog</i><crazy_tag> <li> but it can't
> <help> <meto> solve my prob :( </html>
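Using JScript, try this:
// Strip every tag except <html> and </html>.
// (?!html\b) skips the tag only when its whole name is "html";
// [^>]* lets the pattern also match tags that carry attributes.
var r = /<\/?(?!html\b)\w+[^>]*>/gi;
var str = "<html>this <i>is my<b>beautiful</b> dog</i><crazy_tag>"
+ "<li> but it can't <help> <meto> solve my prob :( </html>";
var stripped = str.replace(r, "");
WScript.Echo(stripped);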
> I would like to remove all HTML tags BUT keep <html> and </html> ones.
> Finally, the goal is to define a list of "authorized tags" in a HTML
> content and remove all the others.
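var r = /<\/?(?!(?:html|head|meta|horse)\b)\w+[^>]*>/gi;
removes every tag except <html>, <head>, <meta>, <horse>
and their closing counterparts. The \b matters here: without it,
a longer name that merely starts with a listed one (such as <header>)
would be kept as well.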
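If the list of authorized tags is not known in advance, the same pattern can be built at run time. The following is only a sketch: the helper names buildStripper and stripTags are made up for illustration, and the word boundary (\b) and the [^>]* part are assumptions added so that look-alike names and tags with attributes are handled. It is still a regex approach, so comments, scripts, or a ">" inside an attribute value are not covered.

```javascript
// Hypothetical helper (not part of the thread): builds a tag-stripping
// regex from a list of allowed tag names.
function buildStripper(allowedTags) {
    // With nothing allowed, every tag is stripped.
    if (allowedTags.length === 0) {
        return /<\/?\w+[^>]*>/gi;
    }
    // Escape regex metacharacters so each name is matched literally.
    var escaped = [];
    for (var i = 0; i < allowedTags.length; i++) {
        escaped.push(allowedTags[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
    }
    // (?!(?:a|b)\b) rejects a tag only when its whole name is on the list.
    return new RegExp("<\\/?(?!(?:" + escaped.join("|") + ")\\b)\\w+[^>]*>", "gi");
}

// Hypothetical wrapper: removes every tag whose name is not in allowedTags.
function stripTags(html, allowedTags) {
    return html.replace(buildStripper(allowedTags), "");
}

var sample = "<html>this <i>is my<b>beautiful</b> dog</i><crazy_tag>"
    + "<li> but it can't <help> <meto> solve my prob :( </html>";
var result = stripTags(sample, ["html", "b"]);
```

Here result keeps <html>, <b> and their closing tags and drops the rest. Under Windows Script Host it can be shown with WScript.Echo(result).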
--
Regards, Christoph
Rio Riay Riayo - Gordon Sumner, 1979
Vince

Posted: Thu Sep 16 10:01:15 CDT 2004
For those interested, here is the link to the answer:
http://www.developersdex.com/asp/message.asp?r=4542061&p=1825
Vince.