regex - PHP Remove all paragraph tags inside header tags

Saturday, 15 July 2017


I've been dwelling on this for a while.

I have this string (there is more content before and after the h2 tags):

...<h2><p>Lorem Ipsum</p></h2>
...

What regex do I use to remove all the <p> and </p> tags inside those header tags?

I'm trying to do something like this, but the positive lookbehind one is not working:

// for the starting <p> tag
$str = preg_replace('/(?<=<h[1-6]{1}[^>]+>)\s*<p>/i', '', $str);
// for the ending </p> tag
$str = preg_replace('/<\/p>\s*(?=<\/h[1-6]{1}>\s*)/i', '', $str);

This also does not take into account paragraph tags deeper inside the text within the header tag.

[Update]

This is derived from one of PeeHaa's suggested links

// for the starting <p> tag
$str = preg_replace("#(<h[1-6]>)<p>#", '$1', $str);
// for the ending </p> tag
$str = preg_replace("#<\/p>(<\/h[1-6]>)#", '$1', $str);

Answer

You shouldn't try to parse HTML with regexes. Having said that, since this is a subset of HTML and not a full document or nested layout, it is possible:

$str = preg_replace('/(<h([1-6])[^>]*>)\s?<p>(.*)?<\/p>\s?(<\/h\2>)/', '$1$3$4', $str);

Test case here:

http://codepad.org/oA2rtNP9
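
The question also mentions paragraph tags that sit deeper inside the header text, which a single pattern anchored right next to the header tags will not reach. One possible way to cover that case, offered here only as a sketch and not as part of the original answer, is to match each header block with preg_replace_callback and strip the paragraph tags from its contents in a second pass. The function name stripParagraphsInHeaders and the sample string are hypothetical, and trimming the whitespace left behind is an assumption about the desired output.

```php
<?php
// Hypothetical helper (not from the original answer): removes every <p ...>
// and </p> tag found between an opening <hN> tag and its matching </hN>.
function stripParagraphsInHeaders(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening header tag, group 2: header level (1-6),
        // group 3: header contents (lazy; the "s" flag lets "." span newlines),
        // group 4: closing tag of the same level via the \2 backreference.
        '/(<h([1-6])\b[^>]*>)(.*?)(<\/h\2\s*>)/is',
        function (array $m): string {
            // Drop opening and closing paragraph tags but keep their contents.
            $inner = preg_replace('/<\/?p\b[^>]*>/i', '', $m[3]);
            // Assumption: surrounding whitespace inside the header is not wanted.
            return $m[1] . trim($inner) . $m[4];
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Sample input: paragraphs inside the header are unwrapped,
// the paragraph after the header is left alone.
$str = "intro <h2 class=\"t\">\n<p>Lorem <b>Ipsum</b></p> <p>dolor</p>\n</h2> <p>kept</p>";
echo stripParagraphsInHeaders($str), "\n";
```

Like the pattern above, this still assumes headers are not nested inside each other and that the markup is reasonably well formed; for arbitrary HTML a real parser remains the safer choice.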
