Sunday, 11 February 2018

php - Regex - Convert HTML to valid XML tag






I need help writing a regex function that converts an HTML string to a valid XML tag name. For example, it takes a string and does the following:





  • If a letter or an underscore occurs in the string, it is kept.

  • If any other character occurs, it is removed from the output string.

  • If any other characters occur between words or letters, they are replaced with a single underscore.




Examples:
Input: Date Created
Output: Date_Created


Input: Date<br>Created
Output: Date_Created

Input: Date\nCreated
Output: Date_Created

Input: Date 1 2 3 Created
Output: Date_Created




Basically, the regex function should convert the HTML string into a valid XML tag name.
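What counts as a valid tag name is fixed by the `Name` production of the XML 1.0 specification: a name may start with a letter, an underscore or a colon, and later characters may also be digits, hyphens and periods. A rough PHP sketch of a checker restricted to the ASCII part of that rule is shown below. The helper name `is_simple_xml_name` is hypothetical, and it deliberately rejects the colon (reserved for namespaces) as well as the non-ASCII name characters the full rule allows. A string made only of letters and underscores always passes, as long as it is not empty.

```php
<?php
// Hypothetical helper: checks a tag name against the ASCII subset of the
// XML 1.0 Name rule (first character a letter or underscore, then letters,
// digits, '_', '-' or '.'). ':' and non-ASCII characters are not accepted.
function is_simple_xml_name(string $name): bool
{
    // \z (instead of $) so that a trailing newline is not silently accepted.
    return preg_match('/^[A-Za-z_][A-Za-z0-9_.-]*\z/', $name) === 1;
}

var_dump(is_simple_xml_name('Date_Created')); // bool(true)
var_dump(is_simple_xml_name('Date Created')); // bool(false): contains a space
var_dump(is_simple_xml_name('1Date'));        // bool(false): starts with a digit
var_dump(is_simple_xml_name(''));             // bool(false): empty
```

The empty-string case matters in practice: an input with no letters at all (for example `123`) leaves nothing behind once the other characters are removed, so a caller may want a fallback name for that case.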


Answer



A bit of regex and a couple of standard functions:



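function mystrip($s)
{
    // add spaces around angle brackets to separate tag-like parts,
    // e.g. "<br>" becomes " <br> ", so that words on either side of a tag
    // are not glued together once the tag is removed;
    // then let strip_tags take care of removing the HTML tags
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // remove characters that are not letters or underscores from the start
    // and end of the string, so they are dropped instead of turning into
    // leading or trailing underscores
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);

    // any remaining sequence of characters that are not letters or
    // underscores gets replaced by a single underscore
    return preg_replace('/[^a-z_]+/i', '_', $s);
}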
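To check this approach against the examples from the question, here is a self-contained script. It is a sketch rather than a definitive implementation: it uses a standalone copy of the function under the hypothetical name `to_xml_tag_name`, which also removes stray characters at both ends, so that input wrapped in tags such as `<b>Date Created</b>` does not come out as `_Date_Created_`. The `<br>` and `<b>` inputs are assumptions about the kind of HTML the question has in mind.

```php
<?php
// Standalone sketch; to_xml_tag_name() is a hypothetical name for this demo.
function to_xml_tag_name(string $s): string
{
    // Pad angle brackets with spaces ("Date<br>Created" -> "Date <br> Created")
    // so words around a tag stay separate, then strip the tags.
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // Drop runs of non-letter, non-underscore characters at both ends.
    $s = preg_replace('/^[^a-z_]+|[^a-z_]+$/i', '', $s);

    // Replace each remaining run of such characters with one underscore.
    return preg_replace('/[^a-z_]+/i', '_', $s);
}

// Inputs from the question, plus one wrapped in tags (an assumed case).
$examples = array(
    "Date Created",
    "Date<br>Created",
    "Date\nCreated",
    "Date 1 2 3 Created",
    "<b>Date Created</b>",
);

foreach ($examples as $input) {
    // addcslashes() shows the newline as \n so each example fits on one line.
    echo addcslashes($input, "\n"), ' => ', to_xml_tag_name($input), PHP_EOL;
}
// Every line of output ends in "=> Date_Created".
```

Because the ends are trimmed before the underscore replacement, an underscore that is part of the input (as in `_id`) is kept, which a plain `trim($s, '_')` applied to the result would not do.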
