Appeal for help, because I’m terrible at regex

One of the most annoying things about using Google Docs is that none of the styles are inline. It used to be that bold text was wrapped in a <strong> tag and italic text was wrapped in an <em> tag. No longer. Now each style of text is wrapped in a span with a number of different classes applied to it. Those styles don’t carry through when we bring the text into WordPress, and the names of the classes vary from article to article. This can be very annoying for columnists who bold the names of subjects, for example.

So, what I’m looking for is a regular expression to turn <span class="c0 c3">My text</span> into <span class="c0 c3"><strong>My text</strong></span>, where class c3 happens to be the bold class.

11 thoughts on “Appeal for help, because I’m terrible at regex”

  1. Parsing HTML with regular expressions is really difficult (and Wrong), especially in this case where you then need to do things like matching tags that could potentially be nested.

    If you ignore this situation: <span class="c0 c3">My <span>text</span>is bold </span> then the regular expression is simple:

    $content = preg_replace( '#<span class="c0 c3">(.*?)</span>#s', '<span class="c0 c3"><strong>$1</strong></span>', $content );

    Note the “s” pattern modifier, which ensures that “.” matches line breaks between the opening and closing span tags. The ? makes the match non-greedy, so we stop at the very next closing span tag.

    1. The problem is that the class names change and there’s not always the same number of classes. So where I’m having trouble is finding a variable class name in a list of a variable number of classes.

      For example, it could be <span class="c0 c3">, or it could be <span class="c0">, or it could be <span class="c0 c3 c1">.

      Luckily spans are never nested.

      So maybe something like(?):
      $content = preg_replace( '#<span class="([^"]*)' . preg_quote( $boldclass, '#' ) . '([^"]*)">(.*?)</span>#s', '<span class="${1}' . $boldclass . '${2}"><strong>${3}</strong></span>', $content );
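
For what it's worth, the idea in this thread can be sketched as a small helper. This is only a sketch under the assumptions stated above: spans are never nested, and the class attribute uses straight double quotes. The function name wrap_bold_spans and the sample markup are made up for illustration. One addition not discussed above is a pair of lookarounds around the class name, so that a bold class of c3 does not also match c30.

```php
<?php
// Hypothetical helper: wrap the contents of every <span> whose class list
// contains $boldClass in <strong>. Assumes spans are never nested and that
// class attributes are written with straight double quotes.
function wrap_bold_spans( string $content, string $boldClass ): string {
    // Escape the class name so it is matched literally inside the pattern.
    $class = preg_quote( $boldClass, '#' );

    // [^"]* keeps the match inside the class attribute.
    // (?<![\w-]) and (?![\w-]) stop "c3" from matching inside "c30" or "c3-x".
    // (.*?) with the "s" modifier stops at the first </span>, even across line breaks.
    $pattern = '#<span class="([^"]*(?<![\w-])' . $class . '(?![\w-])[^"]*)">(.*?)</span>#s';

    // preg_replace returns null on error; fall back to the untouched input.
    return preg_replace( $pattern, '<span class="$1"><strong>$2</strong></span>', $content ) ?? $content;
}

$html = '<span class="c0 c3">My text</span> <span class="c0">plain</span> <span class="c30">also plain</span>';
echo wrap_bold_spans( $html, 'c3' ), "\n";
```

Only the first span in the sample should gain a <strong> wrapper. The bold class name still has to be looked up per article, since Google Docs assigns it differently each time.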
