RegEx match content unless it is inside quotes

I recently needed to match content outside quotes while ignoring content inside them. For example, I have content like this:

this is content to replace, "but don't replace this string, it's \"important\"", but feel free to replace this.

The answer came from http://thisworkinglife.blogspot.com/2006/04/net-regular-expressions-finding.html, supported by http://regexadvice.com/forums/thread/36369.aspx, http://www.regular-expressions.info/lookaround.html and http://aspnet.4guysfromrolla.com/articles/022603-1.aspx

The expression goes at the beginning of whatever other expression you want to restrict to unquoted text. Here it is:

(?<=^(([^"]*(?<!\\)"[^"]*(?<!\\)"[^"]*)*|[^"]*))
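A minimal sketch of prepending the prefix to another pattern. It uses TypeScript because JavaScript engines (ES2018 and later) accept variable-length lookbehind the way .NET does, whereas Python's built-in `re` rejects it. It assumes plain ASCII double quotes, uses the example sentence without the inner escaped quotes, and `replaceOutsideQuotes` is a hypothetical helper name.

```typescript
// Lookbehind prefix from the post, written with ASCII double quotes.
const OUTSIDE_QUOTES = String.raw`(?<=^(([^"]*(?<!\\)"[^"]*(?<!\\)"[^"]*)*|[^"]*))`;

// Hypothetical helper: prepend the prefix to any pattern and replace every match.
// The RegExp constructor is used so the compiler never parses the lookbehind itself.
function replaceOutsideQuotes(text: string, pattern: string, replacement: string): string {
  const re = new RegExp(OUTSIDE_QUOTES + pattern, "g");
  return text.replace(re, replacement);
}

const input =
  'this is content to replace, "but don\'t replace this string", but feel free to replace this.';

console.log(replaceOutsideQuotes(input, "replace", "REPLACED"));
// this is content to REPLACED, "but don't replace this string", but feel free to REPLACED this.
```

Only the two occurrences with an even number of quotes before them (zero and two) are replaced; the one after the single opening quote is left alone.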

It means: "… and before it is either an even number of double quotes, each not preceded by a \, or no quotes at all …"
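One catch worth checking: `[^"]` also refuses the quote in an escaped `\"`, so once the text before a position contains an escaped quote, the expression as written can no longer match at that position. The sketch below (TypeScript again, assuming a backslash escapes whichever character follows it) spells out the intended rule as a plain scan, `isOutsideQuotes`, and compares it with an escape-aware variant of the prefix, `ESCAPE_AWARE`. Both names and the variant itself are illustrative, not taken from the sources above.

```typescript
// Plain-scan statement of the rule: true when an even number of unescaped
// double quotes precede `index`. A backslash escapes the character after it.
function isOutsideQuotes(text: string, index: number): boolean {
  let inside = false;
  let i = 0;
  while (i < index) {
    if (text[i] === "\\") {
      i += 2; // skip the backslash and the character it escapes
      continue;
    }
    if (text[i] === '"') inside = !inside;
    i++;
  }
  // i > index means `index` falls between a backslash and the character it escapes.
  return !inside && i === index;
}

// The post's prefix, unchanged apart from ASCII quotes.
const ORIGINAL = String.raw`(?<=^(([^"]*(?<!\\)"[^"]*(?<!\\)"[^"]*)*|[^"]*))`;

// Hypothetical variant: a "unit" is any character except " and \, or a backslash pair.
// From the start: units, then zero or more (quote, units, quote, units) groups.
const UNIT = String.raw`(?:[^"\\]|\\[\s\S])`;
const ESCAPE_AWARE = `(?<=^${UNIT}*(?:"${UNIT}*"${UNIT}*)*)`;

const text = String.raw`this is content to replace, "but don't replace this string, it's \"important\"", but feel free to replace this.`;

// The variant agrees with the scan at every position, including after the \" pairs.
for (let i = 0; i <= text.length; i++) {
  const sticky = new RegExp(ESCAPE_AWARE, "y"); // "y": test only at lastIndex
  sticky.lastIndex = i;
  if (sticky.test(text) !== isOutsideQuotes(text, i)) {
    throw new Error(`disagreement at index ${i}`);
  }
}

console.log(text.replace(new RegExp(ORIGINAL + "replace", "g"), "X"));
// this is content to X, "but don't replace this string, it's \"important\"", but feel free to replace this.
console.log(text.replace(new RegExp(ESCAPE_AWARE + "replace", "g"), "X"));
// this is content to X, "but don't replace this string, it's \"important\"", but feel free to X this.
```

Treating `\` plus the next character as a single unit also means `\\"` counts as a real quote after an escaped backslash, which a one-character `(?<!\\)` check cannot tell apart.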

If you've never used look-ahead or look-behind expressions, they're incredibly cool. They can be slower than plain patterns (this one re-examines the text from the start of the string for every position the engine tries), but they definitely do the job. The nice part is that they're zero-width: they aren't part of the match in any way, so there's no need to capture the surrounding context and write it back into the modified string.
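To illustrate the zero-width point, here is a small TypeScript comparison using a hypothetical `price:` string unrelated to quotes. The capturing version consumes the label and has to write it back; the lookbehind version only asserts the label, so the match is just the digits.

```typescript
const line = "price: 10, cost: 10";

// Capturing approach: "price: " is part of the match, so it must be written back.
const viaCapture = line.replace(/(price: )\d+/, (_match: string, label: string) => label + "20");

// Lookbehind approach: "price: " is only checked, never consumed.
const priceDigits = new RegExp(String.raw`(?<=price: )\d+`);
const viaLookbehind = line.replace(priceDigits, "20");

const m = priceDigits.exec(line);
console.log(m?.[0], m?.index); // 10 7
console.log(viaCapture === viaLookbehind, viaLookbehind); // true price: 20, cost: 10
```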

In time, I'd like to expand it to cover single-quoted text too, but that got weird. Do I need to escape a ' inside a ' but not a "? If there's a ' inside a ", is it OK? What if it's a ' without white space around it, such as "can't" or "don't"? Do I escape a ' with a \ or with another '? For this purpose, assuming the content in question was in double quotes was sufficient and made the regex much simpler.

Rob
