Regular Expressions for Non-Programmers.

Useful knowledge when working with long texts.

This article is for those who don’t want to become experts but still want the maximum reward for minimum effort. The internet is full of tutorials on the topic and there are already lots of really good resources. I will list some of them at the bottom. The “problem” I see is that most of them are not really aimed at non-technical users. They try to explain everything in one article. You read the first few paragraphs and think: “Well… Some other day, maybe.”

The goal of this article is that you can easily read it to the end, understand everything and go on with your life with an actual productivity gain. I will only cover a few handy things. Regular expressions can be very useful for anyone who works with texts a lot, and most editors support them, including the popular office suites. I will use Google Docs for the examples in this article.

A Special Character

Everything behaves normally until we enable regular expressions. Suddenly not only the term “dog.” (with a dot at the end) is matched but also the first one, where there is no dot but a space. That’s because the dot has a special meaning in a regular expression. It’s like a placeholder that simply matches any character, even spaces and… yes, dots. Here’s another example:

In the end, we search for any combination of three characters of which the last one is a “t”. Note how it also matches “ght” in “caught” and even “ It” because of the space character it starts with.
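If you would like to try the dot outside of a document editor, here is a minimal sketch in Python using its built-in re module. The sample sentences are made up for illustration (they are not the ones from the screenshots), and an editor like Google Docs may differ in small details.

```python
import re

# Made-up sample text (an assumption, not the text from the screenshots).
dog_text = "The dog. The dog barked."

# The dot is a placeholder, so "dog." means "dog" followed by ANY character.
dog_matches = re.findall(r"dog.", dog_text)
print(dog_matches)  # ['dog.', 'dog ']

# Any two characters followed by a "t".
t_text = "She caught a cat. It sat."
t_matches = re.findall(r"..t", t_text)
print(t_matches)  # ['ght', 'cat', ' It', 'sat']
```

Note how the second search picks up “ght” from the middle of a word and “ It” including its leading space, just like in the editor.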

This alone can already be quite useful in some situations but it certainly has its drawbacks. Most times, matching “anything” is not really what you want.

Other Special Characters

Compared to the dot, which is a placeholder for simply any character, character sets (written in square brackets) are “custom placeholders” for only a few selected characters.

This whole [fcr] thing is now a placeholder for either an “f”, a “c” or an “r”. Combined with the “at” after it, this expression only matches exactly the three words “fat”, “cat” and “rat”. But, as you can see, also as part of other words. You will learn how to avoid that in a moment.
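Here is the same idea as a small Python sketch, again with the built-in re module. The sentence is a made-up example chosen so that one match sits inside a longer word.

```python
import re

# Made-up sample sentence (an assumption for illustration).
sentence = "The fat cat scratched a rat."

# [fcr] stands for exactly one of "f", "c" or "r", followed by "at".
set_matches = re.findall(r"[fcr]at", sentence)
print(set_matches)  # ['fat', 'cat', 'rat', 'rat']
```

The first “rat” in the result comes from the middle of “scratched”, which is the “part of other words” effect described above.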

You can also define ranges of characters. To create a placeholder for any letter in the alphabet, you don’t need to write [abcdefghijklmnopqrstuvwxyz]. You can simply write [a-z]. For numbers it’s [0-9] and you can even combine them easily. [a-z0-9] is a placeholder for all letters and numbers and [b-f1-6] is one for all letters from b to f and numbers from 1 to 6.

Oh, and… In the first screenshot of this article, you see how “Match case” isn’t enabled. Otherwise, [a-z] and [A-Z] wouldn’t be the same. And in that case, don’t try things like [A-z]. It doesn’t do what you might hope for. But you can use [a-zA-Z]… or just check that box.
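For the curious, here is a rough Python sketch of ranges and of the [A-z] pitfall. It assumes Python’s re module, where upper and lower case are distinguished by default (as if “Match case” were checked); the sample strings are made up.

```python
import re

# Ranges can be combined inside one pair of brackets.
range_matches = "".join(re.findall(r"[b-f1-6]", "abcdefg 0123456789"))
print(range_matches)  # bcdef123456

# By default Python is case-sensitive, so [a-z] skips the capital letters.
lower_only = re.findall(r"[a-z]+", "Hello World")
print(lower_only)  # ['ello', 'orld']

# Either spell out both ranges or switch case-sensitivity off.
both_ranges = re.findall(r"[a-zA-Z]+", "Hello World")
ignore_case = re.findall(r"[a-z]+", "Hello World", re.IGNORECASE)
print(both_ranges, ignore_case)  # ['Hello', 'World'] ['Hello', 'World']

# [A-z] also covers the symbols that sit between "Z" and "a"
# in the character table, such as "_" and "^".
pitfall = re.findall(r"[A-z]", "_^")
print(pitfall)  # ['_', '^']
```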

I guess this last example makes it very clear what regular expressions are all about. You aren’t bound to exact words or phrases. You can search a text for complex patterns. And this example also demonstrates how powerful that can be. How else would you search for… times?

? * + (optional/repeating characters/placeholders)

Sorry? Oh, yes. Sure. Just put a question mark after that first placeholder, to make the leading “0” optional when the hour is less than 10.

Like asking yourself: “Is this really here… question mark”

Those parentheses? Good catch. You can group stuff together so that the question mark applies to it as a whole.
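To make the question mark and the parentheses a bit more concrete, here is a small Python sketch. The time pattern and the sentence are my own assumptions for illustration and not necessarily the exact expression from the screenshots.

```python
import re

# Made-up sentence with three differently written times.
text = "We met at 9:30, left at 09:45 and got home at 11:05 pm."

# [01]?   -> an optional leading "0" or "1"
# [0-9]   -> one digit for the hour
# :[0-5][0-9] -> a colon and two digits for the minutes
# ( pm)?  -> the whole group " pm" is optional
time_pattern = r"[01]?[0-9]:[0-5][0-9]( pm)?"

# group(0) is the complete text of each match.
times = [m.group(0) for m in re.finditer(time_pattern, text)]
print(times)  # ['9:30', '09:45', '11:05 pm']
```

Without the parentheses, the question mark would only apply to the single character right before it (the “m”).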

Also handy: The plus sign and the star. [0-9]+ or [a-z]*
You can search for something that is there “at least one time” (plus) or “any number of times or not at all” (star). And if that is not enough, you can use { and } to say “two to four times”: [0-9]{2,4} or “at least three times”: [a-z]{3,}.
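A quick Python sketch of plus, star and the curly braces, using the built-in re module. All sample texts are made up for illustration.

```python
import re

numbers_text = "Room 7, floor 12, year 2024"

# Plus: one or more digits in a row.
one_or_more = re.findall(r"[0-9]+", numbers_text)
print(one_or_more)  # ['7', '12', '2024']

# Curly braces: two to four digits in a row.
two_to_four = re.findall(r"[0-9]{2,4}", numbers_text)
print(two_to_four)  # ['12', '2024']

# Curly braces with an open end: at least three letters in a row.
three_plus = re.findall(r"[a-z]{3,}", "an owl sat on the fence")
print(three_plus)  # ['owl', 'sat', 'the', 'fence']

# Star: the "o" may appear any number of times, or not at all.
star = re.findall(r"go*gle", "ggle gogle google")
print(star)  # ['ggle', 'gogle', 'google']
```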

\b (word boundary)

Now back to the “also as part of other words” problem.

The \b “helper” doesn’t really match any characters. It means “word ends here” or “word starts here”, depending on where you put it. If you put it on both sides, that means you are looking for a “whole word”.

Problem solved.
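As a small Python sketch (built-in re module, made-up sample texts), this is what \b changes:

```python
import re

sentence = "The fat cat scratched a rat."

# Without \b, the "rat" inside "scratched" is found as well.
loose = re.findall(r"[fcr]at", sentence)
print(loose)  # ['fat', 'cat', 'rat', 'rat']

# With \b on both sides, only whole words are found.
whole_words = re.findall(r"\b[fcr]at\b", sentence)
print(whole_words)  # ['fat', 'cat', 'rat']

# With \b only at the front, we get words that START with "cat".
starts_with = re.findall(r"\bcat[a-z]*", "cat category concatenate")
print(starts_with)  # ['cat', 'category']
```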

| (this or that)

The pipe character simply means “or”. You can basically search for multiple things at the same time.

Your “search options” can be as simple as single characters, like a|b, or more complex expressions. Let’s combine a few things here.
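Here is one possible combination as a Python sketch. The pattern and the sentences are assumptions I made up for illustration; they are not necessarily the ones from the screenshot.

```python
import re

# Simple case: this word or that word.
simple = re.findall(r"cat|dog", "My cat likes my dog.")
print(simple)  # ['cat', 'dog']

# Combined: whole words only, two spellings of "color" (optional "u")
# OR two spellings of "gray" (character set [ae]).
spelling_pattern = r"\b(colou?r|gr[ae]y)\b"
spelling_text = "Is it color or colour, grey or gray?"

# group(0) is the complete text of each match.
spellings = [m.group(0) for m in re.finditer(spelling_pattern, spelling_text)]
print(spellings)  # ['color', 'colour', 'grey', 'gray']
```

The parentheses keep the “or” contained, so the two \b helpers apply to every alternative and not just to the first and last one.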

\ (escaping)

One last thing. So, there are special characters with a special meaning. By the way, these are all of them: .+*?()[{^$|\ That means you can’t just search for them literally. To do that you have to put a backslash in front of them. With that, we can fix the issue from the first example.
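A short Python sketch of escaping, using the built-in re module. The sample texts are made up; the “dog.” sentence is only my stand-in for the first example.

```python
import re

dog_text = "The dog. The dog barked."

# Unescaped dot: matches "dog" followed by any character.
unescaped = re.findall(r"dog.", dog_text)
print(unescaped)  # ['dog.', 'dog ']

# Escaped dot: matches only a real dot.
escaped = re.findall(r"dog\.", dog_text)
print(escaped)  # ['dog.']

# The same goes for other special characters, like the dollar sign.
price_text = "It costs $3.50, not $3450."
price_loose = re.findall(r"3.50", price_text)
price_exact = re.findall(r"\$3\.50", price_text)
print(price_loose)  # ['3.50', '3450']
print(price_exact)  # ['$3.50']
```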

The end.

We will stop here. I want this article to be “digestible”, but I believe that’s already a lot of handy stuff. You can search for whole words only, for words that start or end with something, for multiple words, for alternative or common (mis)spellings, for patterns like times and dates, and more. If you want to explore the rabbit hole a bit more, there are some useful resources below.

Other Resources

Awesome tools to build your own, more complex regular expressions. When hovering the expression field, it shows you what exactly is happening. They both also have a library of commonly used regular expressions which you can explore and try to make sense of.

Best quick start guide and cheat sheet but the visual style already scares you away. No need for regular expressions. Well… functional though.

Great talk. Requires some experience to follow along.
