Lesson 3

Character Matching

Introduction

In Lesson 2, we were looking for exact matches.

This can only take you so far, though. The fun starts when we introduce 'metacharacters'.

Metacharacters are characters that have a special meaning, and they are where the real power of regular expressions comes from. Let's look at some of them.

Digits and Non-Digits

Match Any Digit

You can use \d to match any numerical digit.

The \ before the d is an 'escape character'. It tells the expression parser that we're not trying to match the literal character d, but are using a metacharacter instead.

This expression only matches when it sees two numerical digits in a row:

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>
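
If you'd like to try this outside the page, here is a minimal sketch that assumes Python's built-in re module. The variable names are ours, and the pattern \d\d is simply the most direct way to write "two digits in a row".

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# \d\d needs two digits side by side, so the lone "1" is skipped.
two_digits = re.findall(r"\d\d", text)
print(two_digits)  # ['99']

# A single \d matches each digit on its own.
single_digits = re.findall(r"\d", text)
print(single_digits)  # ['1', '9', '9']
```

The r"..." prefix keeps Python from treating the backslash as its own escape character, so the regex engine sees \d unchanged.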

Match Non-Digits

The opposite of \d is \D. Instead of matching only numerical digits, this matches anything except numerical digits.

The expression below matches all characters except for the numbers:

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>
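
As a rough illustration, again assuming Python's re module and our own variable names, collecting every \D match and joining them back together gives the sentence with its digits removed:

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# Each \D match is exactly one character that is not a digit.
non_digits = re.findall(r"\D", text)

# Joining the matches rebuilds the text with the digits left out.
without_digits = "".join(non_digits)
print(without_digits)  # Genius is % inspiration, % perspiration
```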

The Dot Wildcard

The . is a wildcard: by default it matches any single character (except for newlines).

Be careful with this metacharacter. It can have unintended consequences, and it's often better to be more specific if you can.

The expression selects any 2 characters (..) that are followed by a % sign.

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>

This captures '99%', but it also accidentally selects the space before '1%'.

This is because the expression /..%/ matches any two characters before a % symbol, even that space.
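
The sketch below reproduces that behaviour with Python's re module (an assumption on our part, since the lesson itself runs in the browser). The tighter pattern \d\d?% is only one possible way of being more specific, not the lesson's own answer. It borrows the 'optional' quantifier from the previous lesson.

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# ..% accepts ANY two characters before the % sign, including a space.
loose = re.findall(r"..%", text)
print(loose)  # [' 1%', '99%']

# One digit, an optional second digit, then the % sign.
strict = re.findall(r"\d\d?%", text)
print(strict)  # ['1%', '99%']
```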

If we want to literally match the . character, rather than using the wildcard metacharacter, we have to escape it by writing \.

Try it out!

Just. The. Dots.


Expression:>
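
To see the difference between the escaped and unescaped dot, here is a small sketch assuming Python's re module, with variable names of our own choosing:

```python
import re

text = "Just. The. Dots."

# \. matches only literal full stops.
literal_dots = re.findall(r"\.", text)
print(literal_dots)  # ['.', '.', '.']

# An unescaped . matches every character in the text.
any_char = re.findall(r".", text)
print(len(any_char) == len(text))  # True
```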

Word and Non-Word Characters

Word Characters

The \w metacharacter allows us to match only 'word' characters.

'Word characters' means the letters A-Z (both upper and lower case), the numbers 0-9, and the underscore (_).

Notice that this expression matches all of the characters in the words, but none of the spaces:

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>
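
Here is a minimal sketch, assuming Python's re module. In Python 3, \w also matches accented and other Unicode letters unless you pass the re.ASCII flag, so we pass it here to limit \w to letters, digits and the underscore. The variable names are ours.

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# re.ASCII limits \w to a-z, A-Z, 0-9 and the underscore.
word_chars = re.findall(r"\w", text, flags=re.ASCII)
print("".join(word_chars))  # Geniusis1inspiration99perspiration

# The underscore counts as a word character; a space does not.
underscore_is_word = re.fullmatch(r"\w", "_", flags=re.ASCII) is not None
space_is_word = re.fullmatch(r"\w", " ", flags=re.ASCII) is not None
print(underscore_is_word, space_is_word)  # True False
```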

Non-Word Characters

The \W metacharacter is the opposite: it selects all non-word characters.

That is, anything except for the letters A-Z (both upper and lower case), the numbers 0-9, and the underscore.

Here we ignore all of the characters in the words, instead selecting the spaces, the comma and the % symbols:

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>
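
A quick sketch of the same idea, assuming Python's re module with the re.ASCII flag so that \W means "not a letter, digit or underscore" (the variable names are ours):

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# \W matches every character that \w would reject.
non_word = re.findall(r"\W", text, flags=re.ASCII)
print(non_word)  # [' ', ' ', '%', ' ', ',', ' ', '%', ' ']
```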

Whitespace

The \s metacharacter matches any whitespace (spaces, tabs, newlines, etc.).

We use it here to select only the spaces between the words:

Try it out!

Genius is 1% inspiration, 99% perspiration

- Thomas Edison

Expression:>

Notice how this is different to the 'Non-Word Characters' example above?

The \W metacharacter selects anything that is not a letter, a number or an underscore, which includes the spaces but also other characters such as punctuation. The \s metacharacter only selects whitespace.
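
The comparison can be sketched like this, assuming Python's re module. The variable names and the second sample string are ours, added only to show tabs and newlines.

```python
import re

text = "Genius is 1% inspiration, 99% perspiration"

# \s picks up only the whitespace between the words.
whitespace = re.findall(r"\s", text)
print(len(whitespace))  # 5

# \W picks up the same spaces plus the punctuation.
non_word = re.findall(r"\W", text, flags=re.ASCII)
extras = [ch for ch in non_word if not ch.isspace()]
print(extras)  # ['%', ',', '%']

# \s also covers tabs and newlines, not just spaces.
mixed = re.findall(r"\s", "a\tb\nc d")
print(mixed)  # ['\t', '\n', ' ']
```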

Mini-Game

These are the droids you are looking for!

Fix this expression so that it selects all of the items on the left, but not the items on the right.

Items are only selected when their name is completely matched.

Select these:

  • BB-8

  • IG-11

But DON'T select these:

  • R2-D2

  • C-3PO


Your expression:

  • The correct items have two letters, followed by a - and then one or two numbers. You'll need the metacharacters from this lesson to represent that.
  • You might also want to revisit the 'optional' quantifier from the previous lesson 😉
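
If you want to check your answer away from the page, the rule "items are only selected when their name is completely matched" can be approximated with re.fullmatch in Python. This is only a sketch under that assumption. The check helper and the two lists are hypothetical names of ours, and the pattern tried below is deliberately incomplete so it doesn't give the game away.

```python
import re

select = ["BB-8", "IG-11"]
reject = ["R2-D2", "C-3PO"]

def check(pattern: str) -> bool:
    """Return True if pattern fully matches every name in `select` and none in `reject`."""
    selects_all = all(re.fullmatch(pattern, name) for name in select)
    rejects_all = not any(re.fullmatch(pattern, name) for name in reject)
    return selects_all and rejects_all

# This attempt allows only ONE trailing digit, so IG-11 is missed.
print(check(r"\w\w-\d"))  # False
```

Swap in your own expression and keep adjusting it until check returns True.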