Lesson 4

Setting Boundaries


In Lesson 3 we learnt about 'metacharacters' - characters that have a special meaning in the expression.

'Boundaries' and 'Anchors' are metacharacters that match a position in the string, such as the start of a word or the end of a line, rather than a character.

They are incredibly useful and you'll find yourself using them a lot.

Let's take a look...

Word Boundaries

'Word boundaries' occur whenever the text changes from a non-word character to a word character, or vice-versa. Typically, you would use this to find the start or end of words.

These are represented in a regular expression with the \b metacharacter.

The expression below finds the letter a, but only when it is the first character in the word. Remove the word boundary and see what happens:

Try it out!

Life is either a daring adventure or nothing at all

- Helen Keller


In the expression above, only the a is selected. The 'word boundary' metacharacter itself doesn't actually select any characters - it just tells the regular expression engine where to look for matches.
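If you'd like to try this outside the page, here is a minimal sketch using Python's re module. We're assuming the expression in question is \ba (the lesson text doesn't spell it out), and the variable names are just for illustration. It compares the matches with and without the word boundary.

```python
import re

quote = "Life is either a daring adventure or nothing at all"

# With the word boundary: only an 'a' that starts a word is matched.
with_boundary = [m.start() for m in re.finditer(r"\ba", quote)]

# Without it: every 'a' is matched, including the one inside "daring".
without_boundary = [m.start() for m in re.finditer(r"a", quote)]

# The matched text is always just "a"; \b itself consumes no characters.
matched_text = re.findall(r"\ba", quote)

print(len(with_boundary), len(without_boundary))
print(matched_text)
```

Removing the boundary picks up one extra match here: the a in the middle of daring.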

Word boundaries can be confusing though, as they are also found inside a word that contains special characters: in don't, for example, there is a boundary on each side of the apostrophe.

The word boundaries in the text below are marked in yellow. Try changing the string, adding special characters etc., to get a feel for what a 'word boundary' actually is:

Try it out!
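To get a feel for where boundaries fall, you can also list them in code. This is a rough sketch in Python; boundary_positions is a hypothetical helper written for this lesson, not a library function. It returns every index at which \b matches.

```python
import re

def boundary_positions(text):
    # \b is zero-width, so each match is an empty string;
    # we only care about where in the text it occurs.
    return [m.start() for m in re.finditer(r"\b", text)]

plain = boundary_positions("cat")
apostrophe = boundary_positions("don't")
hyphen = boundary_positions("e-mail me")

print(plain)
print(apostrophe)
print(hyphen)
```

Notice that the apostrophe and the hyphen each produce two boundaries, one on either side, because they are not word characters.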



We use Anchors to 'pin' an expression to the start or end of a string.

Start of a string

The ^ character represents the start of the string.

This expression matches the letter o, only when it occurs at the start of the string (^).

Try removing the anchor character to see what we mean:

Try it out!

One, two, three, four, five

Once I caught a fish alive


We can describe this expression as being 'anchored' to the start of the string.

This expression uses the i modifier from Lesson 2, which makes the expression case-insensitive - it will match both upper and lower-case o characters.
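Here is the same idea as a small Python sketch. We're assuming the expression is ^o with the i modifier, and that the two lines of the rhyme form a single string joined by a newline; in Python the i modifier is spelled re.IGNORECASE.

```python
import re

rhyme = "One, two, three, four, five\nOnce I caught a fish alive"

# Anchored and case-insensitive: only the 'O' at the very start of the string.
anchored = re.findall(r"^o", rhyme, flags=re.IGNORECASE)

# Anchor removed: every 'o' or 'O' anywhere in the string.
unanchored = re.findall(r"o", rhyme, flags=re.IGNORECASE)

# Anchored but case-sensitive: the string starts with 'O', not 'o'.
case_sensitive = re.findall(r"^o", rhyme)

print(anchored, unanchored, case_sensitive)
```

Note that the O of Once is not matched by the anchored expression: without the multi-line modifier, ^ only matches at the start of the whole string.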

End of the string

The $ character anchors the expression to the end of the string.

This matches any character (.) that is immediately followed by the end of the string ($):

Try it out!

One, two, three, four, five

Once I caught a fish alive


We say that this expression is 'anchored' to the end of the string.
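A quick Python sketch of the end anchor, assuming the expression is .$ and the rhyme is one string with a newline between its two lines (both are our assumptions about the example).

```python
import re

rhyme = "One, two, three, four, five\nOnce I caught a fish alive"

# '.' matches any single character (except a newline);
# '$' then requires the end of the string to follow immediately.
last_char = re.findall(r".$", rhyme)

# Without the anchor, '.' matches every character except the newline.
every_char = re.findall(r".", rhyme)

print(last_char)
print(len(every_char))
```

Only the final e of alive is selected; the e at the end of five is followed by a newline, not by the end of the string.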

Multi-line strings

If we have a string with multiple lines, we can use these anchors to match the beginning or end of each line.

By adding the m modifier to the expression, ^ will now match the start of each line, while $ matches the end of each line.

Try it out!

This is line 1

This is line 2

This is line 3
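The difference the m modifier makes is easy to see in code. This is a minimal Python sketch, where the modifier is spelled re.MULTILINE; the expressions ^\w+ and .$ are our own choices for illustration.

```python
import re

text = "This is line 1\nThis is line 2\nThis is line 3"

# Without the modifier, ^ and $ refer to the whole string.
first_word_once = re.findall(r"^\w+", text)
last_char_once = re.findall(r".$", text)

# With the modifier, ^ and $ match at the start and end of every line.
first_word_each = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_char_each = re.findall(r".$", text, flags=re.MULTILINE)

print(first_word_once, last_char_once)
print(first_word_each, last_char_each)
```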




Sentiment Analysis!

Select ONLY the tweets that contain the word 'bad'. Matching a single word in the tweet will select it.

Select these:

Trolly McTrollFace


Hey @baseclass, how can you be this BAD at stuff!?

Grumpy Customer


The @baseclass app just crashed on me, this app is so bad it hurts!

But DON'T select these:

    Average Joe


    Just earnt the 'super contributor' badge in the @baseclass app!!

    Happy Customer


    I can barely contain my excitement about the @baseclass app.

Your expression:

  • We need to capture both upper and lower case 'bad's. You'll need a modifier from a previous lesson for this.
  • To avoid accidentally matching tweets with words that contain the text 'bad' (e.g. 'badge'), think about adding word boundaries.
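Once you have an answer, you can check it against the four tweets in code. The sketch below is in Python and uses one possible expression that follows the two hints, \bbad\b with the case-insensitive modifier; treat it as an illustration rather than the only correct answer, and skip it if you'd rather solve the exercise first.

```python
import re

# One candidate answer: the word 'bad' on its own, in any case.
candidate = re.compile(r"\bbad\b", flags=re.IGNORECASE)

should_match = [
    "Hey @baseclass, how can you be this BAD at stuff!?",
    "The @baseclass app just crashed on me, this app is so bad it hurts!",
]
should_not_match = [
    "Just earnt the 'super contributor' badge in the @baseclass app!!",
    "I can barely contain my excitement about the @baseclass app.",
]

# Keep only the tweets in which the expression finds a match.
selected = [t for t in should_match + should_not_match if candidate.search(t)]

print(selected == should_match)
```

Without the trailing \b the expression would also select the 'badge' tweet, and without the modifier it would miss the one shouting BAD.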