BEGINNING PERL PART 2 - ESCAPING SPECIAL CHARACTERS


Of course, regular expressions can be more than just words and spaces. The rest of this chapter is going to be about the various ways we can specify more advanced matches - where portions of the match are allowed to be one of a number of characters, or where the match must occur at a certain position in the string. To do this, we'll describe the special meanings given to certain characters - called metacharacters - and look at what sort of things we can express with them.


This free tutorial is a sample from the book Beginning Perl.


At this stage, we might not want to use their special meanings - we may want to literally match the characters themselves. As you've already seen with double-quoted strings, we can use a backslash to escape these characters' special meanings. Hence, if you want to match '... ' in the above text, you need your pattern to say '\.\.\. '. For example:

> perl matchtest.plx
Enter some text to find: Ent+
The text matches the pattern 'Ent+'.

> perl matchtest.plx
Enter some text to find: Ent\+
'Ent\+' was not found.

We'll see later why the first one matched - due to the special meaning of +.

These are the characters that are given special meaning within a regular expression, which you will need to backslash if you want to use them literally:

. * ? + [ ] ( ) { } ^ $ | \

Any other characters automatically assume their literal meanings.
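To see the difference a backslash makes, here is a minimal sketch. The sample strings and the file name are invented for illustration; they are not the text that matchtest.plx searches.

```perl
#!/usr/bin/perl
# escaping.plx (hypothetical file name)
use warnings;
use strict;

# Sample strings invented for this sketch; they are not the chapter's text.
my $with_dots = "Wait... what?";
my $no_dots   = "Wait, what?";
my $sum       = "2+2 is 4";

# An unescaped '.' matches any character, so '...' matches any three characters.
my $any_three    = ($no_dots   =~ /.../)    ? 1 : 0;
# An escaped '\.' matches only a full stop, and $no_dots has none.
my $dots_absent  = ($no_dots   =~ /\.\.\./) ? 1 : 0;
my $dots_present = ($with_dots =~ /\.\.\./) ? 1 : 0;

# Unescaped, '2+2' asks for one or more 2s followed by a 2, such as "22".
my $plus_special = ($sum =~ /2+2/)  ? 1 : 0;
# Escaped, '2\+2' asks for a 2, a literal plus sign, and another 2.
my $plus_literal = ($sum =~ /2\+2/) ? 1 : 0;

print "any_three=$any_three dots_absent=$dots_absent dots_present=$dots_present\n";
print "plus_special=$plus_special plus_literal=$plus_literal\n";
```

Only the escaped patterns insist on the punctuation itself: the unescaped '...' is satisfied by any three characters, and the unescaped '2+2' never finds the plus sign at all.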

You can also turn off the special meanings using the escape sequence \Q. After perl sees \Q, the 14 special characters above will automatically assume their ordinary, literal meanings. This remains the case until perl sees either \E or the end of the pattern.

For instance, if we wanted to adapt our matchtest program just to look for literal strings, instead of regular expressions, we could change it to look like this:

if (/\Q$pattern\E/) { 

Now the meaning of + is turned off:

> perl matchtest.plx
Enter some text to find: Ent+
'Ent+' was not found.
> 

Note that all \Q does is turn off the regular expression magic of those 14 characters above - it doesn't stop, for example, variable interpolation.
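The following sketch shows both halves of that note: the variable is still interpolated, and only afterwards are its special characters disarmed. The text, the pattern and the file name are assumptions made up for this example. The built-in quotemeta function is used only to display the escaped form that \Q produces.

```perl
#!/usr/bin/perl
# literal.plx (hypothetical file name)
use warnings;
use strict;

# Invented stand-ins for the text and for what the user types in.
my $text    = "The Ents said 1+1=2";
my $pattern = "1+1";

# As a regular expression, '1+1' means one or more 1s followed by a 1, i.e. "11".
my $as_regex   = ($text =~ /$pattern/)     ? "matched" : "not found";
# \Q still interpolates $pattern, but the '+' inside it is now literal.
my $as_literal = ($text =~ /\Q$pattern\E/) ? "matched" : "not found";

# quotemeta returns the backslashed form of its argument.
my $escaped = quotemeta($pattern);

print "As a regular expression: $as_regex\n";
print "As a literal string:     $as_literal\n";
print "Escaped form:            $escaped\n";
```

Here the first test reports 'not found', because the text contains no "11", while the second one matches the literal "1+1".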

Don't forget to change this back again: We'll be using matchtest.plx throughout the chapter, to demonstrate the regular expressions we look at. We'll need that magic fully functional!

Anchors

So far, our patterns have all tried to find a match anywhere in the string. The first way we'll extend our regular expressions is by dictating to perl where the match must occur. We can say 'these characters must match the beginning of the string' or 'this text must be at the end of the string'. We do this by anchoring the match to either end.

The two anchors we have are ^, which appears at the beginning of the pattern and anchors a match to the beginning of the string, and $, which appears at the end of the pattern and anchors it to the end of the string. So, to see if our quotation ends in a full stop - and remember that the full stop is a special character - we say something like this:

>perl matchtest.plx
Enter some text to find: \.$
The text matches the pattern '\.$'.

That's a full stop (which we've escaped to prevent it being treated as a special character) and a dollar sign at the end of our pattern - to show that this must be the end of the string.

Try, if you can, to get into the habit of reading out regular expressions in English. Break them into pieces and say what each piece does. Also remember to say that each piece must immediately follow the other in the string in order to match. For instance, the above could be read 'match a full stop immediately followed by the end of the string'.

If you can get into this habit, you'll find that reading and understanding regular expressions becomes a lot easier, and you'll be able to 'translate' back into Perl more naturally as well.

Here's another example: do we have a capital I at the beginning of the string?

> perl matchtest.plx
Enter some text to find: ^I
'^I' was not found.
>

We use ^ to mean 'beginning of the string', followed by an I. In our case, though, the character at the beginning of the string is a ", so our pattern does not match. If you know that what you're looking for can only occur at the beginning or the end of the string, it's extremely efficient to use anchors. Instead of searching through the whole string to see whether the match succeeded, perl only needs to look at a small portion and can give up immediately if even the first character does not match.
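The same tests can be tried outside matchtest.plx. This is only a sketch: the quotation below is an invented stand-in that, like the chapter's text, starts with a double quote and ends with a full stop.

```perl
#!/usr/bin/perl
# anchors.plx (hypothetical file name)
use warnings;
use strict;

# Invented sample: begins with a double quote, ends with a full stop.
my $text = '"I think so," he said.';

# Without an anchor, a capital I anywhere in the string is enough.
my $has_I           = ($text =~ /I/)     ? 1 : 0;
# '^I' demands an I as the very first character, but that is a quote mark.
my $starts_with_I   = ($text =~ /^I/)    ? 1 : 0;
# Allowing for the opening quote lets the anchored pattern succeed.
my $starts_quoted_I = ($text =~ /^"I/)   ? 1 : 0;
# A full stop immediately followed by the end of the string.
my $ends_with_stop  = ($text =~ /\.$/)   ? 1 : 0;
# 'said' is in the string, but it is not the last thing in it.
my $ends_with_said  = ($text =~ /said$/) ? 1 : 0;

print "has_I=$has_I starts_with_I=$starts_with_I starts_quoted_I=$starts_quoted_I\n";
print "ends_with_stop=$ends_with_stop ends_with_said=$ends_with_said\n";
```

Reading the patterns aloud helps here too: the last one says 'match said immediately followed by the end of the string', which fails because a full stop comes in between.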

Let's see one more example of this, where we'll combine looking for matches with looking through the lines in a file:

Try it out: Rhyming Dictionary

Imagine yourself as a poor poet. In fact, not just poor, but downright bad - so bad, you can't even think of a rhyme for 'pink'. So, what do you do? You do what every sensible poet does in this situation, and you write the following Perl program:

#!/usr/bin/perl
# rhyming.plx
use warnings;
use strict;
my $syllable = "ink";
while (<>) {
    print if /$syllable$/;
}

We can now feed it a file of words, and find those that end in 'ink':

>perl rhyming.plx wordlist.txt
blink
bobolink
brink
chink
clink
>

For a really thorough result, you'll need to use a file containing every word in the dictionary - be prepared to wait though if you do! For the sake of the example however, any text-based file will do (though it'll help if it's in English). A bobolink, in case you're wondering, is a migratory American songbird, otherwise known as a ricebird or reedbird.

How It Works

With the loops and tests we learned in the last chapter, this program is really very easy:

while (<>) { print if /$syllable$/;} 

We've not looked at file access yet, so you may not be familiar with the while(<>){...} construction used here. In this example it opens a file that's been specified on the command line, and loops through it, one line at a time, feeding each one into the special variable $_ - this is what we'll be matching.

Once each line of the file has been fed into $_, we test to see if it matches the pattern, which is our syllable, 'ink', anchored to the end of the line (with $). If so, we print it out.

The important thing to note here is that perl treats the 'ink' as the last thing on the line, even though there is a newline at the end of $_. The $ anchor matches either at the very end of the string or just before a newline that ends the string, so that final newline is effectively ignored - we'll look at this behavior in more detail later.
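We can check the newline behavior without a word list. In this sketch the strings in @lines are invented test data standing in for lines read from a file, and the file name is hypothetical.

```perl
#!/usr/bin/perl
# newline.plx (hypothetical file name)
use warnings;
use strict;

my $syllable = "ink";

# Invented test data: with a newline, without one, and with two.
my @lines = ("blink\n", "inkwell\n", "pink", "wink\n\n");

my @rhymes;
for my $line (@lines) {
    # $ matches at the end of the string or just before a final newline.
    push @rhymes, $line if $line =~ /$syllable$/;
}

# Strip the trailing newline, if any, so the output is uniform.
chomp @rhymes;
print "$_\n" for @rhymes;
```

This prints 'blink' and 'pink'. The entry 'inkwell' is rejected because 'ink' is not at the end, and 'wink' followed by two newlines is rejected because only a single newline at the very end of the string is overlooked.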

Thursday 4th December 2008  © COPYRIGHT 2008 - VISUALSOFT