PERL for Biologists

Course by Kurt Stüber

Part 6

Regular expressions:

Regular expressions are a means of specifying search patterns for comparisons, pattern recognition, and the parsing of strings and files. So far we have seen them used in the split command. A regular expression is written between two forward slashes:

/ABC/
/This\n/
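
To make the two patterns above concrete, here is a minimal sketch of how they can be tested against a string with the binding operator =~. The variables $seq and $text and their contents are made up for illustration and are not part of the course material.

```perl
use strict;
use warnings;

# $seq and $text are illustrative strings (assumptions for this sketch).
my $seq  = "XXABCXX";
my $text = "This\nis a test\n";

# =~ tests whether the pattern occurs anywhere in the string.
my $found_abc = ($seq =~ /ABC/) ? 1 : 0;

# \n inside a pattern matches a newline, so "This" must be
# followed directly by a line break.
my $found_this = ($text =~ /This\n/) ? 1 : 0;

print "ABC found\n"  if $found_abc;
print "This found\n" if $found_this;
```

A match returns true or false, so it is usually placed directly in an if condition.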

All control codes (escape characters) can be used in regular expressions. A collection of these is given in the list of control codes. To find words in a text that begin with A, you write:

/A\w*/

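The control code \w matches a single word character (a letter, a digit or the underscore). The star (*) is the sign for repetition: the preceding element may occur zero or more times, so \w* matches all word characters up to the end of the word. Note that the pattern as written also matches an A in the middle of a word; to require the A at the start of a word, put a word boundary (\b) in front of it: /\bA\w*/. Further special search symbols are the caret (^) for the beginning of a line and the dollar sign ($) for the end of a line. Using these symbols you can anchor a word to the beginning or the end of a line. A single dot (.) stands for any character except a newline.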
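
The symbols described above can be tried out in a short sketch like the following. The sample text in $line is an assumption chosen for illustration, and the parentheses around A\w* are an addition that makes Perl hand back the matched text.

```perl
use strict;
use warnings;

# $line is an illustrative sentence (an assumption for this sketch).
my $line = "Adenine pairs with Thymine";

# Parentheses capture the matched text: the first A followed by word characters.
my ($word) = $line =~ /(A\w*)/;

# ^ anchors the pattern at the beginning of the line, $ at the end.
my $at_start = ($line =~ /^Adenine/) ? 1 : 0;
my $at_end   = ($line =~ /Adenine$/) ? 1 : 0;

# A dot matches any single character, so T.y matches "Thy".
my $dot = ($line =~ /T.y/) ? 1 : 0;

print "word: $word\n";
print "at start: $at_start, at end: $at_end, dot: $dot\n";
```

Here the word is found at the beginning of the line but not at the end, so only the ^ test succeeds.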

$line =~ s/Karl/Fred/g;

This is a command to substitute parts of a string with another string. The binding operator =~ (not a plain =) applies the substitution to the variable $line. In the statement above the string "Karl" is replaced with "Fred". The "s" stands for "substitution". The "g" stands for "global" and signifies that the replacement is done for every instance of "Karl" in the variable $line. If "g" is omitted, only the first occurrence of "Karl" is replaced by "Fred".
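
The difference between a substitution with and without "g" can be seen in a small sketch. The strings used here, including the DNA fragment in $dna, are invented for illustration.

```perl
use strict;
use warnings;

# Illustrative input (an assumption for this sketch).
my $line = "Karl met Karl";

# With g: every occurrence is replaced.
my $all = $line;
$all =~ s/Karl/Fred/g;

# Without g: only the first occurrence is replaced.
my $first = $line;
$first =~ s/Karl/Fred/;

# A substitution also returns the number of replacements it made.
# Here every T in a made-up DNA fragment is turned into U.
my $dna   = "ATGTTT";
my $rna   = $dna;
my $count = ($rna =~ s/T/U/g);

print "$all\n$first\n$rna ($count replacements)\n";
```

Copying the string into a new variable first keeps the original value of $line unchanged, since s/// modifies the variable it is bound to.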

Exercises:

Write a program that replaces all lowercase characters in a file with uppercase ones.

Solutions:
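
One possible solution, given as a sketch and not as the course's official answer: it assumes the file name is passed on the command line and uses the tr operator, which is not a regular expression but replaces characters one by one. The helper name upcase_line is hypothetical. Note that tr/a-z/A-Z/ only covers the letters a to z; the built-in function uc would be an alternative.

```perl
use strict;
use warnings;

# upcase_line is a hypothetical helper: it returns a copy of its
# argument with the letters a-z replaced by A-Z.
sub upcase_line {
    my ($text) = @_;
    $text =~ tr/a-z/A-Z/;
    return $text;
}

if (@ARGV) {
    # Read every line of the file(s) named on the command line
    # and print the converted line.
    while (my $line = <>) {
        print upcase_line($line);
    }
}
else {
    # No file given: show the conversion on a sample string.
    print upcase_line("atgc ttga\n");
}
```

Saved under a name such as upper.pl (an assumed file name), it could be run as "perl upper.pl input.txt > output.txt" to write the converted text to a new file.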


© 2001-2007, by Kurt Stüber.