Extended Regular Expressions
Extended regular expressions (EREs) are a variant of regular expressions that support additional features and syntax. EREs are supported by several command line utilities in Linux, including `grep`, `sed`, and `awk`.
Here are a few of the main differences between basic regular expressions (BREs) and EREs:
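Repetition: In EREs, you can use the `?`, `+`, and `{}` operators without a preceding backslash to specify the number of repetitions of the preceding character or expression. In BREs, `?` and `+` are ordinary characters and braces must be written as `\{` and `\}`. For example: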
Repetition Operator | Description |
---|---|
`?` | Matches zero or one occurrence of the preceding character or expression |
`+` | Matches one or more occurrences of the preceding character or expression |
`{n}` | Matches exactly n occurrences of the preceding character or expression |
`{n,}` | Matches n or more occurrences of the preceding character or expression |
`{n,m}` | Matches at least n and at most m occurrences of the preceding character or expression |
Character classes: You can use the `[[:class:]]` syntax to match a character class, such as alphabetic characters, digits, whitespace, etc. These POSIX classes are also available in BREs. For example:
Character Class | Description |
---|---|
`[[:alpha:]]` | Matches any alphabetic character |
`[[:digit:]]` | Matches any digit |
`[[:space:]]` | Matches any whitespace character |
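Word boundaries: In EREs as implemented by GNU tools, you can use the `\b` and `\B` operators to match word boundaries. These operators are GNU extensions and are not part of the POSIX ERE syntax. For example: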
Word Boundary Operator | Description |
---|---|
`\b` | Matches a word boundary |
`\B` | Matches a non-word boundary |
To use EREs with a command line utility, you may need to use a command line option to enable ERE support. For example, to use EREs with `grep`, you can use the `-E` option:
$ grep -E 'pattern' file
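To see what the `-E` option changes, the following sketch runs one pattern through `grep` with and without it. The sample strings and the variable names (`lines`, `bre`, `ere`) are made up for illustration, and `paste` is used only to join the matching lines with commas.

```shell
# Candidate lines, one per printf argument.
lines=$(printf '%s\n' 'abc' 'abbc' 'ab+c')

# Without -E, grep uses BRE syntax, where an unescaped + is a literal plus sign.
bre=$(printf '%s\n' "$lines" | grep 'ab+c' | paste -s -d ',' -)

# With -E, + is a repetition operator: one or more "b" characters.
ere=$(printf '%s\n' "$lines" | grep -E 'ab+c' | paste -s -d ',' -)

printf 'BRE matched: %s\n' "$bre"
printf 'ERE matched: %s\n' "$ere"
```

The BRE run selects only the line that literally contains `ab+c`, while the ERE run selects `abc` and `abbc`.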
Zero or one occurrence of a character using (?)
In extended regular expressions (EREs), the `?` operator is used to match zero or one occurrence of the preceding character or expression.
For example, the regular expression `a?` would match either the string “a” or the empty string “”.
Here are a few more examples of how the `?` operator can be used in EREs:
`[0-9]?`: This regular expression would match a single digit or the empty string. Because it can match nothing at all, an unanchored search with it succeeds on every string, including strings that contain no digits
`^A.?B$`: This regular expression would match any string that starts with the letter “A”, followed by zero or one characters, and ends with the letter “B” (e.g. “AB”, “A B”)
`cat.?dog`: This regular expression would match any string that contains the word “cat” followed by zero or one characters, followed by the word “dog” (e.g. “catdog”, “cat dog”)
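You can also use the `?` operator in combination with other metacharacters and operators. For example:
`^A.?B|C$`: Alternation (`|`) has the lowest precedence, so this regular expression reads as `^A.?B` or `C$`. It would match any string that starts with the letter “A”, followed by zero or one characters, followed by the letter “B” (with anything after it), or any string that ends with the letter “C”. To apply both anchors to both alternatives, group them: `^(A.?B|C)$`
`^[0-9]*\.?[0-9]*$`: This regular expression would match any string that consists of zero or more digits, followed by an optional decimal point, followed by zero or more digits (e.g. “123”, “123.456”, “0.0”, etc.). The period is escaped because an unescaped `.` matches any character. Note that this pattern also matches the empty string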
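The effect of escaping the period before `?` can be checked with a short sketch. The candidate strings and the variable names (`candidates`, `loose`, `strict`) are illustrative assumptions, not part of any tool.

```shell
# Candidate strings, one per line.
candidates=$(printf '%s\n' '123' '123.456' '123x456' '1.2.3')

# Unescaped dot: .? accepts any single optional character, so "123x456" is accepted.
loose=$(printf '%s\n' "$candidates" | grep -E '^[0-9]*.?[0-9]*$' | paste -s -d ',' -)

# Escaped dot: \.? accepts only an optional literal period.
strict=$(printf '%s\n' "$candidates" | grep -E '^[0-9]*\.?[0-9]*$' | paste -s -d ',' -)

printf 'unescaped: %s\n' "$loose"
printf 'escaped:   %s\n' "$strict"
```

Neither pattern accepts `1.2.3`, because `?` allows at most one occurrence of the separator.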
One or more occurrences of a character using (+)
In extended regular expressions (EREs), the `+` operator is used to match one or more occurrences of the preceding character or expression.
For example, the regular expression `a+` would match any string that contains one or more occurrences of the letter “a”, such as “a”, “aa”, “aaa”, etc.
Here are a few more examples of how the `+` operator can be used in EREs:
`[0-9]+`: This regular expression would match any string that contains one or more digits
`^A.+B$`: This regular expression would match any string that starts with the letter “A”, followed by one or more characters, and ends with the letter “B” (e.g. “A B”, “A BBB”, but not “AB”)
`cat.+dog`: This regular expression would match any string that contains the word “cat” followed by one or more characters, followed by the word “dog” (e.g. “cat dog”, “cat dogcat dog”, but not “catdog”)
You can also use the `+` operator in combination with other metacharacters and operators. For example:
`^A.+B|C$`: This regular expression would match any string that starts with the letter “A”, followed by one or more characters and then the letter “B”, or any string that ends with the letter “C”. Because `|` has the lowest precedence, `^` applies only to the first alternative and `$` only to the second
`^[0-9]*\.?[0-9]+$`: This regular expression would match any string that consists of zero or more digits, followed by an optional decimal point, followed by one or more digits (e.g. “123”, “123.456”, “0.123”, etc.). The period is escaped because an unescaped `.` matches any character
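A quick way to see what "one or more" requires is to compare `+` with `*` (zero or more) on the same input. This is a minimal sketch; the sample strings and the variable names `plus` and `star` are illustrative.

```shell
# Candidate strings: no separator, a one-character separator, a longer separator.
samples=$(printf '%s\n' 'catdog' 'cat dog' 'cat and dog')

# .+ needs at least one character between "cat" and "dog".
plus=$(printf '%s\n' "$samples" | grep -E 'cat.+dog' | paste -s -d ',' -)

# .* also accepts nothing between them.
star=$(printf '%s\n' "$samples" | grep -E 'cat.*dog' | paste -s -d ',' -)

printf 'with +: %s\n' "$plus"
printf 'with *: %s\n' "$star"
```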
Match exactly n occurrences of a character using {n}
In EREs, the `{n}` operator is used to match exactly `n` occurrences of the preceding character or expression.
For example, the regular expression `a{3}` would match three consecutive occurrences of the letter “a”, such as “aaa”. In an unanchored search, a longer run such as “aaaa” also contains a match; use `^a{3}$` to accept only a string of exactly three.
Here are a few more examples of how the `{n}` operator can be used in EREs:
`[0-9]{3}`: This regular expression would match any string that contains three consecutive digits
`^A.{3}B$`: This regular expression would match any string that starts with the letter “A”, followed by exactly three characters, and ends with the letter “B” (e.g. “A BBB”)
`cat.{3}dog`: This regular expression would match any string that contains the word “cat” followed by exactly three characters, followed by the word “dog” (e.g. “cat - dog”, “cat123dog”)
You can also use the `{n}` operator in combination with other metacharacters and operators. For example:
`^A.{3}B|C$`: This regular expression would match any string that starts with the letter “A”, followed by exactly three characters and then the letter “B”, or any string that ends with the letter “C”. Because `|` has the lowest precedence, `^` applies only to the first alternative and `$` only to the second
`^[0-9]*\.?[0-9]{3}$`: This regular expression would match any string that consists of zero or more digits, followed by an optional decimal point, followed by exactly three digits (e.g. “123.456”, “0.123”, etc.). The period is escaped because an unescaped `.` matches any character
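Fixed counts are handy for fixed-width fields. The sketch below assumes a hypothetical "three digits, hyphen, four digits" format. The format, the sample values, and the variable name `valid` are illustrative only.

```shell
# Keep only strings made of exactly three digits, a hyphen, and exactly four digits.
valid=$(printf '%s\n' '555-1234' '55-1234' '5555-1234' '555-123' \
    | grep -E '^[0-9]{3}-[0-9]{4}$' \
    | paste -s -d ',' -)

printf 'valid: %s\n' "$valid"
```

The `^` and `$` anchors are what make the counts exact. Without them, `5555-1234` would also be selected, because it contains `555-1234`.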
Match n or more occurrences of a character using {n,}
In EREs, the `{n,}` operator is used to match `n` or more occurrences of the preceding character or expression.
For example, the regular expression `a{3,}` would match any string that contains three or more consecutive occurrences of the letter “a”, such as “aaa”, “aaaa”, “aaaaa”, etc.
Here are a few more examples of how the `{n,}` operator can be used in EREs:
`[0-9]{3,}`: This regular expression would match any string that contains three or more consecutive digits
`^A.{3,}B$`: This regular expression would match any string that starts with the letter “A”, followed by three or more characters, and ends with the letter “B” (e.g. “A BBB”, “A BBBBBBB”)
`cat.{3,}dog`: This regular expression would match any string that contains the word “cat” followed by three or more characters, followed by the word “dog” (e.g. “cat dogcat dog”, “cat dogcat dogcat dog”)
You can also use the `{n,}` operator in combination with other metacharacters and operators. For example:
`^A.{3,}B|C$`: This regular expression would match any string that starts with the letter “A”, followed by three or more characters and then the letter “B”, or any string that ends with the letter “C”. Because `|` has the lowest precedence, `^` applies only to the first alternative and `$` only to the second
`^[0-9]*\.?[0-9]{3,}$`: This regular expression would match any string that consists of zero or more digits, followed by an optional decimal point, followed by three or more digits (e.g. “123.456”, “0.123456”, etc.). The period is escaped because an unescaped `.` matches any character
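As a rough illustration of an open-ended lower bound, the sketch below filters made-up `id` lines by the length of their number and also checks that `{1,}` behaves like `+`. The input and the variable names are assumptions for the example.

```shell
# Sample lines: an "id" label followed by numbers of different lengths.
ids=$(printf '%s\n' 'id 7' 'id 42' 'id 123' 'id 98765')

# {3,} keeps only the lines whose number has at least three digits.
long_ids=$(printf '%s\n' "$ids" | grep -E 'id [0-9]{3,}' | paste -s -d ',' -)

# {1,} is another way to write +, so both counts of matching lines agree.
count_brace=$(printf '%s\n' "$ids" | grep -Ec 'id [0-9]{1,}')
count_plus=$(printf '%s\n' "$ids" | grep -Ec 'id [0-9]+')

printf 'three or more digits: %s\n' "$long_ids"
printf '{1,} count: %s, + count: %s\n' "$count_brace" "$count_plus"
```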
Match at least n and at most m occurrences of a character using {n,m}
In EREs, the `{n,m}` operator is used to match at least `n` and at most `m` occurrences of the preceding character or expression.
For example, the regular expression `a{3,5}` would match a run of three, four, or five consecutive occurrences of the letter “a”, such as “aaa”, “aaaa”, “aaaaa”. Without anchors, a longer run such as “aaaaaa” still contains a match.
Here are a few more examples of how the `{n,m}` operator can be used in EREs:
`[0-9]{3,5}`: This regular expression would match any string that contains a run of three, four, or five consecutive digits. Because it is unanchored, a string with a longer run of digits also matches
`^A.{3,5}B$`: This regular expression would match any string that starts with the letter “A”, followed by three, four, or five characters, and ends with the letter “B” (e.g. “A BBB”, “A BBBB”, “A BBBBB”)
`cat.{3,5}dog`: This regular expression would match any string that contains the word “cat” followed by three, four, or five characters, followed by the word “dog” (e.g. “cat & dog”, “cat and dog”)
You can also use the `{n,m}` operator in combination with other metacharacters and operators. For example:
`^A.{3,5}B|C$`: This regular expression would match any string that starts with the letter “A”, followed by three, four, or five characters and then the letter “B”, or any string that ends with the letter “C”. Because `|` has the lowest precedence, `^` applies only to the first alternative and `$` only to the second
`^[0-9]*\.?[0-9]{3,5}$`: This regular expression would match any string that consists of zero or more digits, followed by an optional decimal point, followed by three, four, or five digits (e.g. “123.456”, “0.12345”, etc.). The period is escaped because an unescaped `.` matches any character
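The upper bound in `{n,m}` only limits the whole string when the pattern is anchored. The following sketch, with illustrative sample strings and variable names, shows the difference.

```shell
# Digit strings of length 2, 3, 5, and 6.
numbers=$(printf '%s\n' '12' '123' '12345' '123456')

# Unanchored: "123456" is selected because it contains a run of five digits.
unanchored=$(printf '%s\n' "$numbers" | grep -E '[0-9]{3,5}' | paste -s -d ',' -)

# Anchored: the whole line must be three to five digits long.
anchored=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{3,5}$' | paste -s -d ',' -)

printf 'unanchored: %s\n' "$unanchored"
printf 'anchored:   %s\n' "$anchored"
```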
Match any alphabetic character using [[:alpha:]]
In EREs, the `[[:alpha:]]` character class is used to match any alphabetic character.
For example, the regular expression `[[:alpha:]]` would match any single letter, either uppercase or lowercase.
Here are a few more examples of how the `[[:alpha:]]` character class can be used in EREs:
`[[:alpha:]]+`: This regular expression would match a run of one or more letters (uppercase or lowercase) anywhere in a string
`^[[:alpha:]]+$`: This regular expression would match any string that consists only of letters (uppercase or lowercase)
`^[[:alpha:]]+\.txt$`: This regular expression would match any string that consists of one or more letters (uppercase or lowercase) followed by the “.txt” extension
You can also use the `[[:alpha:]]` character class in combination with other metacharacters and operators. For example:
`^[[:alpha:]]+\.?[[:alpha:]]*$`: This regular expression would match any string that consists of one or more letters (uppercase or lowercase), followed by an optional period, followed by zero or more letters (uppercase or lowercase)
`^[[:alpha:]]+[0-9]+$`: This regular expression would match any string that consists of one or more letters (uppercase or lowercase), followed by one or more digits. The Perl-style shorthand `\d` is not part of ERE syntax, so digits are written as `[0-9]` or `[[:digit:]]`
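The sketch below tries two of the `[[:alpha:]]` patterns on made-up file names and identifiers. It sets `LC_ALL=C` as an assumption so that "alphabetic" means the ASCII letters only; in other locales the class may include more characters.

```shell
# File names made only of letters, followed by the .txt extension.
txt_files=$(printf '%s\n' 'notes.txt' 'notes2.txt' 'notes.text' 'README' \
    | LC_ALL=C grep -E '^[[:alpha:]]+\.txt$' \
    | paste -s -d ',' -)

# Letters followed by digits, with nothing else on the line.
letters_then_digits=$(printf '%s\n' 'abc123' '123abc' 'abc' 'abc 123' \
    | LC_ALL=C grep -E '^[[:alpha:]]+[0-9]+$' \
    | paste -s -d ',' -)

printf 'txt files: %s\n' "$txt_files"
printf 'letters then digits: %s\n' "$letters_then_digits"
```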
Match any digit using [[:digit:]]
In EREs, the `[[:digit:]]` character class is used to match any digit.
For example, the regular expression `[[:digit:]]` would match any single digit (0-9).
Here are a few more examples of how the `[[:digit:]]` character class can be used in EREs:
`[[:digit:]]+`: This regular expression would match a run of one or more digits anywhere in a string
`^[[:digit:]]+$`: This regular expression would match any string that consists only of digits
`^[[:digit:]]+\.txt$`: This regular expression would match any string that consists of one or more digits followed by the “.txt” extension
You can also use the `[[:digit:]]` character class in combination with other metacharacters and operators. For example:
`^[[:digit:]]+\.?[[:digit:]]*$`: This regular expression would match any string that consists of one or more digits, followed by an optional period, followed by zero or more digits
`^[[:alpha:]]+[[:digit:]]+$`: This regular expression would match any string that consists of one or more letters (uppercase or lowercase), followed by one or more digits
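As a sketch of how these pieces combine, the example below compares the optional-period pattern with a stricter variant that uses a group, `(\.[[:digit:]]+)?`, to require digits after the period. The stricter pattern, the sample values, and the variable names are illustrative additions.

```shell
# Candidate number-like strings.
values=$(printf '%s\n' '42' '3.14' '7.' '.5' 'v2')

# Optional period, optional fraction digits: "7." is accepted.
optional_dot=$(printf '%s\n' "$values" \
    | grep -E '^[[:digit:]]+\.?[[:digit:]]*$' | paste -s -d ',' -)

# The period and the fraction digits are optional only as a unit: "7." is rejected.
strict_fraction=$(printf '%s\n' "$values" \
    | grep -E '^[[:digit:]]+(\.[[:digit:]]+)?$' | paste -s -d ',' -)

printf 'optional period: %s\n' "$optional_dot"
printf 'strict fraction: %s\n' "$strict_fraction"
```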
Match any whitespace character using [[:space:]]
In EREs, the `[[:space:]]` character class is used to match any whitespace character.
Whitespace characters include spaces, tabs, newlines, and other characters that are used to separate words or lines of text.
For example, the regular expression `[[:space:]]` would match any single whitespace character.
Here are a few more examples of how the `[[:space:]]` character class can be used in EREs:
`[[:space:]]+`: This regular expression would match a run of one or more whitespace characters anywhere in a string
`^[[:space:]]+$`: This regular expression would match any string that consists only of whitespace characters
`^[[:alpha:]]+[[:space:]]+[[:alpha:]]+$`: This regular expression would match any string that consists of one or more letters (uppercase or lowercase), followed by one or more whitespace characters, followed by one or more letters (uppercase or lowercase)
You can also use the `[[:space:]]` character class in combination with other metacharacters and operators. For example:
`^[[:digit:]]+[[:space:]]+[[:alpha:]]+$`: This regular expression would match any string that consists of one or more digits, followed by one or more whitespace characters, followed by one or more letters (uppercase or lowercase)
`^[[:space:]]*[[:alpha:]]+[[:space:]]*$`: This regular expression would match any string that consists of zero or more whitespace characters, followed by one or more letters (uppercase or lowercase), followed by zero or more whitespace characters
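The difference between `[[:space:]]` and a literal space shows up as soon as the input contains a tab. A minimal sketch, with made-up input lines and variable names, counting matching lines with `grep -c`:

```shell
# Four lines: space-separated, tab-separated, no separator, and three words.
# printf expands \t to a tab character.
text=$(printf 'hello world\nhello\tworld\nhelloworld\nhello big world\n')

# [[:space:]]+ accepts both the space and the tab as a separator.
two_words=$(printf '%s\n' "$text" | grep -Ec '^[[:alpha:]]+[[:space:]]+[[:alpha:]]+$')

# A literal space accepts only the space-separated line.
space_only=$(printf '%s\n' "$text" | grep -Ec '^[[:alpha:]]+ [[:alpha:]]+$')

printf 'two words, any whitespace: %s\n' "$two_words"
printf 'two words, single space:   %s\n' "$space_only"
```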
Word boundaries in regular expressions
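In GNU tools such as `grep`, the `\b` operator is used to match word boundaries. A word boundary is a position in a string where a word starts or ends, that is, a position between a word character (a letter, digit, or underscore) and a non-word character or the start or end of the string.
For example, the regular expression `\bcat\b` would match the word “cat” in a string, but would not match the string “catdog” because there is no word boundary between “cat” and “dog”.
Here are a few more examples of how the `\b` operator can be used in EREs: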
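`\b[A-Z]\b`: This regular expression would match any single uppercase letter that is surrounded by word boundaries (e.g. “A” in “Vitamin A is essential”). It would not match the “A” in “An apple a day keeps the doctor away”, because that “A” is followed by another letter
`\bcat|dog\b`: Because `|` has the lowest precedence, this regular expression reads as `\bcat` or `dog\b`. It would match “cat” at the start of a word (e.g. “cat”, “catalog”) or “dog” at the end of a word (e.g. “dog”, “bulldog”), so it does not restrict the match to the separate words “cat” and “dog”
`\bcat.*\bdog\b`: This regular expression would match any string that contains “cat” at the start of a word, followed by zero or more characters, followed by the separate word “dog” (e.g. “The cat chased the dog” but not “The cat chased the dogcat”). Since there is no `\b` after “cat”, a word such as “catalog” also satisfies the first part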
You can also use the `\b` operator in combination with other metacharacters and operators. For example:
`\bcat\b|\bdog\b`: This regular expression would match either the word “cat” or the word “dog” when they appear as separate words
`\b[A-Z][a-z]*\b`: This regular expression would match any uppercase letter followed by zero or more lowercase letters, surrounded by word boundaries (e.g. “An” in “An apple a day keeps the doctor away”)
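How `\b` interacts with alternation can be checked with a small sketch. It assumes GNU `grep`, since `\b` is a GNU extension. The word list and the variable names `loose` and `strict` are illustrative.

```shell
# One word per line.
words=$(printf '%s\n' 'cat' 'catalog' 'bulldog' 'dog' 'concatenate')

# Reads as (\bcat) or (dog\b): a word starting with "cat" or ending with "dog".
loose=$(printf '%s\n' "$words" | grep -E '\bcat|dog\b' | paste -s -d ',' -)

# Boundaries on both sides of each alternative: only the whole words.
strict=$(printf '%s\n' "$words" | grep -E '\bcat\b|\bdog\b' | paste -s -d ',' -)

printf 'loose:  %s\n' "$loose"
printf 'strict: %s\n' "$strict"
```

The grouped form `\b(cat|dog)\b` is a shorter way to write the strict pattern.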