About regular expressions (Basic & Extended)


BRE and ERE are useful, not to say mandatory, when using bash or any of the common tools and programming languages on a Linux box (such as awk, grep, sed, perl, etc.). In addition to BRE and ERE, you may find PCRE (Perl Compatible Regular Expressions) in shell tools such as grep with the -P option set. PCRE are not covered on this page, but note that they are well known for their power and therefore for their complexity, and this complexity may bring its share of mess! Also, PCRE can be faster than POSIX REGEX, depending on the pattern and the tool, but they are less portable. Keep this in mind when you have to choose your weapons.

General philosophy of REGEX:

Regular expressions are constructed the same way arithmetic expressions are: by using various operators to combine smaller expressions.


1 REGEX Generalities

1.1 Terminology

In the REGEX field we find two types of characters (items 1 and 2 below), plus two other terms used throughout this page (items 3 and 4):

  1. literal character
    Literal means “without special meaning or function”; a literal character is often a normal text character such as “a”, “b”, “1”, “2”, “!”, “:” etc.
  2. meta-character (special character)
    A meta-character is a character with a special meaning, a character that is not what it looks like, such as “*”, “.”, “$” etc.
  3. The target string, which represents what we are trying to find.
  4. Finally, the regular expression is the “search pattern” that is used to find the target string. It may be exactly the same as the target string, or it may include some of the regular expression functionality discussed next.
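
To make the distinction concrete, here is a minimal sketch. The three sample target strings and the helper names `sample_lines` and `count_matches` are invented for this illustration. It counts how many targets a search pattern finds when the dot acts as a meta-character and when it is escaped into a literal.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: three target strings, one per line.
sample_lines() {
  printf '%s\n' 'a.c' 'abc' 'a-c'
}

# count_matches PATTERN: number of sample lines matched by the BRE PATTERN.
count_matches() {
  sample_lines | grep -c -- "$1"
}

count_matches 'a.c'    # the dot is a meta-character: any ONE character
count_matches 'a\.c'   # the escaped dot is a literal dot
count_matches 'abc'    # only literal characters
```

The first call counts all three lines, while the other two count a single line each.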


1.2 Meta-characters

Meta-characters may be divided as follows.

  • Those matching a single character
    1. “.” (dot) matches “any ONE character”.
    2. “[…]” (character class or POSIX bracket expression) matches “any one character listed in brackets”; it may contain a list or a range using the “-” character.
      Note : it is important to keep in mind that a character class will match only ONE character.
    3. “[^…]” (negated character class) matches “any one character NOT listed in brackets”.
    4. “\char” (escaped meta-character) matches “the character after the backslash literally”.


  • Those matching a position (AKA anchors)
    1. “^” (caret) matches “the start of a line”.
    2. “$” (dollar) matches “the end of a line”.
    3. “\<” (backslash less than) matches “the start of a word”.
    4. “\>” (backslash greater than) matches “the end of a word”.


  • The quantifiers (which modify the number of occurrences of the preceding expression)
    1. “?” the “optional” character, matches the “preceding expression or its absence”.
    2. “*” matches “any number” of the preceding expression, including zero.
    3. “+” matches “1 or more” of the preceding expression.
    4. “{N}” matches “exactly N times”.
    5. “{N,}” matches “at least N times”.
    6. “{MIN,MAX}” matches “between MIN and MAX times”.


  • Finally the others
    1. “|” matches “either expression given”, called “alternation”.
    2. “-” indicates a “range” inside a bracket expression.
    3. “(…)” used to “limit the scope” of an alternation.
    4. “\1, \2, …” (back references) matches “previously matched pattern within parentheses”.
    5. “\b” matches “the empty string at the edge of a word” (a word boundary, i.e. the start or the end of a word).
    6. “\B” matches “the empty string provided it is NOT at the edge of a word”. It does not match a backslash: to match a literal backslash, use “\\”.
    7. “\w” matches “any word character” (any letter, any digit AND the underscore character)
    8. “\W” matches “any NON word character” (anything that is not a letter, not a digit and not the underscore character)
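
The families listed above can be tried out with a small sketch like the following. The helper name `first_match` and the sample strings are made up for this illustration, and it assumes a grep that supports the -o option (GNU and BSD grep do, but -o is not part of POSIX). It prints the first piece of text that each pattern matches.

```bash
#!/usr/bin/env bash
# first_match REGEX STRING: print the first piece of STRING matched by the ERE.
# Assumption: grep supports -o (print only the matched part of the line).
first_match() {
  printf '%s\n' "$2" | grep -Eo -- "$1" | head -n 1
}

first_match 'gr[ae]y'  'a grey cat'    # bracket expression: one of "a" or "e"
first_match '^[a-z]+'  'hello world'   # anchor: start of the line
first_match '[a-z]+$'  'hello world'   # anchor: end of the line
first_match '[0-9]+'   'room 101'      # quantifier: one or more digits
first_match 'colou?r'  'the colour'    # quantifier: optional "u"
first_match 'cat|dog'  'hot dog'       # alternation
```

Run as is, this prints grey, hello, world, 101, colour and dog, one per line.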




2 Basic Vs Extended

Basic and Extended regular expressions have a lot in common but also some differences; knowing them is useful when the time comes to choose (see the summary tables in section 3 for an overview).

2.1 Basics Regular Expressions (BRE)

  • BRE comes with the following meta-characters (use the “\” to remove the special meaning from those)
    1. Character classes or POSIX bracket expressions. We are talking here about the general character class (as opposed to the POSIX character class)
      e.g : [a-t] will match any one character between the letters a and t, according to your locale
    2. POSIX character classes (e.g: [[:digit:]], which matches any digit character)
      Note : Don’t confuse the POSIX term “character class” with what is normally called a “regular expression character class”. [x-z0-9] is an example of what we call a “character class” and POSIX calls a “bracket expression”. [:digit:] is a POSIX character class, used inside a bracket expression like [x-z[:digit:]].
      To make a long story short: “POSIX bracket expression” is strictly the same as “regular expression character class”, BUT “POSIX character class” is different and is to be used inside a regular expression character class notation (“[…]”).
    3. .
    4. ^
    5. $
    6. *


  • Then the following need a backslash to activate their special meaning (!)
    1. \? (see note 1)
    2. \+ (see note 1)
    3. \{\}
    4. “\(\)” for back references (up to 9, called as \1 … \9)
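
As a rough illustration of these BRE rules, the sketch below uses grep in its default (BRE) mode. The helper name `bre_match` and the sample strings are hypothetical.

```bash
#!/usr/bin/env bash
# bre_match PATTERN STRING: succeeds when the BRE PATTERN matches STRING.
# grep without -E interprets the pattern as a Basic Regular Expression.
bre_match() {
  printf '%s\n' "$2" | grep -q -- "$1"
}

bre_match 'a*b' 'aaab' \
  && printf '%s\n' '* is special without any backslash'
bre_match '^[0-9]\{4\}$' '2024' \
  && printf '%s\n' '\{4\} is an interval: exactly four digits'
bre_match '^a{2}$' 'a{2}' \
  && printf '%s\n' 'unescaped braces are plain literal characters'
bre_match '^\(ab\)\1$' 'abab' \
  && printf '%s\n' '\(ab\)\1 matches "ab" followed by the same text again'
```

All four messages are printed, since each pattern matches its sample string.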


2.2 Extended Regular Expressions (ERE)

ERE has almost the same special characters as BRE but without the (weird) need to escape them with a backslash. ERE also comes with totally new features such as “alternation” (using the “|” character).

Note : there are no back references in POSIX ERE (even if some systems do support \1 ... \9, see note 1)

  • ERE comes with the following meta-characters (use the “\” to remove the special meaning from those)
    1. Character classes (AKA POSIX bracket expressions)
    2. POSIX Character classes
    3. .
    4. ^
    5. $
    6. *
    7. ?
    8. +
    9. {}
    10. ()
    11. |



3 Summary tables

3.1 POSIX character classes

This table lists the POSIX character classes. Their big advantage is that they are tied to your locale, which means that you may use them instead of a bracket expression with a range (e.g: use [[:alpha:]] instead of [a-zA-Z]), with very good portability and safety.

Note : a POSIX character class must be used within a POSIX bracket expression, as in [[:digit:]].

POSIX character class    Match
[:alnum:]                Alphanumeric characters
[:alpha:]                Alphabetic characters
[:blank:]                Space and tab
[:cntrl:]                Control characters
[:digit:]                Digits
[:graph:]                Visible characters
[:lower:]                Lowercase letters
[:upper:]                Uppercase letters
[:print:]                Visible characters and the space character
[:punct:]                Punctuation characters
[:space:]                Whitespace characters
[:xdigit:]               Hexadecimal digits


Note : To negate a POSIX character class just use the same syntax as for any character class (because it is like any character class!) which gives something like :

[^[:digit:]] # This would match any character that is NOT a digit
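
The sketch below shows one possible use of these classes with sed and grep. The helper names `keep_digits` and `is_identifier` and the sample strings are hypothetical.

```bash
#!/usr/bin/env bash
# keep_digits STRING: delete every character that is NOT a digit,
# using a negated bracket expression around the [:digit:] class.
keep_digits() {
  printf '%s\n' "$1" | sed 's/[^[:digit:]]//g'
}

# is_identifier STRING: succeeds when STRING starts with a letter or an
# underscore, followed only by letters, digits or underscores.
# A POSIX class can be mixed with ordinary characters inside the brackets.
is_identifier() {
  printf '%s\n' "$1" | grep -q '^[[:alpha:]_][[:alnum:]_]*$'
}

keep_digits 'Tel: +33 (0)1 23-45'
is_identifier 'my_var2' && echo 'my_var2 looks like an identifier'
is_identifier '2fast'   || echo '2fast does not'
```

The first call prints 33012345, and both messages are displayed.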


3.2 Which REGEX for which program

Program                  BRE    ERE
sed (default)            yes    no
sed -r (or -E on BSD)    no     yes
grep (default)           yes    no
egrep or grep -E         no     yes
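
As a quick check of which set a command is using, the following sketch runs the same job in both modes. The helper names and sample strings are invented for the example, and it assumes a sed that accepts -E (recent GNU sed and BSD sed do; older GNU sed versions only know -r).

```bash
#!/usr/bin/env bash
# Swap "key=value" into "value=key": same substitution, two syntaxes.
swap_bre() {
  printf '%s\n' "$1" | sed 's/^\([^=]*\)=\(.*\)$/\2=\1/'     # BRE: \( \)
}
swap_ere() {
  printf '%s\n' "$1" | sed -E 's/^([^=]*)=(.*)$/\2=\1/'      # ERE: ( )
}

# Count the lines matched by the pattern a|b in each mode.
count_pipe_bre() {
  printf '%s\n' 'a|b' 'a' 'b' | grep -c 'a|b'     # BRE: "|" is a literal
}
count_pipe_ere() {
  printf '%s\n' 'a|b' 'a' 'b' | grep -Ec 'a|b'    # ERE: "|" is alternation
}

swap_bre 'user=alice'
swap_ere 'user=alice'
count_pipe_bre
count_pipe_ere
```

Both swap functions print alice=user, while the two counts differ: 1 line in BRE mode and 3 lines in ERE mode.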


3.3 Meta-character Summary

Operator(s)   BRE        ERE      Match
.             .          .        Matches any single character.
?             \?         ?        The preceding item is optional and will be matched, at most, once.
*             *          *        The preceding item will be matched zero or more times.
+             \+         +        The preceding item will be matched one or more times.
{N}           \{N\}      {N}      The preceding item is matched exactly N times.
{N,}          \{N,\}     {N,}     The preceding item is matched N or more times.
{N,M}         \{N,M\}    {N,M}    The preceding item is matched at least N times, but not more than M times.
-             -          -        Represents the range (if it’s not first or last in a list or the ending point of a range in a list).
^             ^          ^        Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$             $          $        Matches the empty string at the end of a line.
\b            \b         \b       Matches the empty string at the edge of a word.
\B            \B         \B       Matches the empty string provided it’s not at the edge of a word.
\<            \<         \<       Matches the empty string at the beginning of a word.
\>            \>         \>       Matches the empty string at the end of a word.

Note : The BRE set actually gives access to the SAME META-CHARACTERS, except that you need to put a “\” in front of some of them to activate them: “+”, “?”, “|”, “{}” and “()”.



4 Bonus : The bash regex operator

Yes it does! Of course bash offers a regular expression operator. It comes in the same form as the perl one: =~

This wonderful operator is to be used within “double square brackets”. For bash versions prior to 3.2, the searched expression is to be put within “single quotes”. From bash 3.2 onwards, the searched expression must no longer be quoted.

This operator uses the ERE set.

  • See this example (quoted from “bash scripts examples”) for a practical view of the bash regex operator
    if [[ "${i}" =~ ^[[:digit:]]+$ ]] ; then

    Ok this was for my auto-promotion, here is a cleaner example:

    if [[ "${var}" =~ ^[[:digit:]]$ ]] ; then
      echo "${var} IS a digit"
    else
      echo "${var} is NOT a digit"
    fi

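Note : When using the bash regexp operator (bash 3.2 or newer), be careful not to single-quote or double-quote the regular expression: a quoted pattern is matched as a literal string, so it would NOT work as a regular expression.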
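
One common way to stay clear of this quoting trap is to store the pattern in a variable and expand it unquoted. The sketch below assumes bash 3.2 or newer with default settings. The variable name `re` and the function names are made up for the illustration.

```bash
#!/usr/bin/env bash
# The pattern lives in a variable; single quotes are fine in the assignment.
re='^[[:digit:]]+$'

# Unquoted expansion: $re is used as an extended regular expression.
is_number() {
  [[ "$1" =~ $re ]]
}

# Quoted expansion: "$re" is compared as a plain string, not as a regex.
is_number_quoted() {
  [[ "$1" =~ "$re" ]]
}

is_number 2024        && echo '2024 matches the unquoted pattern'
is_number_quoted 2024 || echo '2024 does NOT match the quoted pattern'
```

Both messages are displayed: the second test fails because the string 2024 does not contain the literal text of the pattern.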

4.1 Negate a bash regular expression based on =~ operator

Very easy to achieve, just put an exclamation mark in its right place :

if [[ ! "${var}" =~ ^[[:digit:]]$ ]] ; then
  echo "${var} is NOT a digit"
else
  echo "${var} IS a digit"
fi

This is the safest way to do it.



  • The Bash Guide for Beginners on the TLDP web site – very good as usual on TLDP web site!
  • The regular-expressions.info information web site – a very complete reference for all flavors of REGEX


1 : NOT a part of POSIX; therefore this may not be recognised by your system, check man 7 regex for more information.
