Character Classes

Regular expressions can also include the following backslash escapes to refer to common classes of characters:

\w any word constituent character (same as [a-zA-Z0-9_])

\W any character but a word constituent

\d a digit (same as [0-9])

\D anything but a digit

\s a white space character

\S anything but a white space character

\n a line separator character (CR or LF)

\N anything but a line separator character (CR or LF)

These escapes are also allowed in character classes: '[\w+-]' means 'any character that is either a word constituent, or a plus, or a minus'.
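
As a rough illustration, the escapes above can be tried out in Python's re module, which is used here only as a stand-in and is not the engine this topic describes. The re.ASCII flag is an assumption made so that \w, \d and \s follow the ASCII definitions listed above; the name tokens is hypothetical. \n and \N are left out because Python treats them differently.

```python
import re

# re.ASCII keeps \w equal to [a-zA-Z0-9_], as in the list above.
# Inside the brackets, '+' is literal and the trailing '-' is literal too.
WORD_PLUS_MINUS = re.compile(r"[\w+-]+", re.ASCII)

def tokens(text):
    """Return maximal runs of word constituents, '+' and '-'."""
    return WORD_PLUS_MINUS.findall(text)

print(tokens("x+1 = y-2; z_3"))                  # ['x+1', 'y-2', 'z_3']
print(re.findall(r"\D+", "a12b345", re.ASCII))   # ['a', 'b']
```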

Character classes can also include the following grep(1)-compatible elements to refer to predefined sets of characters:

[:alnum:] any alphanumeric character, i.e., a word constituent

[:alpha:] any alphabetic character

[:cntrl:] any control character. In this version, it means any character whose code is <32.

[:digit:] any decimal digit.

[:graph:] any graphical character. In this version, this means any character whose code is >= 32.

[:lower:] any lowercase character

[:print:] any printable character. In this version, this is the same as [:graph:].

[:punct:] any punctuation character.

[:space:] any white space character.

[:upper:] any uppercase character.

[:xdigit:] any hexadecimal digit.

Note that these elements are components of the character classes, i.e. they have to be enclosed in an extra set of square brackets to form a valid regular expression. For example, a non-empty string of digits would be represented as '[[:digit:]]+'.
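
The bracket nesting described above can be sketched in Python. Python's re module does not support these elements, so the sketch expands a few of them into plain ranges before compiling. The names POSIX_ELEMENTS and expand_posix are hypothetical, and the ASCII expansions are assumptions rather than this product's actual definitions.

```python
import re

# Hypothetical table: assumed ASCII expansions for a few of the elements above.
POSIX_ELEMENTS = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
    "space": r"\s",
}

def expand_posix(pattern):
    """Replace each [:name:] element with its assumed ASCII expansion.

    The enclosing brackets of the character class are left in place,
    so '[[:digit:]]+' becomes '[0-9]+'. Unknown names raise KeyError.
    """
    def repl(match):
        return POSIX_ELEMENTS[match.group(1)]
    return re.sub(r"\[:([a-z]+):\]", repl, pattern)

digits = expand_posix("[[:digit:]]+")
print(digits)                                     # [0-9]+
print(bool(re.fullmatch(digits, "2024")))         # True
print(expand_posix("[[:alpha:]_][[:alpha:][:digit:]_]*"))
```

Several elements can share one pair of enclosing brackets, as the last line shows: the pattern expands to [a-zA-Z_][a-zA-Z0-9_]*.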
