A character set is a string of characters enclosed in square brackets. It matches any single character that appears between the brackets. For example, '[01]' matches either '0' or '1':

'0' matches: '[01]' -- true

'3' matches: '[01]' -- false

'11' matches: '[01]' -- false: a set matches only one character

Using the plus operator, we can build the following binary number recognizer:

'10010100' matches: '[01]+' -- true

'10001210' matches: '[01]+' -- false
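The checks above can be reproduced in other regex engines. The following is a minimal sketch that assumes Python's standard `re` module rather than the engine this topic describes; the helper name `matches` is hypothetical and only mirrors the notation used here. `re.fullmatch` is used because the examples test the whole string, not a substring.

```python
import re

def matches(text: str, pattern: str) -> bool:
    """Return True only if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# A set matches exactly one character taken from the brackets.
print(matches("0", "[01]"))           # True
print(matches("3", "[01]"))           # False
print(matches("11", "[01]"))          # False: two characters, one set

# The plus operator repeats the set one or more times.
print(matches("10010100", "[01]+"))   # True
print(matches("10001210", "[01]+"))   # False: '2' is not in the set
```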

If the first character after the opening bracket is '^', the set is inverted: it matches any single character *not* appearing between the brackets:

'0' matches: '[^01]' -- false

'3' matches: '[^01]' -- true
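As an illustration, the inverted set can be tried in Python's standard `re` module, which is an assumption here and not necessarily the engine this topic documents. The sketch also shows that an inverted set still has to consume exactly one character, so it does not match the empty string.

```python
import re

# Compile the inverted set once; it accepts any single character except '0' and '1'.
not_binary_digit = re.compile("[^01]")

print(not_binary_digit.fullmatch("0") is not None)   # False
print(not_binary_digit.fullmatch("3") is not None)   # True
print(not_binary_digit.fullmatch("") is not None)    # False: one character is required
```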

For convenience, a set may include ranges: pairs of characters separated by '-'. A range is equivalent to listing every character from the first to the last, inclusive: '[0-9]' is the same as '[0123456789]'.
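The equivalence of a range and its spelled-out form can be checked mechanically. This is a rough sketch that assumes Python's standard `re` module (not necessarily the engine described in this topic) and compares the two sets over the printable ASCII characters only.

```python
import re
import string

range_set = re.compile("[0-9]")
listed_set = re.compile("[0123456789]")

# Both sets should accept or reject every candidate character identically.
agree = all(
    (range_set.fullmatch(ch) is None) == (listed_set.fullmatch(ch) is None)
    for ch in string.printable
)
print(agree)   # True

# The characters accepted by the range are exactly the ten digits.
accepted = [ch for ch in string.printable if range_set.fullmatch(ch)]
print("".join(accepted))   # 0123456789
```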

Within a set, the special characters are '^', '-', and the ']' that closes the set. The examples below show how to include each of them in a set as a literal character:

[01^] put the caret anywhere except the beginning

[01-] put the dash as the last character

[]01] put the closing bracket as the first character

[^]01] in an inverted set, put the closing bracket right after the caret (thus, empty and universal sets cannot be specified)
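The placement rules above happen to hold in Python's standard `re` module as well, so they can be tried there. This sketch assumes that module and is not a statement about the engine this topic documents; the dictionary name `literal_sets` is made up for the illustration.

```python
import re

# Each set places one special character where it is read literally.
literal_sets = {
    "[01^]": "^",   # caret anywhere except the beginning
    "[01-]": "-",   # dash as the last character
    "[]01]": "]",   # closing bracket as the first character
}

for pattern, special in literal_sets.items():
    found = re.fullmatch(pattern, special) is not None
    print(pattern, "matches", repr(special), "--", found)   # True for all three

# In the inverted form, ']' right after '^' is also literal, so it is excluded.
print(re.fullmatch("[^]01]", "]") is not None)   # False
print(re.fullmatch("[^]01]", "x") is not None)   # True
```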
