Extensions

The above primitive expressions and operators are common to many implementations of regular expressions. The next primitive expression is unique to this implementation.

A sequence of characters between colons is treated as a unary selector that Characters are expected to understand. A character matches such an expression if it answers true to a message with that selector. This gives a more readable and efficient way of specifying character classes. For example, '[0-9]' is equivalent to ':isDigit:', but the latter is more efficient. Like character sets, character classes can be negated: ':^isDigit:' matches a Character that answers false to #isDigit, and is therefore equivalent to '[^0-9]'.

As an example, so far we have seen the following equivalent ways to write a regular expression that matches a non-empty string of digits:

'[0-9]+'

'\d+'

'[\d]+'

'[[:digit:]]+'

':isDigit:+'
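
As a brief illustration of selector-based character classes, the following sketch reuses the matches: message from the examples in this section. The selector #isLetter is an assumption: this text only names #isDigit, although any unary selector that a Character answers with a Boolean should behave the same way.

```smalltalk
"Positive class: each character answers true to #isDigit."
'12345' matches: ':isDigit:+'.      "true"
'hello' matches: ':isDigit:+'.      "false"

"Negated class: each character answers false to #isDigit."
'hello' matches: ':^isDigit:+'.     "true"
'12345' matches: ':^isDigit:+'.     "false"

"Assumed selector: #isLetter is not named in this text."
'hello' matches: ':isLetter:+'.     "expected true if Character understands #isLetter"
```

The input strings are chosen so that each one consists entirely of digits or entirely of non-digits, which keeps the results independent of whether the match must cover the whole string.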

The last group of special primitive expressions includes:

. matching any character except a newline;

^ matching an empty string at the beginning of a line;

$ matching an empty string at the end of a line;

\b matching an empty string at a word boundary;

\B matching an empty string not at a word boundary;

\< matching an empty string at the beginning of a word;

\> matching an empty string at the end of a word.

'axyzb' matches: 'a.+b' -- true

'ax zb' matches: 'a.+b' -- false (space is not matched by '.')

Again, the three characters '.', '^' and '$' are special and must be quoted to be matched literally.
