License:
BSD style: see license.txt
Version:
Initial release: Jan 2008
Authors:
Jascha Wetzel

This is a regular expression compiler and interpreter based on the Tagged NFA/DFA method. See Wikipedia's article on regular expressions for details on regular expressions in general.

The method used implies that the expressions are regular in the sense of formal language theory, as opposed to what "regular expression" means in most implementations (e.g. PCRE or those from the standard libraries of Perl, Java or Python). The advantage of this method is its performance; its disadvantage is the inability to realize some features that Perl-like regular expressions have (e.g. back-references). See "Regular Expression Matching Can Be Simple And Fast" for details on the differences.

The time for matching a regular expression against an input string of length N is in O(M*N), where M depends on the number of matching brackets and the complexity of the expression. That is, M is constant with respect to the input, and therefore matching is a linear-time process.

The syntax of a regular expression is as follows. X and Y stand for arbitrary regular expressions.
X|Y | alternation, i.e. X or Y |
(X) | matching brackets - creates a sub-match |
(?X) | non-matching brackets - only groups X, no sub-match is created |
[Z] | character class specification, Z is a string of characters or character ranges, e.g. [a-zA-Z0-9_.\-] |
[^Z] | negated character class specification |
<X | lookbehind, X may be a single character or a character class |
>X | lookahead, X may be a single character or a character class |
^ | start of input or start of line |
$ | end of input or end of line |
\b | start or end of word, equals (?<\s>\S|<\S>\s) |
\B | opposite of \b, equals (?<\S>\S|<\s>\s) |
X? | zero or one |
X* | zero or more |
X+ | one or more |
X{n,m} | at least n, at most m instances of X. If n is missing, it's set to 0. If m is missing, it is set to infinity. |
X?? | non-greedy version of the above operators |
X*? | see above |
X+? | see above |
X{n,m}? | see above |
. | any printable character |
\s | whitespace |
\S | non-whitespace |
\w | alpha-numeric characters or underscore |
\W | opposite of \w |
\d | digits |
\D | non-digit |
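As a rough illustration of how the bracket forms in the table differ, the sketch below groups an alternation with non-matching brackets and captures a value with matching brackets. It relies only on test and match as documented in this module; the pattern and the input string are made up for the example, and the trailing ';' is there so the result does not depend on how greedily the last operator behaves.

```d
import tango.io.Stdout;
import tango.text.Regex;

void main()
{
    // (?key|name) only groups the alternation and creates no sub-match;
    // ([a-z0-9_]+) is a matching bracket, so it becomes sub-match #1.
    auto r = Regex(r"(?key|name)=([a-z0-9_]+);");

    if (r.test("name=tango_regex;"))
    {
        // Index 0 is the whole match, index 1 the first matching bracket.
        Stdout.formatln("whole match: {}", r.match(0));
        Stdout.formatln("value:       {}", r.match(1));
    }
    else
        Stdout.formatln("no match");
}
```

Because the first bracket is non-matching, the captured value is sub-match 1 rather than 2.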
Params:
pattern | Regular expression. |
Throws:
RegExpException if there are any compilation errors.
Example:
Declare two variables and assign a Regex object to each:
    auto r = new Regex("pattern");
    auto s = new Regex(r"p[1-5]\s*");
Params:
pattern | Regular expression. |
Throws:
RegExpException if there are any compilation errors.
Example:
Declare two variables and assign a Regex object to each:
    auto r = Regex("pattern");
    auto s = Regex(r"p[1-5]\s*");
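Since both forms of construction can throw RegExpException, a caller that accepts patterns from outside may want to guard the compilation. The following is a minimal sketch, not a prescribed usage: it assumes that an unbalanced bracket counts as a compilation error, which the documentation above does not state explicitly, and it reads the message through the standard msg field of D exceptions.

```d
import tango.io.Stdout;
import tango.text.Regex;

void main()
{
    try
    {
        // Assumption: the missing ')' is reported as a compilation error.
        auto r = new Regex("(ab");
        Stdout.formatln("pattern compiled");
    }
    catch (RegExpException e)
    {
        // Report the problem instead of letting the exception propagate.
        Stdout.formatln("invalid pattern: {}", e.msg);
    }
}
```

Either branch leaves the program running, so the sketch behaves sensibly even if this particular pattern turns out to be accepted.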
Returns:
Instance of RegExpT set up to search input.
Example:
    import tango.io.Stdout;
    import tango.text.Regex;

    void main()
    {
        foreach (m; Regex("ab").search("qwerabcabcababqwer"))
            Stdout.formatln("{}[{}]{}", m.pre, m.match(0), m.post);
    }

    // Prints:
    // qwer[ab]cabcababqwer
    // qwerabc[ab]cababqwer
    // qwerabcabc[ab]abqwer
    // qwerabcabcab[ab]qwer
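The same iteration can also read sub-matches. The sketch below is only an illustration with a made-up pattern and input: it walks over "name=number;" entries and prints both matching brackets of each hit, together with the length of pre, which serves as the start offset on the assumption that pre is the part of the input preceding the match, as the output above suggests.

```d
import tango.io.Stdout;
import tango.text.Regex;

void main()
{
    // Sub-match 1 is the name, sub-match 2 the number; the trailing ';'
    // delimits each entry.
    auto entries = Regex(r"(\w+)=(\d+);");

    foreach (m; entries.search("width=80;height=24;"))
    {
        // m.pre.length is used as the offset at which this match starts.
        Stdout.formatln("{} -> {} (offset {})",
                        m.match(1), m.match(2), m.pre.length);
    }
}
```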
Returns:
false for no match, true for match
Returns:
false for no match, true for match
Params:
index | index = 0 returns the whole match; index > 0 returns the submatch of bracket #index |
Returns:
Slice of input for the requested submatch, or null if no such submatch exists.
Example:
    import tango.io.Stdout;
    import tango.text.Regex;

    void main()
    {
        auto strs = Regex("ab").split("abcabcababqwer");
        foreach (s; strs)
            Stdout.formatln("{}", s);
    }

    // Prints:
    // c
    // c
    // qwer
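The separator passed to split does not have to be a literal string. As a small sketch with a made-up input, a character class can split on either of two delimiters, and test can be used first to check whether any delimiter is present at all. Only calls documented in this module are used.

```d
import tango.io.Stdout;
import tango.text.Regex;

void main()
{
    auto input = "alpha,beta;gamma";

    // The character class matches a single ',' or ';'.
    auto sep = Regex("[,;]");

    // test reports whether the input contains a separator at all.
    if (!sep.test(input))
        Stdout.formatln("no separator found");

    // Print each piece between separators on its own line.
    foreach (part; sep.split(input))
        Stdout.formatln("{}", part);
}
```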