RegExp Library

This library contains functions for matching and manipulating strings based on patterns called regular expressions. Regular expressions are a complex topic in their own right, so I won't attempt to explain how they work or how to write them. Because this library is based on the Digital Mars RegExp library, see the RegExp documentation page for the particular regular expression syntax that this library uses.

A hint for writing regexps in MiniD: both the regexp library and MiniD use the backslash as an escape character, so a regexp written in an ordinary string literal needs doubled backslashes. You can avoid this by using WYSIWYG string literals instead. Either enclose your regexp strings in backticks, like `^\d{5}$`, or prefix the string literal with the '@' symbol, as in @"^\d{5}$".
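
As a quick illustration of this hint, the sketch below writes the same five-digit pattern three ways. The variable names and the sample input are made up for illustration.

```minid
// The same pattern, \d{5} anchored at both ends, written three ways.
local escaped = "^\\d{5}$";   // ordinary literal: the backslash must be doubled
local backtick = `^\d{5}$`;   // WYSIWYG literal in backticks
local verbatim = @"^\d{5}$";  // WYSIWYG literal with the @ prefix

// All three should hold identical text, so any of them can be handed to the library.
writefln("%s", escaped == backtick && backtick == verbatim);
writefln("%s", regexp.test(verbatim, "90210"));
```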

Convenience Regexps

These are some regexp strings which are defined for convenience.

regexp.email
Matches email addresses.
regexp.url
Matches HTTP URLs.
regexp.alpha
Matches lower- and upper-case letters and the underscore.
regexp.space
Matches all whitespace characters.
regexp.digit
Matches all decimal digits.
regexp.hexdigit
Matches all hexadecimal digits.
regexp.octdigit
Matches all octal digits.
regexp.symbol
Matches the following symbols: ( ) [ ] . , ; = < > + - * / & ^
regexp.chinese
Matches all valid Chinese characters.
regexp.cnPhone
Matches valid Chinese phone numbers.
regexp.cnMobile
Matches valid Chinese mobile phone numbers.
regexp.cnZip
Matches valid Chinese ZIP codes.
regexp.cnIDcard
Matches valid Chinese ID card numbers.
regexp.usPhone
Matches valid USA phone numbers.
regexp.usZip
Matches valid USA ZIP codes.
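
The convenience strings are ordinary pattern strings, so they can be passed wherever a pattern is expected. A minimal sketch, using a made-up sample address; this page does not say whether these patterns are anchored, so a match may not mean the whole string is an email address.

```minid
// Hypothetical sample input, chosen only for illustration.
local address = "someone@example.com";

// regexp.email is passed just like a hand-written pattern string.
if(regexp.test(regexp.email, address))
	writefln("%s looks like an email address", address);
else
	writefln("%s does not look like an email address", address);
```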

Functions

For all these functions, the "attributes" parameter is optional. It is a string made up of the attribute characters specified on the Digital Mars RegExp library documentation page. That page also describes the format strings used by functions like replace().

regexp.test(pattern, string [, attributes])

Tests if string matches the pattern regular expression. Returns true if it does, and false otherwise.
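
A small sketch of the optional attributes parameter. It assumes that "i" is the case-insensitive attribute, as listed in the Digital Mars RegExp documentation, and that test looks for the pattern anywhere in the string; the sample strings are made up.

```minid
// Without attributes, matching is case-sensitive.
writefln("%s", regexp.test("minid", "I like MiniD"));

// With "i", the same pattern is matched without regard to case.
writefln("%s", regexp.test("minid", "I like MiniD", "i"));
```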

regexp.replace(pattern, source, replacement [, attributes])

If replacement is a string, replaces any matches of pattern in source with the replacement format string and returns the new string. If replacement is a function, replace calls that function with an instance of the RegExp class for each match and expects it to return the replacement string. This allows for very flexible replacements. Here is a small example:

global x = 5;
global y = 10;
local s = "x = $x, y = $y";

writefln(regexp.replace(`\$(\w*)`, s, function(m) toString(eval(m.match(0)[1 ..])), "g"));
// --> x = 5, y = 10

Warning: do not store the object passed to the callback function anywhere. It is completely temporary and becomes invalid after regexp.replace returns.
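
For comparison with the callback form, here is a sketch of the string form. It assumes that $1 and $2 in the format string stand for the first and second parenthesized subexpressions, as described for format strings on the Digital Mars RegExp page; the sample text is made up.

```minid
// Turn each "key=value" pair into "value=key".
local s = "a=1, b=2";

// "g" asks for every match to be replaced, not just the first.
writefln("%s", regexp.replace(@"(\w+)=(\w+)", s, "$2=$1", "g"));
```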

regexp.split(pattern, source [, attributes])

Splits the source string using pattern as the delimiter. Returns an array of strings which represent the split-up components of source.
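
A minimal sketch of split, using a made-up comma-separated string. The pattern also consumes any whitespace around each comma, so the pieces should come back trimmed.

```minid
// Split on commas, ignoring any whitespace around them.
local parts = regexp.split(@"\s*,\s*", "red, green ,blue");

// The result is an ordinary array of strings.
foreach(i, part; parts)
	writefln("%s: %s", i, part);
```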

regexp.match(pattern, source [, attributes])

Looks for matches to pattern in source, and returns an array of the strings representing all the matches. The array may be empty.
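
A short sketch of match on a made-up string. Passing "g" here is an assumption: this page does not say whether the attribute is needed to collect every match, and it should be harmless if it is not.

```minid
// Collect every run of digits in the source string.
local numbers = regexp.match(@"\d+", "10 apples, 20 pears, 30 plums", "g");

// An empty array means nothing matched.
foreach(i, n; numbers)
	writefln("%s: %s", i, n);
```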

regexp.compile(pattern [, attributes])

Compiles a regular expression string into an object. Returns the object, which is an instance of class RegExp (class RegExp can't be instantiated directly).

RegExp

This is a class whose instances represent compiled regular expressions. This class can't (and shouldn't) be instantiated directly; use regexp.compile() to obtain an instance of the RegExp class.

These are its methods.

test(string)
Tests if string matches this pattern. Returns true if it matches, and false otherwise.
match(int | string)
This method has two forms. Given an integer, it returns part of the current match (for use during iteration): 0 returns the whole match, and a positive integer n returns the text matched by the nth parenthesized subexpression in the pattern. Given a string, it looks for matches to this pattern in the string and returns an array of all the strings representing the matches.
find(string)
Looks for this pattern in string. Returns the 0-based position if found, and -1 otherwise.
split(string)
Splits string using this pattern as the delimiter. Returns an array of the split-up string's components.
replace(string, replacement)
Replaces any matches to this pattern in string with the replacement format string. Returns the new string.
search(string)
This sets up the object to start searching through the given string. Immediately after calling this method, you should run a foreach loop on the object, as explained under opApply.
opApply()
Once the object has been set up to search a string with search, use this to iterate over the matches with a foreach loop. Each iteration yields two values: the index of the current match (from 0), and the object itself. From the object, you can obtain the value of the current match by calling .match(0) on it, and the surrounding text with the .pre() and .post() methods. Here's an example of use:
foreach(i, m; regexp.compile("ab").search("abcabcabab"))
	writefln(i, ": %s[%s]%s", m.pre(), m.match(0), m.post());
pre()
During iteration, returns the portion of the source string that comes before the current match.
post()
During iteration, returns the portion of the source string that comes after the current match.
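
The methods above can be combined on a single compiled object. The following is a sketch only, built from the descriptions on this page; the pattern, the sample text and the variable names are made up for illustration.

```minid
// Compile once, then reuse the object. The pattern captures a key and a value.
local pair = regexp.compile(@"(\w+)=(\w+)");
local text = "width=80 height=24";

writefln("%s", pair.test(text)); // does the pattern occur in the text?
writefln("%s", pair.find(text)); // 0-based position of a match, or -1

// During iteration, match(0) is the whole match and match(n) is the nth group.
foreach(i, m; pair.search(text))
	writefln("%s: key=%s value=%s", i, m.match(1), m.match(2));
```

As with the callback passed to regexp.replace, it seems safest to read what you need from m inside the loop body rather than keeping it around afterwards.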