Note: This website is archived. For up-to-date information about D projects and development, please visit wiki.dlang.org.

Regexp Library

This library contains functions for matching and manipulating strings based on patterns called regular expressions. This is a complex topic in and of itself, so I won't attempt to explain how regular expressions work or how to write them. Since this library is based on the Tango Regex library, it's best to consult the Tango Regex documentation page for the particular regular expression syntax this library accepts.

A hint for writing regexps in MiniD: both the regexp library and MiniD string literals use the backslash as an escape character, so in an ordinary string literal you would have to double every backslash. You can avoid this by using WYSIWYG string literals instead: either enclose your regexp strings in backticks, like `^\d{5}$`, or prefix the string literal with the '@' symbol, as in @"^\d{5}$".
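As a quick illustration, the three literals below should all denote the same pattern (exactly five decimal digits); only the ordinary string needs its backslash doubled. This is a small sketch, and the variable names are arbitrary.

```minid
// Ordinary string literal: the backslash must be escaped as \\.
local a = "^\\d{5}$"

// WYSIWYG literal in backticks: backslashes are taken literally.
local b = `^\d{5}$`

// WYSIWYG literal with the @ prefix: also taken literally.
local c = @"^\d{5}$"

// All three hold identical text, so any of them can be passed to regexp.Regexp.
writeln(a == b && b == c)
```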

Warning: the underlying Tango regexp library has some issues right now. For one, compiling regexps is painfully slow, and compiling longer regexps can actually crash the program (try compiling regexp.url, for example). Also, the current Tango regexp library does not support case-insensitive regexps, and in general its regexps are not as powerful as you might be used to (for example, there are no backreferences). If you need more powerful regex support, consider the PCRE addon library.

Members

  1. Convenience Regexps
    1. regexp.email
    2. regexp.url
    3. regexp.alpha
    4. regexp.space
    5. regexp.digit
    6. regexp.hexdigit
    7. regexp.octdigit
    8. regexp.symbol
    9. regexp.chinese
    10. regexp.cnPhone
    11. regexp.cnMobile
    12. regexp.cnZip
    13. regexp.cnIDcard
    14. regexp.usPhone
    15. regexp.usZip
  2. class Regexp
    1. this(pattern: string, attrs: string = "")
    2. test(string: string)
    3. match(m: int|string = 0)
    4. find(string: string)
    5. split(string: string)
    6. replace(string: string, replacement: string|function)
    7. search(string: string)
    8. opApply()
    9. pre()
    10. post()

Convenience Regexps

These are predefined regexp pattern strings, provided for convenience. Since they are strings, pass one to the Regexp constructor to compile it before use.

regexp.email

Matches email addresses.

regexp.url

Matches HTTP URLs.

regexp.alpha

Matches lower- and upper-case letters and the underscore.

regexp.space

Matches all whitespace characters.

regexp.digit

Matches all decimal digits.

regexp.hexdigit

Matches all hexadecimal digits.

regexp.octdigit

Matches all octal digits.

regexp.symbol

Matches the following symbols: ( ) [ ] . , ; = < > + - * / & ^

regexp.chinese

Matches all valid Chinese characters.

regexp.cnPhone

Matches valid Chinese phone numbers.

regexp.cnMobile

Matches valid Chinese mobile phone numbers.

regexp.cnZip

Matches valid Chinese ZIP codes.

regexp.cnIDcard

Matches valid Chinese ID card numbers.

regexp.usPhone

Matches valid USA phone numbers.

regexp.usZip

Matches valid USA ZIP codes.
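A minimal sketch of putting one of these strings to use, relying only on the fact that they are plain pattern strings. The exact definition of regexp.usZip isn't given on this page, so the results are not shown here; also keep the compilation warning above in mind for the longer patterns such as regexp.url.

```minid
// regexp.usZip is a pattern string, so it has to be compiled first.
local zip = regexp.Regexp(regexp.usZip)

// Try a few candidate strings against the compiled pattern.
foreach(candidate; ["90210", "9021", "hello"])
	writefln("{} -> {}", candidate, zip.test(candidate))
```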

class Regexp

This is a class whose instances represent compiled regular expressions.

this(pattern: string, attrs: string = "")

Compiles a regular expression string into a Regexp object. The pattern parameter is the regular expression to compile. The attrs parameter is a string containing extra attributes for the regexp. Currently the only implemented attribute is "g", which makes the regexp global: it will find all matches of the pattern in a string instead of just the first one.
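The effect of "g" is easiest to see with replace (documented below). This sketch compiles the same pattern with and without the attribute. Only the global case is asserted, since this page doesn't spell out exactly what replace does without "g"; the assumption is that it stops after the first match.

```minid
// No attributes: the regexp is not global.
local once = regexp.Regexp("o")

// "g": the regexp applies to every match in the string.
local all = regexp.Regexp("o", "g")

writeln(once.replace("foo boo", "0"))
writeln(all.replace("foo boo", "0"))
```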

test(string: string)

Tests whether string matches this pattern. Returns true if it matches, and false otherwise. If it returns true, the match results are stored in the Regexp object and can then be retrieved with the match method. For example:

local re = regexp.Regexp("(.+)=(.+)")

if(re.test("foo=bar"))
	writefln("name = {}, value = {}", re.match(1), re.match(2)) // prints "name = foo, value = bar"

match(m: int|string = 0)

This method has two forms, depending on the type of m. If m is an integer: 0 returns the whole current match (useful during iteration), and since 0 is the default value you can call match without parameters; a positive integer n returns the text matched by the nth parenthesized subexpression in the pattern. If m is a string, match looks for matches of this pattern in that string and returns an array of all the matching substrings.
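A sketch of the string form, which collects every run of digits. It passes "g" to be safe; this page doesn't say whether the string form also finds all matches without it.

```minid
// Backtick literal, so \d needs no extra escaping.
local re = regexp.Regexp(`\d+`, "g")

// The string form returns an array of the matched substrings.
local nums = re.match("a1 b22 c333")

foreach(n; nums)
	writeln(n)
```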

find(string: string)

Looks for this pattern in string. Returns the 0-based position of the match if one is found, or the length of string if not.
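Because find signals "not found" by returning the length of the string rather than a special value, callers should compare its result against #string. A minimal sketch; the helper name report is hypothetical.

```minid
local re = regexp.Regexp("b+")

// Hypothetical helper: prints where (or whether) the pattern occurs in s.
local function report(s)
{
	local pos = re.find(s)

	// A result equal to the string's length means there was no match.
	if(pos == #s)
		writefln("no match in \"{}\"", s)
	else
		writefln("match at index {} in \"{}\"", pos, s)
}

report("aaabbbccc")
report("xyzzy")
```

Since "b+" always consumes at least one character, a real match can never start at index #s, so the comparison is unambiguous for this pattern. It would not be for a pattern that can match an empty string at the end of the input.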

split(string: string)

Splits string using this pattern as the delimiter. Returns an array of the split-up string's components.
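A sketch that splits a comma-separated list, letting the delimiter pattern absorb any whitespace around the commas. It passes "g" to be safe, since this page doesn't say whether split depends on that attribute.

```minid
// The delimiter: a comma with optional whitespace on either side.
local re = regexp.Regexp(`\s*,\s*`, "g")

local parts = re.split("red, green ,blue")

// Print each piece with its index.
foreach(i, p; parts)
	writefln("{}: {}", i, p)
```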

replace(string: string, replacement: string|function)

If replacement is a string, replaces any matches in string with the replacement format string and returns the new string. If replacement is a function, replace calls that function once for each match, passing it an instance of the Regexp class, and uses the string the function returns as the replacement text. This allows for very flexible replacements. Here is a small example:

global x = 5
global y = 10
local s = "x = $x, y = $y"
local re = regexp.Regexp(`\$(\w+)`, "g")

writeln$ re.replace$ s, \m -> toString(eval(m.match(1)))
// Outputs "x = 5, y = 10"

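The Regexp instance passed to the function is the same instance on which replace was called, so inside the function you can call its match method to get the current match and its subexpressions, as the example does with m.match(1).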

Returns the completed string.
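For the plain-string form, here is a sketch that collapses runs of whitespace into a single space. The replacement text contains no special characters, so how the format string would treat them doesn't matter here.

```minid
// "g" so that every run of whitespace is replaced, not just the first.
local re = regexp.Regexp(`\s+`, "g")

local cleaned = re.replace("too    many   spaces", " ")
writeln(cleaned)
```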

search(string: string)

This sets up the regexp to search through the given string and returns the Regexp instance itself, so immediately after calling it you should iterate over the matches with a foreach loop on the object. This is explained in opApply.

opApply()

Once the regexp has been set up to search a string with search, use this to iterate over the matches with a foreach loop. Each iteration provides two loop variables: the index of the current match (counting from 0) and the Regexp instance itself. From the instance, you can get the text of the current match by calling .match(0) (or just .match()), and the text around it with the .pre() and .post() methods. Here's an example of use:

foreach(i, m; regexp.Regexp("ab").search("abcabcabab"))
	writefln("{}: {}[{}]{}", i, m.pre(), m.match(), m.post());

pre()

During iteration, returns the portion of the source string that comes before the current match.

post()

During iteration, returns the portion of the source string that comes after the current match.
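Because pre() covers everything from the start of the source string up to the match, and post() everything from the match to the end, their lengths give each match's position. A sketch that records the start and end index of each match in the opApply example's string; starts and stops are illustrative names, and the appending relies on MiniD's ~= appending a single value to an array.

```minid
local s = "abcabcabab"
local starts = []
local stops = []

foreach(i, m; regexp.Regexp("ab").search(s))
{
	// Everything before the match, so its length is the start index.
	starts ~= #m.pre()

	// Everything after the match, so the match ends at #s - #post.
	stops ~= #s - #m.post()
}

foreach(i, st; starts)
	writefln("match {}: [{}, {})", i, st, stops[i])
```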