tango.text.Search

License:

BSD style: see license.txt

Version:

May 2009: Initial release

since:

0.99.9

Author:

Kris
FindFruct!(T) find(T)(T[] what) #
Returns a lightweight pattern matcher, good for short patterns and/or short to medium length content. Brute-force approach with fast multi-byte comparisons
SearchFruct!(T) search(T)(T[] what) #
Returns a welterweight pattern matcher, good for long patterns and/or extensive content. Based on the QS algorithm which is a Boyer-Moore variant. Does not allocate memory for the alphabet.
Generally becomes faster as the match-length grows
struct FindFruct(T) #
Convenient bundle of lightweight find utilities, without the hassle of IFTI problems. Create one of these using the find()

function:

1
2
3
4
5
6
7
8
9
auto match = find ("foo");
auto content = "wumpus foo bar"

// search in the forward direction
auto index = match.forward (content);
assert (index is 7);

// search again - returns length when no match found
assert (match.forward(content, index+1) is content.length);

Searching operates both forward and backward, with an optional start offset (can be more convenient than slicing the content). There are methods to replace matches within given content, and others which return foreach() iterators for traversing content.

SearchFruct is a more sophisticated variant, which operates more efficiently on longer matches and/or more extensive content.

size_t forward(T[] content, size_t ofs = 0) #
Search forward in the given content, starting at the optional index.
Returns the index of a match, or content.length where no match was located.
size_t reverse(T[] content, size_t ofs = size_t.max) #
Search backward in the given content, starting at the optional index.
Returns the index of a match, or content.length where no match was located.
T[] match() #
Return the match text
void match(T[] what) #
Reset the text to match
bool within(T[] content) #
Returns true if there is a match within the given content
size_t count(T[] content) #
Returns number of matches within the given content
T[] replace(T[] content, T chr) #
Replace all matches with the given character. Use method tokens() instead to avoid heap activity.
Returns a copy of the content with replacements made
T[] replace(T[] content, T[] sub = null) #
Replace all matches with the given substitution. Use method tokens() instead to avoid heap activity.
Returns a copy of the content with replacements made
Util.PatternFruct!(T) tokens(T[] content, T[] sub = null) #
Returns a foreach() iterator which exposes text segments between all matches within the given content. Substitution text is also injected in place of each match, and null can be used to indicate removal instead:
1
2
3
4
5
6
char[] result;

auto match = find ("foo");
foreach (token; match.tokens ("$foo&&foo*", "bar"))
         result ~= token;
assert (result == "$bar&&bar*");
This mechanism avoids internal heap activity.
Indices indices(T[] content) #
Returns a foreach() iterator which exposes the indices of all matches within the given content:
1
2
3
4
5
6
int count;

auto f = find ("foo");
foreach (index; f.indices("$foo&&foo*"))
         ++count;
assert (count is 2);
struct Indices [private] #
Simple foreach() iterator
struct SearchFruct(T) #
Convenient bundle of welterweight search utilities, without the hassle of IFTI problems. Create one of these using the search()

function:

1
2
3
4
5
6
7
8
9
auto match = search ("foo");
auto content = "wumpus foo bar"

// search in the forward direction
auto index = match.forward (content);
assert (index is 7);

// search again - returns length when no match found
assert (match.forward(content, index+1) is content.length);

Searching operates both forward and backward, with an optional start offset (can be more convenient than slicing the content). There are methods to replace matches within given content, and others which return foreach() iterators for traversing content.

FindFruct is a simpler variant, which can operate efficiently on short matches and/or short content (employs brute-force strategy)

SearchFruct opCall(T[] what) [static] #
Construct the fruct
T[] match() #
Return the match text
void match(T[] what) #
Reset the text to match
size_t forward(T[] content, size_t ofs = 0) #
Search forward in the given content, starting at the optional index.
Returns the index of a match, or content.length where no match was located.
size_t reverse(T[] content, size_t ofs = size_t.max) #
Search backward in the given content, starting at the optional index.
Returns the index of a match, or content.length where no match was located.
bool within(T[] content) #
Returns true if there is a match within the given content
size_t count(T[] content) #
Returns number of matches within the given content
T[] replace(T[] content, T chr) #
Replace all matches with the given character. Use method tokens() instead to avoid heap activity.
Returns a copy of the content with replacements made
T[] replace(T[] content, T[] sub = null) #
Replace all matches with the given substitution. Use method tokens() instead to avoid heap activity.
Returns a copy of the content with replacements made
Substitute tokens(T[] content, T[] sub = null) #
Returns a foreach() iterator which exposes text segments between all matches within the given content. Substitution text is also injected in place of each match, and null can be used to indicate removal instead:
1
2
3
4
5
6
char[] result;

auto match = search ("foo");
foreach (token; match.tokens("$foo&&foo*", "bar"))
         result ~= token;
assert (result == "$bar&&bar*");
This mechanism avoids internal heap activity
Indices indices(T[] content) #
Returns a foreach() iterator which exposes the indices of all matches within the given content:
1
2
3
4
5
6
int count;

auto match = search ("foo");
foreach (index; match.indices("$foo&&foo*"))
         ++count;
assert (count is 2);
size_t find(char* what, size_t wlen, char* content, size_t len, size_t ofs) [private] #
size_t rfind(char* what, size_t wlen, char* content, size_t len, size_t ofs) [private] #
bool matches(char* a, char* b, size_t length) [private, static] #
void reset() [private] #
Construct lookup table. We force the alphabet to be char[] always, and consider wider characters to be longer patterns instead
void flip() [private] #
Reverse lookup-table direction
struct Indices [private] #
Simple foreach() iterator
struct Substitute [private] #
Substitution foreach() iterator