View previous topic :: View next topic |
Author |
Message |
Don Clugston
Joined: 05 Oct 2005 Posts: 91 Location: Germany (expat Australian)
|
Posted: Mon Jul 03, 2006 1:30 am Post subject: Compile-time regexp: getting it right |
|
|
I've been giving some thought to compile-time regular expressions.
There are three benefits to compile-time regexps, over the run-time approach used in (say) std.regexp:
(a) catch errors at compile time instead of throwing an exception at run-time;
(b) eliminate the cost of compiling the regexp in the first call;
(c) performance improvement by generating more customized execution code.
The approaches we've tried to date, involving recursive templates expansion or mixins, have tried to accomplish all of these simultaneously. This seems to be too ambitious. In particular, I think that in the general case, with the current limitations of mixins, we have no chance of out-performing the "state-machine-string with giant switch statement" approach.
Instead, I think it's better to split the problem up. The compile-time regexp should only compile the regexp string into a state-machine string, detecting errors in the process. Then, it can classify the string, to choose which run-time should be used. There is probably a small but important subset of regexps for which it's worth trying to generate optimal code.
With this approach to compile-time regexps, the development approach should be:
* Get it working for run-time regexps first. Initially, the compile-time regexps would just pass the string to the run-time parser, optimiser and engine. Make sure we have a good run-time regexp, but using the compile-time syntax.
* Duplicate the run-time compilation into the compile-time parser. (Note: even the optimisation step could be left at run-time; initially, we only need to get benefit (a)).
* Only move things to compile-time when they have a demonstrable speed benefit.
I also think that the Perl6 regexp syntax is well worth considering. I think Larry makes some excellent points about how ugly traditional regexp syntax is (and it seems similar to the philosophy of D vs C++); we should consider grabbing some ideas from there, especially, the stuff about capturing vs non-capturing brackets.
http://dev.perl.org/perl6/doc/design/apo/A05.html
Comments? |
|
Back to top |
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Mon Jul 03, 2006 3:15 pm Post subject: |
|
|
Note that (b) can be offset when using std.Regex by stashing pre-constructed Regex instances. In other words, you don't have to compile the pattern at the point of usage. |
|
Back to top |
|
|
Don Clugston
Joined: 05 Oct 2005 Posts: 91 Location: Germany (expat Australian)
|
Posted: Tue Jul 04, 2006 12:28 am Post subject: |
|
|
Quote: | Note that (b) can be offset when using std.Regex by stashing pre-constructed Regex instances. In other words, you don't have to compile the pattern at the point of usage. |
That's true. And if it turned out that searching for the regexp in a list of previously compiled ones was an expensive operation, a compile-time hash function could be used (and that's trivial to write).
This could be a lot less work than I thought.[/quote] |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|