kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Tue Jan 10, 2006 9:27 pm Post subject: Forthcoming changes
There's been some consensus that templates should probably be named with a suffix of "T". This is contrary to the convention Mango currently uses, so I will be renaming all templates accordingly.
There should be minimal impact, since the existing aliases should take care of the majority of cases.
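Here is a minimal D sketch of how such aliases keep existing code compiling. It assumes a hypothetical template `LineCounterT`, which is not a real Mango class. The template follows the new suffix convention, and the plain names become aliases for particular instantiations, so client code keeps using them. The snippet uses modern `alias X = Y;` syntax so it compiles with a current D compiler.

```d
// Hypothetical template named with the "T" suffix convention.
class LineCounterT(Char)
{
    // Counts the '\n' characters in the given text.
    size_t count(const(Char)[] text)
    {
        size_t lines = 0;
        foreach (c; text)
            if (c == '\n')
                ++lines;
        return lines;
    }
}

// Aliases for the concrete instantiations; existing code keeps using these names.
alias LineCounter = LineCounterT!char;
alias WLineCounter = LineCounterT!wchar;
alias DLineCounter = LineCounterT!dchar;
```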
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Wed Jan 11, 2006 4:44 pm
There's also been some activity around io.Token and io.Tokenizer.
These remain intact for now, but users should consider migrating to the Iterator approach exposed in mango.text ~ the old io.Tokenizer and friends will be deprecated in due course, since they cannot handle Unicode correctly.
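The post doesn't say exactly what io.Tokenizer gets wrong with Unicode. The D sketch below is only an illustration, and it assumes the problem is inspecting UTF-8 one code unit at a time. It contrasts a splitter that only recognises ASCII whitespace with one that decodes whole code points using Phobos' `std.utf.decode` and `std.uni.isWhite`. Both function names are hypothetical.

```d
import std.uni : isWhite;
import std.utf : decode;

// Splits on ASCII whitespace only, examining one UTF-8 code unit at a time.
string[] splitAsciiOnly(string text)
{
    string[] words;
    size_t start = 0;
    foreach (i, char c; text)
    {
        if (c == ' ' || c == '\t' || c == '\n')
        {
            if (i > start)
                words ~= text[start .. i];
            start = i + 1;
        }
    }
    if (text.length > start)
        words ~= text[start .. $];
    return words;
}

// Decodes whole code points, so non-ASCII separators such as U+3000 are recognised.
string[] splitUnicode(string text)
{
    string[] words;
    size_t start = 0, i = 0;
    while (i < text.length)
    {
        immutable at = i;
        immutable dchar c = decode(text, i);   // advances i past the whole code point
        if (isWhite(c))
        {
            if (at > start)
                words ~= text[start .. at];
            start = i;
        }
    }
    if (text.length > start)
        words ~= text[start .. $];
    return words;
}
```

The input "alpha\u3000beta gamma" contains U+3000, the ideographic space. The first version returns two tokens for it, and the second returns three.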
|
|
csauls
Joined: 27 Mar 2004 Posts: 278
Posted: Thu Jan 12, 2006 9:59 pm
Personally, I tend to be fond of a T prefix (TFoo) for most templates, and an M prefix (MFoo) for mixins, but I get along okay with T suffixes. I'm going to have to take a look at how Iterator works, as I can't help thinking Mango should have a D lexer. I was going to try writing one around Token/Tokenizer, but if this Iterator creature is better at Unicode, then by golly.
_________________
Chris Nicholson-Sauls
|
|
pragma
Joined: 28 May 2004 Posts: 607 Location: Washington, DC
Posted: Wed Jan 18, 2006 6:53 pm
I'm migrating my way of thinking from the Mango Tokenizers over to the new Iterators, and I think I'm seeing a slight disconnect in the approach. Overall, it looks like a huge improvement, and a further congealing of the API.
In short: I'm redoing the demangler to use Mango. Out of curiosity, what do you recommend folks do to implement a reasonably complex grammar using TextReader and Iterator (or should I use them at all)? I'm gathering that it takes subclassing Iterator and keeping both the custom iterator and reader on hand to do the job?
I see that the TextReader parses each token returned from the iterator per call to get(); is that correct?
_________________
-- !Eric.t.Anderton at gmail
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Wed Jan 18, 2006 7:21 pm
pragma wrote: | I'm migrating my way of thinking from the Mango Tokenizers over to the new Iterators, and I think I'm seeing a slight disconnect in the approach. Overall, it looks like a huge improvement, and a further congealing of the API.
In short: I'm redoing the demangler to use Mango. Out of curiosity, what do you recommend folks do to implement a reasonably complex grammar using TextReader and Iterator (or should I use them at all)? I'm gathering that it takes subclassing Iterator and keeping both the custom iterator and reader on hand to do the job?
I see that the TextReader parses each token returned from the iterator per call to get(); is that correct? |
Readers and writers are for converting mixed model content on a piecemeal basis. They work great if you need to convert a mixed memory model to a streaming model, and vice versa (say, an int, char, real[], wchar[] into a file, and back out again).
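Here is a minimal D sketch of what "mixed model" conversion means. It illustrates the idea only and is not Mango's Reader/Writer API. A hypothetical `MiniWriter` appends an int, a char, a real[] and a wchar[] to a flat byte stream. A hypothetical `MiniReader` then pulls them back out, but only if the caller asks for the same types in the same order. The sketch assumes native byte order and writes arrays with a length prefix.

```d
import std.traits : isScalarType;

// Hypothetical writer: appends values to a flat byte stream in native byte order.
struct MiniWriter
{
    ubyte[] bytes;

    void put(T)(T value) if (isScalarType!T)
    {
        bytes ~= (cast(const(ubyte)*) &value)[0 .. T.sizeof];
    }

    // Arrays are written as a length prefix followed by the raw elements.
    void put(T)(const(T)[] items) if (isScalarType!T)
    {
        put(items.length);
        bytes ~= cast(const(ubyte)[]) items;
    }
}

// Hypothetical reader: the caller must request the same types in the same order.
struct MiniReader
{
    const(ubyte)[] bytes;

    T get(T)() if (isScalarType!T)
    {
        T value;
        (cast(ubyte*) &value)[0 .. T.sizeof] = bytes[0 .. T.sizeof];
        bytes = bytes[T.sizeof .. $];
        return value;
    }

    T[] getArray(T)() if (isScalarType!T)
    {
        immutable n = get!size_t();
        auto result = new T[n];
        // Copy byte-wise into a fresh array, which avoids misaligned element reads.
        (cast(ubyte[]) result)[] = bytes[0 .. n * T.sizeof];
        bytes = bytes[n * T.sizeof .. $];
        return result;
    }
}
```

The stream carries no type information, so a mismatched get() silently misreads the data. The reader/writer pair only works when both sides agree on the sequence.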
If you already know your content is all in one format (text, for instance), then the reader/writer model is almost entirely superfluous. The one value it retains in that scenario is configurable transcoding between the various UTF encodings ~ text read/write may be converted on the fly, depending upon the request type and the source type. But efficient transcoding is simple to do with mango.convert.Unicode anyway, so that benefit is largely negated (for a pure text stream).
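Kris recommends mango.convert.Unicode for transcoding, but the thread doesn't show its API. The sketch below uses Phobos' `std.utf` (`toUTF8`, `toUTF16`, `toUTF32`) in modern D as a stand-in. It holds the same text as char[], wchar[] and dchar[], and checks that the conversion round-trips. The helper names are hypothetical.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

// Code-unit counts for the same text in each UTF encoding.
struct Widths
{
    size_t utf8, utf16, utf32;
}

Widths codeUnitCounts(string text)
{
    wstring w = toUTF16(text);   // char[] -> wchar[]
    dstring d = toUTF32(text);   // char[] -> dchar[]
    return Widths(text.length, w.length, d.length);
}

// True if converting to UTF-16 and UTF-32 and back again reproduces the input.
bool roundTrips(string text)
{
    return toUTF8(toUTF16(text)) == text && toUTF8(toUTF32(text)) == text;
}
```

For "na\u00EFve \u20AC" the counts are 10 UTF-8 code units, 7 UTF-16 units and 7 UTF-32 units. The difference comes from UTF-8, where ï takes two bytes and € takes three.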
The Iterators sidestep all that, since they already know (a) they're dealing with text only, and (b) the text is already decoded into one of char[], wchar[] or dchar[]. Most Iterators are reasonably simple in what they do: parse a line from the content; parse a comma-separated list; parse a cookie from an HTTP header. Their value comes from (a) simplicity of use, and (b) their ability to parse streams. The latter is important for general usage.
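These single-purpose parsers can be very small. The D sketch below splits a Cookie header into name/value pairs. It is a hypothetical helper built on Phobos ranges (`splitter`, `indexOf`, `strip`), not a Mango Iterator, and it deliberately ignores quoting and other details of the cookie specification.

```d
import std.algorithm.iteration : splitter;
import std.string : indexOf, strip;

// Splits a Cookie header such as "id=42; theme=dark" into name/value pairs.
// Quoted values and other RFC details are deliberately ignored.
string[string] parseCookieHeader(string header)
{
    string[string] cookies;
    foreach (pair; header.splitter(';'))
    {
        immutable eq = pair.indexOf('=');
        if (eq <= 0)
            continue;   // skip empty or malformed entries
        auto name = pair[0 .. cast(size_t) eq].strip;
        cookies[name] = pair[cast(size_t) eq + 1 .. $].strip;
    }
    return cookies;
}
```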
Getting back to your grammar example: one key question to ask is "will I have all the text in memory at one time, or will it have to be streamed?". If it were me, I'd read the entire text into an array and parse it directly, perhaps using Unicode or UnicodeFile to convert it first (if necessary). That way, you'd avoid the streaming overhead, and get to build the lexer in exactly the way that's most comfortable for you.
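Here is a minimal D sketch of the "read it all into an array and parse it directly" approach, using a hand-written lexer over an in-memory string. It handles one piece of a demangler-style grammar: a run of length-prefixed identifiers, in the style of D's mangled qualified names (a decimal length followed by that many characters). `parseLengthPrefixed` is a hypothetical name, not part of Mango or of pragma's demangler.

```d
import std.ascii : isDigit;

// Hand-written lexer over an in-memory string: reads a run of length-prefixed
// identifiers such as "3foo3bar" and returns ["foo", "bar"].
string[] parseLengthPrefixed(string s)
{
    string[] names;
    size_t i = 0;
    while (i < s.length && isDigit(s[i]))
    {
        size_t len = 0;
        while (i < s.length && isDigit(s[i]))
            len = len * 10 + (s[i++] - '0');   // accumulate the decimal length
        if (len > s.length - i)
            throw new Exception("identifier runs past end of input");
        names ~= s[i .. i + len];               // a slice of the input, no copy
        i += len;
    }
    return names;
}
```

Because the whole input is already in memory, the lexer can take slices of it freely and never has to worry about a token being split across reads.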
On the other hand, if you do need to stream the input, then creating an Iterator subclass and implementing its scan() method is a good approach. You'd likely encapsulate the lexer state within the class itself, since the generic next() interface is a bit thin for heavyweight parsers and lexers.
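The thread doesn't show Mango's actual Iterator class or scan() signature. The following is only a sketch, in modern D, of the shape Kris describes, and it uses a hypothetical base class `ChunkIterator`. The base class refills from a stream. The subclass's `scan()` decides where a token ends, or asks for more input. Lexer state such as a line counter lives in the subclass rather than being squeezed through next().

```d
// Hypothetical base class (not Mango's Iterator): pulls text from a source a chunk
// at a time and asks the subclass's scan() where the next token ends.
abstract class ChunkIterator
{
    private string delegate() source;   // returns "" once the input is exhausted
    private string pending;             // text received but not yet consumed
    private string token;
    private bool exhausted;

    this(string delegate() source) { this.source = source; }

    // Look for a complete token at the start of `data`. Return the number of
    // characters consumed and set `tok`, or return 0 if more input is needed.
    // `atEnd` is true when no further input will arrive.
    protected abstract size_t scan(string data, bool atEnd, out string tok);

    final bool next()
    {
        for (;;)
        {
            if (pending.length)
            {
                string tok;
                immutable used = scan(pending, exhausted, tok);
                if (used)
                {
                    token = tok;
                    pending = pending[used .. $];
                    return true;
                }
            }
            if (exhausted)
                return false;
            auto chunk = source();
            if (chunk.length == 0)
                exhausted = true;
            else
                pending ~= chunk;
        }
    }

    final string get() { return token; }
}

// A word lexer whose extra state (the current line number) lives in the subclass.
class WordIterator : ChunkIterator
{
    size_t line = 1;

    this(string delegate() source) { super(source); }

    protected override size_t scan(string data, bool atEnd, out string tok)
    {
        size_t i = 0, newlines = 0;
        while (i < data.length && (data[i] == ' ' || data[i] == '\n'))
        {
            if (data[i] == '\n')
                ++newlines;
            ++i;
        }
        immutable start = i;
        while (i < data.length && data[i] != ' ' && data[i] != '\n')
            ++i;
        if (start == i || (i == data.length && !atEnd))
            return 0;       // no word yet, or the word may continue in the next chunk
        line += newlines;   // commit state only once a token is accepted
        tok = data[start .. i];
        return i;
    }
}
```

scan() updates its state (the line count) only when it actually accepts a token. This matters because a token that straddles a chunk boundary is scanned again once more input arrives.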
In summary, it's worth noting that Iterators are generic for a reason. You can certainly make them do what you want to, but it may well pay to step beyond that for something with a narrower focus.
Hope that helps?
(edit: it's really poor form to misspell grammar)
Last edited by kris on Thu Jan 19, 2006 9:22 am; edited 1 time in total
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Wed Jan 18, 2006 7:33 pm
pragma wrote: | I'm gathering that it takes subclassing Iterator and keeping both the custom iterator and reader on hand to do the job? |
No ... you just need an Iterator and one of the following types to provide the content:
* an array of char/wchar/dchar
* a Conduit, such as a file or socket
* a Buffer instance, which can be bound to any of the above
One useful Iterator property is inherited from the use of Buffer ~ you can bind multiple (different kinds of) Iterators to the same Buffer, and they will remain in lock-step with each other. This also holds true for any Reader bound to the same Buffer (and it operates the same way with multiple Writer instances, for output). That can come in handy at times.
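Here is a toy model of the lock-step behaviour in modern D. Two different readers are bound to one hypothetical `SharedBuffer` and share a single read position, so whatever one consumes, the other skips. This only illustrates the idea. It is not how Mango's Buffer is implemented, and all class names here are made up.

```d
import std.string : indexOf;

// Hypothetical stand-in for a shared Buffer: every reader bound to it
// shares the same read position.
class SharedBuffer
{
    string data;
    size_t pos;
    this(string data) { this.data = data; }
}

// Returns the rest of the current line, or null at end of input.
class LineReader
{
    private SharedBuffer buf;
    this(SharedBuffer buf) { this.buf = buf; }

    string next()
    {
        if (buf.pos >= buf.data.length)
            return null;
        auto rest = buf.data[buf.pos .. $];
        immutable nl = rest.indexOf('\n');
        immutable len = nl < 0 ? rest.length : cast(size_t) nl;
        buf.pos += (nl < 0) ? len : len + 1;   // also consume the '\n'
        return rest[0 .. len];
    }
}

// Returns the next space- or newline-delimited word, or null at end of input.
class WordReader
{
    private SharedBuffer buf;
    this(SharedBuffer buf) { this.buf = buf; }

    string next()
    {
        auto d = buf.data;
        while (buf.pos < d.length && (d[buf.pos] == ' ' || d[buf.pos] == '\n'))
            ++buf.pos;
        if (buf.pos >= d.length)
            return null;
        immutable start = buf.pos;
        while (buf.pos < d.length && d[buf.pos] != ' ' && d[buf.pos] != '\n')
            ++buf.pos;
        return d[start .. buf.pos];
    }
}
```

Suppose the buffer holds "title\nalpha beta\ngamma\n". If a line read is followed by a word read, the next line read returns only " beta", because the word reader has already consumed "alpha" from the shared position.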
Other small things include a succinct means to feed the content from one Iterator into another. For example, one might use a LineIterator to feed, say, a comma Iterator (or something like that):
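Code:
auto line = new LineIterator (new FileConduit("myfile"));   // one line of the file per next()
auto comma = new SimpleIterator (",");                       // splits its content on ','
while (line.next)
{
    comma.set (line.get);      // feed the current line into the comma iterator
    while (comma.next)
        ...                    // handle each comma-separated field here
}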
Nothing surprising there, but it can be convenient. Edit: I should note that SVN head is cleaner in this regard than the zip files.
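For comparison, here is the same nesting pattern using modern D's standard library (Phobos ranges), not Mango. `lineSplitter` plays the role of the LineIterator and `splitter(',')` plays the comma iterator. Both work over an in-memory string rather than a FileConduit. `parseCsvLines` is a hypothetical helper name.

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.string : lineSplitter;

// Outer range yields lines; each line is fed to an inner range that splits on ','.
string[][] parseCsvLines(string text)
{
    string[][] rows;
    foreach (line; text.lineSplitter)
        rows ~= line.splitter(',').array;
    return rows;
}
```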
Last edited by kris on Thu Jan 19, 2006 8:18 am; edited 2 times in total
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
Posted: Wed Jan 18, 2006 8:12 pm
Wow... the new iterator method is so much easier to understand. I'm afraid I never found it easy to grasp the tokenizer idea in Mango. I don't know why; it just didn't look very clear in the examples.
-JJR
|
|
pragma
Joined: 28 May 2004 Posts: 607 Location: Washington, DC
Posted: Thu Jan 19, 2006 4:14 am
kris wrote: | [...]Hope that helps? |
Thank you, Kris. It's all much clearer to me now.
_________________
-- !Eric.t.Anderton at gmail
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Thu Jan 19, 2006 10:14 am
JJR wrote: | Wow... the new iterator method is so much easier to understand. I'm afraid I never found it easy to grasp the tokenizer idea in Mango. I don't know why; it just didn't look very clear in the examples. |
Glad to hear that. You should have said something earlier.
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
Posted: Thu Jan 19, 2006 3:24 pm
kris wrote: | JJR wrote: | Wow... the new iterator method is so much easier to understand. I'm afraid I never found it easy to grasp the tokenizer idea in Mango. I don't know why; it just didn't look very clear in the examples. |
Glad to hear that. You should have said something earlier |
Well... to be honest, I thought the problem was with my brain, so I wasn't about to challenge the idea. But the iterator pattern just clicked a whole lot more easily with the way my mind works.
Strange how that is. It doesn't make the previous method wrong, of course.
Thanks again.
-JJR
|
|
|