
Refactoring - Beta 1.1 Milestone

 
Forum Index -> DDL - D Dynamic Libraries
pragma



Joined: 28 May 2004
Posts: 607
Location: Washington, DC

Posted: Sat Feb 04, 2006 10:42 pm    Post subject: Refactoring - Beta 1.1 Milestone

This thread is for the final refactoring push for 1.1b. All active developers should feel free to post their comments, questions, and suggestions here. :)

I have committed the project to folding in Mango-specific code, so as to take advantage of what it offers over the project's current I/O scheme. Changesets #131 and #132 stand as examples of where this is headed.

http://trac.dsource.org/projects/ddl/changeset/131
http://trac.dsource.org/projects/ddl/changeset/132

I'm presently working on factoring these into my sandbox locally, so that I can get the OMF tree and utils online with these changes. Critiques are welcome from contributors, testers, and everyone else. :)

ddl.FileBuffer

A shim to help bridge the FileConduit into more buffer-like usage. It may seem like a step backwards, but it was the easiest way to move from the DDLFile design to a more Mango-like one. As it turns out, IBuffer is *easier to use for arbitrary reading and seeking than the combination of IReader and IConduit - this was needed for the n-starting-byte introspection that the Loader scheme currently uses.

(*This may all change to an IReader/IConduit scheme on the next refactoring pass. Thankfully, that should only affect the core DDL classes and the Loader classes - an easy change to make)

ddl.DDLReader

A Reader subclass that adds getString() methods to read in length-prefixed (length-then-data) strings. While this breaks the whisper and shift syntax of the reader, it helps encapsulate a frequently used idiom within DDL.

ddl.DDLWriter

The same as DDLReader, only for writing support. Includes putString() methods to complement getString().
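As a rough illustration, the getString()/putString() pair might look something like this. This is a sketch only: the raw get()/put() primitives and class shapes are assumed from Mango's Reader/Writer, not taken from the actual DDL source.

```d
// Sketch of the length-prefixed string idiom that DDLReader/DDLWriter
// encapsulate. The get()/put() primitives are assumed Mango-style;
// the real DDL method signatures may differ.
class DDLReader : Reader
{
    char[] getString ()
    {
        uint length;
        get (length);                 // read the length prefix
        char[] data;
        get (data, length);           // then exactly `length` chars
        return data;
    }
}

class DDLWriter : Writer
{
    void putString (char[] data)
    {
        put (cast(uint) data.length); // write the length prefix
        put (data);                   // then the character data
    }
}
```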

ddl.omf.OMFReader

A set of refactored classes, formerly known as OMFCursor (and friends). Take a look at this class as an example of how to refactor existing cursors into this framework.
_________________
-- !Eric.t.Anderton at gmail
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Feb 05, 2006 8:10 pm

That's odd ~ io.Reader and io.Writer operate with length-prefixed arrays by default. DisplayReader and DisplayWriter disable that aspect. I'll take a look at the code to see what you're doing :)
pragma
Posted: Sun Feb 05, 2006 10:03 pm

kris wrote:
That's odd ~ io.Reader and io.Writer operate with length-prefixed arrays by default. DisplayReader and DisplayWriter disable that aspect. I'll take a look at the code to see what you're doing :)


Thanks for the support, Kris. As it is, the current crop of refactored code barely passes the 'it compiles' step of the process. I haven't had the chance to verify the behavior one way or the other.

Honestly, the behavior for reading/writing arrays wasn't 100% clear to me even by reading the code. Going by the interfaces provided, it almost looks like, to read an array, you *must* provide the elements parameter, lest the reader grab up to uint.max. Perhaps you could point me in the right direction?
kris
Posted: Mon Feb 06, 2006 12:54 am

:)

The optional second arg is an override for the array length. Typically, for raw binary data, the length is provided via a prefix in the incoming stream ~ but some byte arrays, such as HTTP content, don't have a prefix. Thus, there's an option to provide it explicitly.

This is the one thing that I've lost a wee bit of sleep over with the Readers ... in some approaches the client code always has to be explicit about how much to read, with methods such as readFully(int size). I noticed that can often lead to lots of overhead in the client code regarding memory management, encoding conversions, and other issues. By default, mango Readers take care of that on behalf of the client. One has the option to manage such things directly instead, via the underlying Buffer, the Reader itself, or even the source Conduit.

The upshot is that the current Reader design (intended to act as a stream to structured-data converter) takes care of a number of background chores when reading arrays. In several cases it can lead to more efficient code too: for example, mango.cluster and mango.http very rarely allocate memory for any part of an incoming data stream, and don't even copy the content from one place to another in the vast majority of cases ~ content arrives directly from the OS into the associated Buffer, and arrays are simply aliased into the client space. This is not the default configuration, but it's a readily available option. There are a number of additional ways to manage both small and rather large arrays as well.

I did it this way to make the client code trivial, with the option of being very efficient. There used to be a more traditional array-reading API in an older set of Readers (perhaps 18 months ago), but I felt that was overkill and removed it. Perhaps you'd like to go over this in further detail? It may be that you'll like the approach too, but if not we can always support a readFully() type of API. I'm always open to alternatives.

- Kris
pragma
Posted: Mon Feb 06, 2006 9:07 am

kris wrote:
:)

The optional second arg is an override for the array length. Typically, for raw binary data, the length is provided via a prefix in the incoming stream ~ but some byte arrays, such as HTTP content, don't have a prefix. Thus, there's an option to provide it explicitly.


Gotcha. So to make sure I understand:

Code:

char[] str;
reader.get(str);     // reads a length prefix, then char[length]
reader.get(str, 111); // reads 111 chars into str, no prefix


... and both methods allocate, so str can be empty or null on passing?

Out of curiosity, what happens when char[] is already allocated?

Quote:

This is the one thing that I've lost a wee bit of sleep over with the Readers ... in some approaches the client code always has to be explicit about how much to read, with methods such as readFully(int size). I noticed that can often lead to lots of overhead in the client code regarding memory management, encoding conversions, and other issues. By default, mango Readers take care of that on behalf of the client. One has the option to manage such things directly instead, via the underlying Buffer, the Reader itself, or even the source Conduit. The upshot is that the current Reader design (intended to act as a stream to structured-data converter) takes care of a number of background chores when reading arrays. In several cases it can lead to more efficient code too: for example, mango.cluster and mango.http very rarely allocate memory for any part of an incoming data stream, and don't even copy the content from one place to another in the vast majority of cases ~ content arrives directly from the OS into the associated Buffer, and arrays are simply aliased into the client space. This is not the default configuration, but it's a readily available option. There are a number of additional ways to manage both small and rather large arrays as well.


Wow. I knew that the reader had gone through some changes over time, but I had no idea so much was put into it. Hopefully, those careful design decisions will reflect in DDL's performance.

Quote:

I did it this way to make the client code trivial, with an option of being very efficient. There used to be a more traditional array-reading API in an older set of Readers (perhaps 18 months ago) but I felt that was overkill and removed it. Perhaps you'd like to go over this in further detail? It may be that you'll like the approach too, but if not we can always support a readFully() type of API. I'm always open to alternatives.


Well, as long as Reader can support both modes in some way (like in the code snippet above), I'm happy. As you can see, there's only a handful of things that the default Reader can't do, and I'm extending it where needed to make parsing less of a chore. For example, OMF has its own wacky way of storing strings that involves 1-2 bytes of length followed by text - again, this is where the dual-mode array reading comes in handy.
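For instance, the OMF idiom mentioned above might be handled along these lines. This is a hypothetical helper, not the real OMFReader code, and the two-byte extended-length form is elided:

```d
// Sketch: OMF stores strings as a 1-byte length prefix followed by
// the text. Uses the dual-mode array read from earlier in the thread
// (an explicit element count suppresses the default length prefix).
char[] getOMFString (DDLReader reader)
{
    ubyte length;
    reader.get (length);         // 1-byte OMF length prefix
    char[] text;
    reader.get (text, length);   // explicit-length read: no prefix expected
    return text;
}
```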

Since I'm also looking into retooling the demangler - would you recommend using a Buffer w/ Reader just for parsing small strings over and over again (think: symbol tables)? Or are they better suited for heavyweight parsing instead?
kris
Posted: Mon Feb 06, 2006 1:17 pm

pragma wrote:
Gotcha. So to make sure I understand:

Code:

char[] str;
reader.get(str);     // reads a length prefix, then char[length]
reader.get(str, 111); // reads 111 chars into str, no prefix


... and both methods allocate, so str can be empty or null on passing?

The Reader treats array references as pointer/length pairs. It adjusts them to point at wherever the content ends up. For example, if you configure a Reader with an instance of BufferAllocator, it will attempt to alias the array directly into the Buffer space (where the OS placed the content). There are some situations where it can't do that, including UTF decoding and endian translation. But typical content can be aliased without a problem (of course, if you don't want aliasing then you wouldn't reconfigure the Reader in such a manner).

A good way to think about this approach is to consider the data to be a set of variable-length records ~ the Reader pulls in a bunch of data from the input stream, and maps it into a structured record in memory. Some of that data may be copied into said record (such as an int, or a bool) while other data may be aliased or allocated (such as arrays). There are four or five different methods of managing array memory (aliasing, heap allocation, record mapping, etc.) in the ArrayAllocator module. Unfortunately, I haven't documented that aspect. Something I need to do; pronto.
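Configured for aliasing, that might look roughly like so. BufferAllocator and the stack of classes are named in this thread, but the constructor and setter calls here are assumptions:

```d
// Sketch: configure a Reader so that array reads alias directly into
// the Buffer's space rather than allocating. Class names come from
// this thread; the exact constructor/setter signatures are assumed.
auto conduit = new FileConduit ("symbols.dat");
auto buffer  = new Buffer (conduit);
auto reader  = new Reader (buffer);
reader.setAllocator (new BufferAllocator); // alias instead of allocate

char[] name;
reader.get (name);   // `name` now points into the buffer's memory;
                     // it may be invalidated when the buffer refills
```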

BTW: this topic is responsible for a lot of the read/write symmetry:

Code:
int i;
real r;
char[] s;
float[] f;

write (i) (r) (s) (f);
read  (i) (r) (s) (f);


pragma wrote:
Out of curiosity, what happens when char[] is already allocated?

The pointer/length pair will typically get redirected somewhere else. If you pass a static char[100], the compiler will error.

pragma wrote:
Well, as long as Reader can support both modes in some way (like in the code snippet above), I'm happy. As you can see, there's only a handful of things that the default Reader can't do, and I'm extending it where needed to make parsing less of a chore. For example, OMF has its own wacky way of storing strings that involves 1-2 bytes of length followed by text - again, this is where the dual-mode array reading comes in handy.

Since I'm also looking into retooling the demangler - would you recommend using a Buffer w/ Reader just for parsing small strings over and over again (think: symbol tables)? Or are they better suited for heavyweight parsing instead?

The Reader is ideal for decoding mixed-type records from a stream. If you are dealing with strings only, or with custom array lengths, then it may be more appropriate to wrap the Reader in a subclass, just as you have done. Given the 1-2 byte length prefixes, it sounds like you're doing the right thing :)

Can you think of a way to make this whole topic simpler to deal with? The reader ended up like this after a lot of tradeoff considerations, but there may well be better alternatives.
pragma
Posted: Mon Feb 06, 2006 2:40 pm

Quote:
(of course, if you don't want aliasing then you wouldn't reconfigure the Reader in such a manner).


Thank you. That probably saved me a good deal of trouble.

kris wrote:
Can you think of a way to make this whole topic simpler to deal with? The reader ended up like this after a lot of tradeoff considerations, but there may well be better alternatives.


Well, I think it's all starting to make sense now. But looking back, I think the chief problem is that I know too much about other I/O models.

When you work with Streams in C++ and then Java for a few years each, you develop habits that, frankly, map very poorly onto Mango's world view. So I find myself making amateurish mistakes because I'm having to re-learn stuff.

Not that Mango IO is at all deficient. Actually, it's quite nice to be able to configure the lower-level concerns of reading via all these various "worker" modules. In short, traditional IO (streams and all) is a Facade approach, while Mango takes a hybrid Strategy/Mediator approach.

So when you get down to it, I think the design is solid; in my case the problem was educational. An FAQ, or a "Migrating from Streams" doc, would probably help clear things up for other folks. Also, documenting the typical "IO Stack" (reader->buffer->conduit) as such might help those of us more used to all that stuff being in one place. ;)

FYI, things I was not expecting while learning about all this:

- IConduit is for making access to various different devices and buffers coherent, but it is not a stream in and of itself: you can only seek and do crude I/O in a conduit, and that *really* should be done against a buffer instead.

- IBuffer is primarily for wrapping access to a conduit, and its contained data is quite ephemeral in nature (it can be purged or moved about at any time, depending on the implementation). It's not really intended as an end-point for interfacing with I/O - use a Reader or Writer instead.

- IReader/IWriter do what they say, but you cannot seek against a reader or writer directly. You must follow the data trail from reader/writer to buffer to conduit in order to accomplish this. Even then, the operation is not guaranteed, as not all conduits are seekable. You can 'emulate' some crude seeking behavior by using the buffer, but the results depend on the state of the data in the buffer at that time - so don't do it.

- As you mentioned, the default reader behavior is length-prefixed array data with slicing from the buffer. The length-prefix stuff is great, but I expected copying to be the default since it has the fewest side-effects for rapid development.

- Kris really *does* put a lot of good info into the in-code documentation. Said comments will make more sense than reading the code; no, really. :p
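Put together, the stack described in these bullets looks something like this. This is a sketch; the constructor and seek signatures are assumptions:

```d
// Sketch of the typical Mango "IO Stack": Reader -> Buffer -> Conduit.
// Typed reads happen at the reader, buffering at the buffer, and
// seeking (where supported) at the conduit. Signatures are assumed.
auto conduit = new FileConduit ("library.ddl");
auto buffer  = new Buffer (conduit);
auto reader  = new DDLReader (buffer);

uint magic;
reader.get (magic);   // typed read through the stack

// seeking must follow the data trail down to the conduit,
// and only works if this conduit type is actually seekable
conduit.seek (0);
```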
kris
Posted: Mon Feb 06, 2006 3:21 pm

Yep - all good points.

BTW; the default configuration for reading arrays is not aliasing/slicing ~ everything is set up to allocate by default.

Buffer is interesting in that you can 'flow' data through it, or use it directly as a data holding tank. For instance, if I wanted to read a very large array of integers with endian conversion, I'd create a Buffer large enough to house the whole thing, and then use a Reader to populate & convert it all. Given that the Buffer is just a wrapper for an array of memory, I can construct it with an appropriate int[] in the first place ~ no further manipulation or copying required ~ the OS calls point directly at the wrapped array. If I don't know how large the input will be, I'd use a GrowBuffer to manage expansion for me.
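A sketch of that big-array case. Buffer and GrowBuffer are named in the post, but the array-wrapping constructor and cast are assumptions:

```d
// Sketch: construct a Buffer directly over a pre-sized int[], so the
// OS calls deposit incoming bytes straight into the array, and a
// Reader then applies endian conversion in place. The array-wrapping
// constructor and fill mechanics are assumptions, not verified API.
int[] data = new int[1_000_000];
auto buffer = new Buffer (cast(void[]) data);
auto reader = new Reader (buffer);
// ... reader.get() calls now operate over `data` with no extra copy

// if the input size were unknown up front, a GrowBuffer would
// manage expansion instead of the fixed wrapper above
```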

To do a similar thing at the Conduit level, I'd read() into a provided array (without the Buffer wrapper). Of course, I'd have to do my own endian conversion in that case. At this level, the API is more like a readFully() design.

Thus, a Conduit exposes the raw stream with minimal management. Some Conduit types also support ISeekable.

The Buffer is a switchpoint for multiple different types of IO usage: (a) in-memory formatting like the Phobos OutBuffer, (b) memory-mapped files, (c) a basis for fast mixed-type, record-oriented IO, (d) an abstraction for higher levels to use during various conversion activities, (e) an abstraction for streaming Iterators to operate upon.

The Reader/Writer layer sits on top of the buffer, and therefore can be applied to file/socket/console IO, memory-based stream construction and piping, and to memory-mapped files. This layer exposes a mixed-type and somewhat record-oriented API. Kind of like a Java DataReader/DataWriter. Both the Buffer and Reader/Writer layers are optional.
Powered by phpBB © 2001, 2005 phpBB Group