Textual Encoding

pragma · Joined: 28 May 2004 Posts: 607 Location: Washington, DC

(following Kris' invitation)

I've mulled it over. I like Kris' class hierarchy for the String types, but I'm not 100% sold on the necessity for so many levels in the tree. What is the use case for UtfString or Slice... or is this just being prepared for the unforeseen?

As for the Universal type, I think it would be happiest if it could play fair with such a tree. The problem is that you would have to adapt it to each class in the tree, or be satisfied with just a single UniversalString class that extends MutableString.

To work around that, you'd have to go to a common abstract base or interface and implement both the standard and universal types from that. Templating also becomes a bit more transparent as a consequence, but you do take on some bloat in the base class (format routines for each char type), and you lose Kris' elegant hierarchy -- most everything would code against that one single base string interface.
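A rough sketch of that "single common base" layout, written in Python purely for illustration (it is not D and not Mango code). StringBase and the utf8/utf16/utf32 method names are hypothetical; only the Utf8String and UniversalString names come from this thread. It also shows the bloat mentioned above: every implementation has to carry one routine per character width.

```python
from abc import ABC, abstractmethod


class StringBase(ABC):
    """Hypothetical common base: one accessor per character width."""

    @abstractmethod
    def utf8(self) -> bytes: ...

    @abstractmethod
    def utf16(self) -> bytes: ...

    @abstractmethod
    def utf32(self) -> bytes: ...


class Utf8String(StringBase):
    """A 'standard' string that stores exactly one encoding."""

    def __init__(self, data: bytes):
        self._data = data

    def utf8(self) -> bytes:
        return self._data  # native form, no conversion

    def utf16(self) -> bytes:
        return self._data.decode("utf-8").encode("utf-16-le")

    def utf32(self) -> bytes:
        return self._data.decode("utf-8").encode("utf-32-le")


class UniversalString(StringBase):
    """A 'universal' string: raw bytes plus whatever encoding they arrived in."""

    def __init__(self, data: bytes, encoding: str):
        self._data = data
        self._encoding = encoding

    def _as(self, target: str) -> bytes:
        return self._data.decode(self._encoding).encode(target)

    def utf8(self) -> bytes:
        return self._as("utf-8")

    def utf16(self) -> bytes:
        return self._as("utf-16-le")

    def utf32(self) -> bytes:
        return self._as("utf-32-le")


def utf16_size(s: StringBase) -> int:
    """Client code sees only the base interface, never the concrete class."""
    return len(s.utf16())
```

With this layout, utf16_size accepts either kind of string, which is the "most everything codes against the one base interface" trade-off described above.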

teqdruid · Joined: 11 May 2004 Posts: 390 Location: UMD

So I've just committed a new copy of UniversalString.d. This one has MutableString, and it seems to be working pretty well. There's also some new stuff in the test file:
http://trac.dsource.org/projects/mango/browser/trunk/mango/test/universalstring.d?rev=680

You'll also notice that all of the methods in Kris' templated classes are available for the specific types: that is, if you want to use Utf8MutableString, then you can do everything that Kris' MutableString!(char) can do, and it'll implicitly convert to a generic String, which can do a helluva lot; see the link above. The only hierarchy problem with it is that MutableUtf8String doesn't inherit from Utf8String. This is due to D's lack of multiple inheritance. The solution to the problem would be to make String and MutableString interfaces, but then MutableString can't be implicitly cast to String, since interfaces are not covariant with each other (because DMD's interface support sucks), and that limitation makes interfaces not an option.

I don't get how one can look at the test file and *not* see how useful the universal string stuff is. At the very least (aside from the slight inheritance problem noted above) I don't see any way that it would be a bad thing, since you can still work with specific encodings if you want.

Oh yeah, and the hash method doesn't work right, since two Strings of different encodings (but the same string) will compare equal via opEquals, but won't necessarily have the same hash. Other than always converting to a specific encoding, then running the hash, I'm not certain how to solve this one.
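For reference, the "convert to one specific encoding, then hash" option could look roughly like the sketch below. It is written in Python for illustration only, not D, and EncodedString is a hypothetical stand-in for a universal string rather than anything in Mango. The point is that equality and hashing both go through the same canonical form, so equal strings always hash alike.

```python
class EncodedString:
    """Hypothetical stand-in for a universal string: raw bytes plus their encoding."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding

    @classmethod
    def from_text(cls, text: str, encoding: str) -> "EncodedString":
        return cls(text.encode(encoding), encoding)

    def _canonical(self) -> bytes:
        # Transcode to one agreed-upon encoding (UTF-8 here, an arbitrary
        # choice) before comparing or hashing.
        return self.data.decode(self.encoding).encode("utf-8")

    def __eq__(self, other):
        if not isinstance(other, EncodedString):
            return NotImplemented
        return self._canonical() == other._canonical()

    def __hash__(self):
        # Same canonical form as __eq__, so equal strings hash equally.
        return hash(self._canonical())


a = EncodedString.from_text("naïve", "utf-8")
b = EncodedString.from_text("naïve", "utf-16-le")
assert a.data != b.data            # different bytes on the wire
assert a == b                      # same string
assert hash(a) == hash(b)          # and therefore the same hash
```

The cost is a transcode on every hash or comparison of a non-canonical string; caching the canonical form or the hash would be the obvious refinement if that turns out to matter.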

Thoughts?
~John

pragma · Joined: 28 May 2004 Posts: 607 Location: Washington, DC

I think John's argument is a good one. I'm leaning a bit closer to keeping universals in a special-use niche. But I'm still thinking about it.

Here are my thoughts so far:

I think that a universal type does the job in a pinch, and is a great tool when you are between two technologies, which you cannot change, that don't use the same encoding. This is a *very* likely scenario should D gain in popularity, so there's no reason why you wouldn't have this in your toolbox. It's like working on your car only to find that the bolt you've been merrily rounding off is metric, and that 1/2" socket doesn't quite cut it; so you instinctively reach for the adjustable wrench instead (and pray to god the bolt isn't torqued).

(Or like the plumbing in my parents' house: it's old enough that the threads are all on the "wrong end" of the works, so you can't use any of the typical stuff at Home Depot. You have to get special parts and adapters to hook up that new garbage disposal, but you still want to keep your sink...)

In the case of Mango, XML and other related technologies, we are in a position to *dictate* a "Lingua Franca" of sorts and say "use this encoding for optimal performance" and leave it to the library user to transcode at the I/O boundary (or just use universal string where appropriate). I think Mango's filters are especially well suited to the task of transcoding in situations like this, especially when you can't get the whole stream at once, as over a socket.
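To make the "transcode at the I/O boundary" idea concrete, here is a minimal sketch of chunk-wise transcoding. It is written in Python for illustration and uses the standard codecs module; it is not how Mango's filters are implemented, and transcode_chunks is a hypothetical name. The key detail is that a chunk from a socket can end in the middle of a character, so the decoder has to carry the leftover bytes into the next chunk.

```python
import codecs


def transcode_chunks(chunks, source_encoding, target_encoding="utf-8"):
    """Yield target-encoded bytes for each incoming chunk of source-encoded bytes.

    The incremental decoder buffers any partial character at the end of a
    chunk and completes it when the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)()
    encoder = codecs.getincrementalencoder(target_encoding)()
    for chunk in chunks:
        out = encoder.encode(decoder.decode(chunk))
        if out:
            yield out
    # Flush both sides; a truncated trailing character raises here.
    out = encoder.encode(decoder.decode(b"", final=True), final=True)
    if out:
        yield out


# Simulate a socket delivering UTF-16 data in awkward 3-byte pieces,
# which splits every second code unit across two chunks.
text = "héllo wörld €"
raw = text.encode("utf-16-le")
pieces = [raw[i:i + 3] for i in range(0, len(raw), 3)]
assert b"".join(transcode_chunks(pieces, "utf-16-le")) == text.encode("utf-8")
```

Everything downstream of such a filter sees only the one agreed-upon encoding, which is the "Lingua Franca" arrangement described above.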

So to sum up: UniversalString is an awesome general-purpose tool that solves all kinds of edge cases and odd problems, but the jury is still out as to whether or not it's the right fit for the XML family of libs. John makes a very compelling argument, and I can see where he's coming from.
_________________
-- !Eric.t.Anderton at gmail