Author |
Message |
bobef
Joined: 05 Jun 2005 Posts: 269
|
Posted: Tue Aug 21, 2007 7:09 am Post subject: About strings/ints |
|
|
Hi,
I've been looking at MiniD these days. It seems very nice, with good integration with D, but one thing keeps me wondering. What is the reason for using utf32 and not utf8 like D? So many conversions seem slow. Also, why not double/long, but int/float? This seems like an awful restriction.
|
|
|
JarrettBillingsley
Joined: 20 Jun 2006 Posts: 457 Location: Pennsylvania!
|
Posted: Tue Aug 21, 2007 5:21 pm Post subject: |
|
|
Quote: | What is the reason for using utf32 and not utf8 like D? So many conversions seem slow. |
The convention in D is, and has been, to use char[] as the string type, which has mostly been perpetuated by a largely western audience and Phobos' lack of support for anything but char[]. But thanks to Tango's equally capable handling of all three UTF encodings that D supports, there's no overriding reason to use one over the other, except for space.
That being said, the only requirement for MiniD's strings is that they appear to the language as if they were an immutable sequence of UTF-32 codepoints. This was chosen mostly to avoid having to deal with ugly multibyte character issues (indexing, slicing, etc.) from within script code. The internal representation can be just about anything, as long as it provides that illusion to the script code. I've been considering using Chris Miller's dstring struct, which automatically chooses which encoding to use in order to save space.
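The multibyte issue is easy to see outside MiniD. What follows is a minimal sketch in Python rather than D or MiniD, chosen only because it is easy to run; the sample string and variable names are made up for illustration. Indexing UTF-8 code units can land in the middle of a character, while indexing code points (the view that MiniD presents to scripts) cannot.

```python
text = "na\u00efve"                 # "naive" with U+00EF in the middle: 5 code points

utf8 = text.encode("utf-8")         # U+00EF becomes two bytes: 0xC3 0xAF
code_points = list(text)            # one entry per code point, as in UTF-32

# Code-unit view: 6 units, and index 2 holds only half of U+00EF.
utf8_len = len(utf8)
half_char = utf8[2:3]

# Code-point view: 5 units, and index 2 holds the whole character.
cp_len = len(code_points)
whole_char = code_points[2]

print(utf8_len, half_char)
print(cp_len, whole_char)

# Slicing the bytes after index 2 splits U+00EF and leaves invalid UTF-8.
try:
    utf8[:3].decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
print(split_is_valid)
```

A script that only ever sees code points never has to think about the second case, which is the convenience described above.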
(Lastly, since this is D1 without constness, no matter what encoding is used, the string data is still duplicated to preserve immutability of string objects.)
Quote: | why not double/long, but int/float? This seems like an awful restriction. |
floats in MiniD are double, though. The spec page on types says "A float is the same as a D double: a double-precision IEEE 754 floating-point number." You can also re-alias mdfloat in minid.utils to whatever you'd like for your particular project; to float if you'd like to save a bit of space in the MDValue struct, double or real if you need lots of precision.
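The space versus precision trade-off between float and double can be sketched as follows. This is an illustration in Python using the standard struct module, not MiniD or D code; it only round-trips a value through the two IEEE 754 formats, and the variable names are made up for the example.

```python
import struct

value = 0.1  # a Python float is an IEEE 754 double

# Round-trip the value through 8-byte and 4-byte IEEE 754 encodings.
as_double = struct.unpack("<d", struct.pack("<d", value))[0]
as_single = struct.unpack("<f", struct.pack("<f", value))[0]

double_size = struct.calcsize("<d")
single_size = struct.calcsize("<f")

# The double survives unchanged; the single-precision copy is only close.
print(double_size, as_double == value)
print(single_size, as_single == value, abs(as_single - value))
```

Single precision halves the storage for the number but keeps only about seven significant decimal digits.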
It uses 32-bit ints because I don't have a 64-bit machine to test long on. I know that's a poor excuse because you can test long on a 32-bit machine as well. Of course, I could probably do a "version(X86_64) alias long mdint; else alias int mdint;" much like the mdfloat alias, but all things aside, using 'long' as the integer type shouldn't cause any problems.
|
|
|
bobef
Joined: 05 Jun 2005 Posts: 269
|
Posted: Wed Aug 22, 2007 12:02 am Post subject: |
|
|
Quote: | It uses 32-bit ints because I don't have a 64-bit machine to test long on |
What is there to test? Just replace int with long and it will work. Since it holds more data than int, it won't break anything. Just adjust MiniD to accept longer numbers.
And about the strings, what troubles me is that in utf32 each character takes 4 bytes of memory instead of as little as 1 in utf8, which obviously eats more memory and is slower.
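The memory cost is easy to measure. Here is a small sketch in Python (again not D or MiniD; the sample strings and the helper name encoded_sizes are invented for illustration) that compares the encoded size of the same text in UTF-8 and UTF-32.

```python
samples = {
    "ascii": "hello world",
    "accented": "h\u00e9llo w\u00f6rld",
    "cjk": "\u65e5\u672c\u8a9e",
}

def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-32-le")),  # the -le variant omits the 4-byte BOM
    )

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For ASCII-only text UTF-32 is four times larger. The gap narrows for text outside the ASCII range, since UTF-8 needs two to four bytes per code point there. This measures storage only and says nothing about speed.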
|