
Future 'Symbol' type?

 
csauls



Joined: 27 Mar 2004
Posts: 278

Posted: Wed Aug 15, 2007 3:38 pm    Post subject: Future 'Symbol' type?

I've recently turned a friend of mine on to D + Tango + MiniD, which he's planning on using in a project of his own (a feature-extensible tailing log parser).

I told him to let me know of anything he needs/wants from MiniD and I'd help him develop it, and one thing came up which might be worth considering for inclusion in MiniD/2.0 -- Symbols, a la Ruby.

In terms of MiniD a Symbol could probably be defined as "a string literal which always produces the same instance" and should probably also have a numeric id associated with it as in Ruby. The syntax from Ruby (:sym and :"with space") doesn't work well in D-like syntax, so probably something like the dollar sign ($sym and $"with space") would be better (plus, the S in $ "stands for" Symbol ;) ).
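To make the "same instance plus a numeric id" definition concrete, here is a minimal sketch in Python rather than MiniD. The `Symbol` class, its `_table` dictionary and the incremental id scheme are all assumptions made for illustration; nothing here reflects how MiniD or Ruby actually implement symbols.

```python
import itertools


class Symbol:
    """Hypothetical interned name: one instance per distinct text, plus an id."""

    _table = {}                    # text -> the single Symbol for that text
    _next_id = itertools.count(1)  # incremental ids, handed out on first use

    def __new__(cls, text):
        existing = cls._table.get(text)
        if existing is not None:
            return existing        # same text always yields the same instance
        sym = super().__new__(cls)
        sym.text = text
        sym.id = next(cls._next_id)
        cls._table[text] = sym
        return sym

    def __repr__(self):
        return f"Symbol({self.text!r}, id={self.id})"


a = Symbol("sym")
b = Symbol("sym")
c = Symbol("with space")
print(a is b)      # both "sym" lookups return one shared object
print(a, c)
```

Because every lookup goes through the table, identity (`is`) and the id both serve as equality tests, which is the property the rest of the thread relies on.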

In the meantime, he's just using a custom class.
_________________
Chris Nicholson-Sauls
JarrettBillingsley



Joined: 20 Jun 2006
Posts: 457
Location: Pennsylvania!

Posted: Thu Aug 16, 2007 4:45 pm

STOP READING MY IDEAS FILE. Seriously. Do you know where I live, or what?

I like the :foo syntax. It could break code in a few places, though, like "class B:A", but just use a better spacing style dammit. I've also been thinking of using $ as a 'toString' operator.

If I did this, I'd probably separate field access (a.b) from indexing (a[:b]). Though perhaps tables could be a special case, where field access and indexing will just do the same thing.

Another implication of this feature is that strings, no longer being the type used for field access, could (strong emphasis on the 'could') become mutable. I was thinking of maybe having two kinds of strings, mutable and immutable, and mutable couldn't be table keys, but then it's like.. is it necessary to have all those types?

Mostly what I wonder is if this will be any faster than the current method. Currently multiple strings can exist with the same data, but their hash is computed once, when they're created. So field accesses usually just need to look at the cached hash; sometimes a string comparison has to be done but it's not that often, and usually field names are short. I think symbols could improve speed, but I wonder by how much.
csauls




Posted: Thu Aug 16, 2007 11:57 pm

There should be some speedup, particularly in comparing symbols against symbols, as they'd only need to compare their id (which could actually just be their hash, come to think of it). In comparing symbol vs. string, the effect may be negligible.

There is also, of course, some memory use improvement, since many redundant strings would become a single symbol object. (In a very large project, compare a hundred occurrences of something's 8-byte hypothetical name string, versus one occurrence of the string and a hundred 4-byte references.)
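The arithmetic behind that comparison, worked through as a small Python sketch. The figures are the hypothetical ones from the paragraph above, and the calculation deliberately ignores per-object headers and any references the string version would also need, so treat it as a rough illustration only.

```python
occurrences = 100
name_bytes = 8   # the hypothetical 8-byte name string
ref_bytes = 4    # one 4-byte reference per use of the symbol

# Every occurrence carries its own copy of the string data.
as_strings = occurrences * name_bytes

# One shared copy of the data, plus a reference at each occurrence.
as_symbols = name_bytes + occurrences * ref_bytes

print(as_strings, as_symbols, as_strings - as_symbols)
```

Under these assumptions the string version needs 800 bytes and the symbol version 408, a saving of 392 bytes for this one name.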

And no, I don't know where you live. :)

I'm fine with either :foo or $foo or anything else so long as it works cleanly. I was just afraid :foo might make parsing (both machine and human) more complicated. $ as toString() is an interesting idea, though.
_________________
Chris Nicholson-Sauls
JarrettBillingsley




Posted: Fri Aug 17, 2007 5:56 am

Quote:
There should be some speedup, particularly in comparing symbols against symbols, as they'd only need to compare their id (which could actually just be their hash, come to think of it). In comparing symbol vs. string, the effect may be negligible.


The comparison actually wouldn't be any faster than string-to-string, as string equality comparison already checks whether the hashes are the same or not and bails out if they aren't. If two strings (or symbols) have the same hash, then you do need to compare their contents, as two different strings could hash to the same value (of course you knew that).

OTOH if symbols were given a unique ID upon creation, there wouldn't need to be any content comparison when testing equality. I think a 32-bit integer to uniquely identify symbols would be enough (unless someone dynamically generated symbols based on the time thousands of times a second or something dumb like that).
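A rough Python sketch of the two equality strategies just described. Everything in it is a stand-in: `toy_hash` is deliberately weak so that a collision is easy to show, and `HashedString`, `strings_equal` and `symbol_id` are hypothetical names, not MiniD internals.

```python
def toy_hash(data: str) -> int:
    # Deliberately weak hash (assumption for the demo): anagrams collide.
    return sum(data.encode("utf-8")) % 256


class HashedString:
    """String value whose hash is computed once, when it is created."""

    def __init__(self, data: str):
        self.data = data
        self.hash = toy_hash(data)


def strings_equal(a: HashedString, b: HashedString) -> bool:
    if a.hash != b.hash:
        return False           # different hashes: bail out immediately
    return a.data == b.data    # same hash: contents must still be compared


_symbol_ids = {}               # text -> unique incremental id


def symbol_id(text: str) -> int:
    # The first request for a text assigns the next id; later requests
    # return the id already stored for it.
    return _symbol_ids.setdefault(text, len(_symbol_ids))


ab, ba = HashedString("ab"), HashedString("ba")
print(ab.hash == ba.hash)                   # the toy hash collides here
print(strings_equal(ab, ba))                # so the contents decide
print(symbol_id("ab") == symbol_id("ba"))   # ids alone decide
print(symbol_id("ab") == symbol_id("ab"))
```

With unique ids the equality test is a single integer comparison and never touches the text. The cost moves to creation time, where the text has to be looked up in the table.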

Quote:
I'm fine with either :foo or $foo or anything else so long as it works cleanly. I was just afraid :foo might make parsing (both machine and human) more complicated.


Nope, it's no more complicated than - vs -- vs -=. If the lexer sees a :, it'll go to the next char. If it's alpha or _, it keeps lexing as a symbol; if it's a quote, a long symbol; else, it's just a plain old colon.
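The lexer rule above can be sketched roughly as follows, in Python rather than in the D of the real lexer. The function name `lex_colon`, the token kinds and the decision to let digits continue a symbol name are assumptions for illustration; escape sequences inside long symbols are not handled.

```python
def lex_colon(src: str, pos: int):
    """Classify the token that starts at src[pos], which must be ':'.

    Returns (kind, text, next_pos). Hypothetical helper, not MiniD's lexer.
    """
    assert src[pos] == ":"
    nxt = src[pos + 1] if pos + 1 < len(src) else ""

    if nxt.isalpha() or nxt == "_":
        # Alpha or underscore: keep lexing as a symbol name.
        end = pos + 1
        while end < len(src) and (src[end].isalnum() or src[end] == "_"):
            end += 1
        return ("symbol", src[pos + 1:end], end)

    if nxt == '"':
        # Quote: a long symbol, running up to the closing quote.
        close = src.find('"', pos + 2)
        if close == -1:
            raise ValueError("unterminated long symbol")
        return ("long_symbol", src[pos + 2:close], close + 1)

    # Anything else: just a plain old colon.
    return ("colon", ":", pos + 1)


print(lex_colon(":foo", 0))
print(lex_colon(':"with space" x', 0))
print(lex_colon("a : b", 2))
print(lex_colon("class B:A", 7))
```

The last call shows the breakage mentioned earlier in the thread: without a space after the colon, `class B:A` lexes `:A` as a symbol rather than as a colon followed by an identifier.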
csauls




Posted: Fri Aug 17, 2007 2:55 pm

That's true. A unique (incremental) value would be best. That's actually what Ruby uses as well. ColdC symbols, as I recall, don't have any id associated with them, but that's a highly dynamic environment anyhow, and I don't believe ColdC symbols are guaranteed to be singletons (they are also required to be valid identifiers in ColdC, which is their primary purpose there).

Colon it is. :)
_________________
Chris Nicholson-Sauls
JarrettBillingsley




Posted: Mon Sep 03, 2007 4:46 pm

Well! To my (and probably your) considerable surprise, I implemented the symbol type, as well as new opcodes for field access separate from indexing, and found that it was about 20% slower than the MiniD 1 method. Thinking it could be the new opcodes, I removed the symbol type, and with the new opcodes but still using strings, it now runs 30% faster than the original method. So separating field access into its own opcodes gives quite a speed boost, but the symbol type makes things slower. Much slower, in fact, since that 20% slowdown was measured with the separate field opcodes in place as well.

I'm totally flabbergasted. I'll probably be doing some more tests to see exactly why it's so much slower, but with these preliminary results, the chances of a symbol type getting in are looking much more slim.
All times are GMT - 6 Hours