Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #626 (closed wishlist: wontfix)

Opened 1 year ago

Last modified 5 months ago

MapCollections consider null keys invalid

Reported by: Some1_2 Assigned to: kris
Priority: normal Milestone: 2.0
Component: Core Functionality Version: 0.99.1 RC4 Keep
Keywords: hash map collection null key, decide Cc: sean

Description

I consider this a defect as there doesn't seem to be any good reason to disallow null keys. There might be a philosophical argument against null keys, but really, there isn't any technical reason that null can't be a hash key. Also, this is a bit inconsistent as the D associative arrays allow null keys with no issue.

See http://dsource.org/projects/tango/forums/topic/203 for further info.

Attachments

MapCollection.patch (0.8 kB) - added by Some1_2 on 09/12/07 14:11:56.
nullKeys.patch (10.3 kB) - added by Some1_2 on 09/17/07 19:15:52.
hashmap.d (3.1 kB) - added by Some1_2 on 09/17/07 19:17:23.

Change History

09/12/07 14:11:56 changed by Some1_2

  • attachment MapCollection.patch added.

09/12/07 14:24:10 changed by kris

Thx!

I recall there's some code in there that does key comparisons for equality? If so, then null keys could cause errors?

(follow-up: ↓ 3 ) 09/12/07 14:24:28 changed by kris

  • owner changed from sean to kris.
  • status changed from new to assigned.

(in reply to: ↑ 2 ) 09/12/07 15:07:17 changed by Some1_2

Replying to kris:

I haven't seen anything in the code I looked at, though I didn't go through everything with a fine toothed comb. I do know in my specific use that I haven't had any negative side effects from the supplied patch.

09/12/07 15:13:43 changed by kris

09/12/07 17:19:41 changed by Some1_2

Are these the lines you are referring to?

110 	                for (auto p=this; p; p = cast(LLPair)cast(void*) p.next_)
111 	                     if (p.key() == key)

I don't see anything wrong with key being null there, unless I'm missing something.

09/12/07 17:24:56 changed by kris

p.key() == key

if p.key() returns null, and K is an Object, BadThings?(tm) will occur due to opEqual() being invoked on a null object ?

(follow-up: ↓ 8 ) 09/12/07 18:13:04 changed by Some1_2

Had some conversation with sean_k on the irc channel about this, just pasting the relevant parts of the conversation for consideration:

<Some1_2> Wouldn't the D associative array have the same issue then? <sean_k> I would think so, but I've never tried null keys in the D AA. <Some1_2> null keys in the D AA work fine, as far as I've tested and used. <sean_k> hm <sean_k> I dunno if the containers support this now, but what I would do is have the default comparator call opEquals, opCmp, etc, and let the user override it either as a template param or when constructing the object. <sean_k> then you could get any behavior you like. <sean_k> D's internal AA calls opEquals(Object), which keys on identity by default. but it's a virtual function. <sean_k> oh, and opCmp(Object) for collisions, which I believe also keys on identity. it does in Tango anyway. <Some1_2> how does 'null == null' work? <JimPanic?> 0 == 0 => true. :P <Some1_2> Sure, so why would 'null == Object' not work? Where does the opEquals for that come from? <sean_k> Object.opEquals(Object) by default, but you can override it in derived classes. <Some1_2> And the opEquals from Object itself handles comparisons with null? <sean_k> Yes. Object cotnains no data members, so all it can do it test identity. opEquals is basically defined in Object to reserve a place for it in the vtbl for all objects, since it's used by the AA code. <Some1_2> So a problem could arise if I was using, as a key, an object that overrides opEquals and doesn't handle null comparisons? <sean_k> yup <Some1_2> Then, I would say, that's my problem, not HashMaps? <sean_k> true

(in reply to: ↑ 7 ) 09/12/07 18:14:32 changed by Some1_2

Sorry, that came out bad. Retry:

<Some1_2> Wouldn't the D associative array have the same issue then?
<sean_k> I would think so, but I've never tried null keys in the D AA.
<Some1_2> null keys in the D AA work fine, as far as I've tested and used.
<sean_k> hm
<sean_k> I dunno if the containers support this now, but what I would do is have the default comparator call opEquals, opCmp, etc, and let the user override it either as a template param or when constructing the object.
<sean_k> then you could get any behavior you like.
<sean_k> D's internal AA calls opEquals(Object), which keys on identity by default. but it's a virtual function.
<sean_k> oh, and opCmp(Object) for collisions, which I believe also keys on identity. it does in Tango anyway.
<Some1_2> how does 'null == null' work?
<JimPanic?> 0 == 0 => true. :P
<Some1_2> Sure, so why would 'null == Object' not work? Where does the opEquals for that come from?
<sean_k> Object.opEquals(Object) by default, but you can override it in derived classes.
<Some1_2> And the opEquals from Object itself handles comparisons with null?
<sean_k> Yes. Object cotnains no data members, so all it can do it test identity. opEquals is basically defined in Object to reserve a place for it in the vtbl for all objects, since it's used by the AA code.
<Some1_2> So a problem could arise if I was using, as a key, an object that overrides opEquals and doesn't handle null comparisons?
<sean_k> yup
<Some1_2> Then, I would say, that's my problem, not HashMaps?
<sean_k> true

09/12/07 19:00:25 changed by kris

Perhaps you're missing the point here: think null.opEqual()

What happens?

Yes, that can probably be worked around in a number of different and inefficient ways; but why? Just to support null keys? We'd have other folks howling at us instead :)

Null keys are non-trivial to deal with

09/12/07 19:35:14 changed by Some1_2

Well, I suppose on the 'why' part, I would argue more 'why not', as there are perfectly valid reasons to have null keys in hashes. If this were a user-library and the user who wrote it decided that he didn't personally want to ever use null keys, then I suppose that would be acceptable. But as I understand Tango is intended to be a general purpose library. In that case, just because you might not personally ever use a null key, doesn't mean that no one in the general public (for whom I think Tango is intended) also will never use one. There doesn't really seem to be any technical limitation on it, AFAIK. Couple that with the fact that D's AAs seem to work fine with null keys, and I think you've got a pretty solid argument for 'why'. Well, you asked :). That being said, if the Tango crew decides, 'I don't care about your argument, you and your nulls can go to hell', then I guess that's what you decide.

Anyway, the problem is the 'p.key() == null' section, so the scenarios are:

null == null (this is fine)
SomeObject? == null (this is fine)
SomeObject? == SomeObject? (this is fine)
null == SomeObject? (this is the sticky one)

If I understand that right, then in the fourth case, the opEquals from SomeObject? is called. That opEquals needs to be able to handle null operands, if it has overridden the opEquals from Object. The opEquals from Object itself will work fine here, the only issue would be if you use, as a key, an object that has overridden opEquals and doesn't handle null operands, in which case, I'd say that if you were overriding opEquals from Object, that you should handle all the same cases that Object does.

No?

09/12/07 21:11:32 changed by kris

No: You get address error

09/13/07 11:53:11 changed by Some1_2

I wonder why I'm not getting any sort of errors then, I should be doing that same compare quite a bit in what I'm doing. Perhaps it's because I'm not giving it a null object, I'm giving it 'null'? This:

	MyClass myClass = new MyClass;
	if (null == null)
		Stdout("null == null").newline;
	else
		Stdout("null != null").newline;
	if (myClass == null)
		Stdout("myClass == null").newline;
	else
		Stdout("myClass != null").newline;
	if (myClass == myClass)
		Stdout("myClass == myClass").newline;
	else
		Stdout("myClass != myClass").newline;
	if (null == myClass)
		Stdout("null == myClass").newline;
	else
		Stdout("null != myClass").newline;	

Gives back:

null == null
myClass != null
myClass == myClass
null != myClass

However, if I change those 'null' to a reference to an null class reference, it segfaults. I don't know why my program doesn't segfault due to the same thing.. I'm also confused. My biggest test throws ~60,000 things into various hash maps, some null, some not so.. you'd think it would break somewhere along the line.

Anyway, I suppose it could be fixed by using a 'keyCompare(p.key(), key)' function instead of 'p.key() == key', and as this seems to be my itch I'll scratch it, then post up a complete patch along with timing comparisons. If you decide it looks fine and what to include it, great :).

09/13/07 13:02:51 changed by Some1_2

Wait, all of this collection stuff is ported from Java? Is the HashMap? a direct port of java.util.HashMap?? The Java HashMap? seems to specifically allow null keys (and values). (See: http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html). Was the port perhaps of something pre-1.4.2?

09/13/07 13:13:11 changed by kris

  • type changed from defect to wishlist.

It came from Prof Doug Lea's collection package (see the header on each file), so has no heritage with the Sun approach :)

09/13/07 14:04:40 changed by Some1_2

Yeah I noticed it came from DL, once I figured out who that was and why he wrote code for Tango in 95-97 :), I got the Java connection. I didn't realize the current Java HashMap? wasn't the same one.

Still digging through this code, should probably have a patch soon.

09/13/07 17:05:40 changed by Some1_2

Trying to come up with some code that breaks reliably, so I have something to test my changes against, but having a hard time. Hoping you'll be able to tell me if I've at least got this right:

A HashMap? is essentially a bunch of LLPairs which are referenced by a hash function. (Hash function on SomeObject? returns you something that you relate to a specific LLPair). An LLPair is a linked list which contains pairs which are a key and a value. Hopefully I've got that right so far.

So then, for p.key() to be null, and then compared against a non-null key, the hash function would have had to have returned the same value for both the null and non-null key. Which is probably why I'm having such a hard time replicating an issue. Does that sound about right?

If that's the case, then the hash function (or even before it is called) could check for null keys first, and if the key is null return some value (link to some LLPair) that the hash function is never going to return (0? Maybe?), eliminating the need for a null check on keys. Does that sound remotely reasonable?

09/14/07 13:32:55 changed by Some1_2

Wrote a 'break me' example with null keys:

	auto blah = new HashMap!(char[], char[]);
	blah[null] = "null";
	blah[""] = "notnull";
	char[] myNull = blah[null];
	char[] myEmpty = blah[""];
	Stdout.format("myNull: '{}' ({}), myEmpty: '{}' ({})", myNull, myNull is null, myEmpty, myEmpty is null).newline;

Results in:

myNull: 'notnull' (false), myEmpty: 'notnull' (false)

This seems to occur because in findKey (in LLPair), when we end up with this comparison:

p.key() == key

It returns as true, when p.key is null and key is not null (empty string "").

09/14/07 14:09:11 changed by kris

That's not a particularly effective breakme example ... try a HashMap?!(Object, Object) instead, and stick some nulls in there?

09/14/07 16:17:38 changed by Some1_2

Sure, a fix for one should be a fix for the other but here's 'breakme' in that scenario:

auto blah = new HashMap!(testClass, char[]);
blah[null] = "null";

This one segfaults right away in getHash, without putting in a non-null key. Backtrace of:

#0  _D6object14TypeInfo_Class7getHashMFPvZk (this=@0x80ea730, p=0xffffc814) at genobj.d:625
#1  0x080adf10 in _D5tango4util10collection15HashMapEnhanced45__T7HashMapTC5tetra4util6Config9testClassTAaZ7HashMap6hashOfMFC5tetra4util6Config9testClassZi (this=@0xf7c6cdc0,
    key=@0x0) at tango/util/collection/HashMapEnhanced.d:701
#2  0x080ad8ca in _D5tango4util10collection15HashMapEnhanced45__T7HashMapTC5tetra4util6Config9testClassTAaZ7HashMap3addMFC5tetra4util6Config9testClassAaZv (this=@0xf7c6cdc0,
    key=@0x0, element={length = 4, ptr = 0x80d2844 "null"}) at tango/util/collection/HashMapEnhanced.d:563
#3  0x080b014f in _D5tango4util10collection4impl21MapCollectionEnhanced52__T13MapCollectionTC5tetra4util6Config9testClassTAaZ13MapCollection13opIndexAssignMFAaC5tetra4util6Config9testClassZv (this=@0xf7c6cdc0, element={length = 4, ptr = 0x80d2844 "null"}, key=@0x0) at tango/util/collection/impl/MapCollectionEnhanced.d:122
#4  0x08094600 in _Dmain () at tetra/util/Config.d:1095
#5  0x080c273b in runMain () at dgccmain2.d:272
#6  0x080c2809 in tryExec (dg={object = 0x0, func = 0x80c3680 <_D6object14TypeInfo_Class7getHashMFPvZk>}) at dgccmain2.d:228
#7  0x080c2d0c in runAll () at dgccmain2.d:280
#8  0x080c2809 in tryExec (dg={object = 0x0, func = 0x80c3680 <_D6object14TypeInfo_Class7getHashMFPvZk>}) at dgccmain2.d:228
#9  0x080c2c01 in _d_run_main (argc=1, argv=0xffffcb94, main_func=0x809458d <_Dmain>) at dgccmain2.d:287
#10 0x080c26fb in main (argc=0, argv=0x0) at cmain.d:5

Another example that should work but currently doesn't:

auto blah = new HashMap!(char[], testClass);
blah[null] = null;

This throws "tango.core.Exception.IllegalElementException?: Attempt to include invalid element _in Collection". I suppose if we're accepting null keys we should also accept null elements.

09/14/07 17:08:51 changed by Some1_2

Sorry, the linenumbers in that backtrace refer to my hacked HashMap?.d.

A null object reference doesn't even get a chance to fail trying to get it's opEquals, because hashOf asks for:

typeid(K).getHash(&key)

Which obviously doesn't work for null object reference. Does, however, work for null itself.

09/17/07 19:15:52 changed by Some1_2

  • attachment nullKeys.patch added.

09/17/07 19:17:23 changed by Some1_2

  • attachment hashmap.d added.

09/17/07 19:31:16 changed by Some1_2

Added patch which allows null keys, also adds a to the debug test for HashMap?. I checked over all other collections code and I don't believe anything I've changed has effects anywhere else, other than also allowing null keys via LLPair.

Also added source code which includes the performance checking code I used, all that I changed was the import for HashMap? to point to the current implementation and my null-key allowing one. In the source is also the code for the 3 break cases, if you use my patch the first two should work (null keys), the third (null elements) I can work on but have not yet. (It's not an itch I need to scratch, but I would do it anyway).

A quick review of the performance checking on my own box:

Current HashMap:
charKey put time: 9.36
charKey get time: 0.83
classKey put time: 11.13
classKey get time: 0.46
charKey put time: 9.66
charKey get time: 0.89
classKey put time: 11.29
classKey get time: 0.44
charKey put time: 9.68
charKey get time: 0.88
classKey put time: 11.18
classKey get time: 0.42

Null-Allowing HashMap:
charKey put time: 9.80
charKey get time: 1.01
classKey put time: 11.45
classKey get time: 0.43
charKey put time: 9.78
charKey get time: 0.96
classKey put time: 11.00
classKey get time: 0.38
charKey put time: 9.59
charKey get time: 0.96
classKey put time: 11.24
classKey get time: 0.42

It doesn't seem that there is any significant (or any at all, though it was a very short 'rough idea' test) impact to performance, really all I did was move the null-checking from put/get (where it would throw an exception) to key comparisons, so I didn't expect any large performance problems. (Removed a null-check in one place, put another one in). If anything, this patch should be (unmeasurably) faster, because I removed the overhead of a few function calls with a single function call in a place where it is admittedly in a loop, but the LLPair 'buckets' (where it would loop) shouldn't be very full anyway, or it's not a very good hash :).

The 'why null keys?' argument:
-Consistent with other language hashes. (Java, for one).
-Consistent with D's associative arrays.
-There is no technical reason not to allow them, neither any performance reason.
-General-purpose libraries shouldn't enforce individual-opinion constraints on end users. (I believe there are good use-cases for null hash keys, and even if someone else does not, that belief shouldn't stop me from use as I see fit).

(follow-up: ↓ 24 ) 09/17/07 20:02:42 changed by kris

The 'why null keys?' argument:
-Consistent with other language hashes. (Java, for one).
-Consistent with D's associative arrays.
-There is no technical reason not to allow them, neither any performance reason.
-General-purpose libraries shouldn't enforce individual-opinion constraints on end users. (I believe there are good use-cases for null hash keys, and even if someone else does not, that belief shouldn't stop me from use as I see fit).

Internal consistency within the library is very important to us. Consistency, with edge-case scenarios, against other libraries does not matter too much. Strike that one off :)

You claim there is no technical reason to avoid nulls? That's a pretty sweeping claim :) Yes, the check is potentially evaluated multiple times instead of one. But perhaps hasn't occurred to you that any of the predicates that people write would have to do null checks, just in case? That is just bad design, and is a technical concern for me.

I'm not sure where you're getting the last thing from? "General-purpose libraries shouldn't enforce individual-opinion constraints on end users" ... did it occur that's what you could be doing here? Enforcing users to deal with nulls when they write predicates or whatever?

This is one situation where things are not as simple as they might seem. Yeah, the containers could theoretically support nulls (that's never been a question) ... but whether we should we impose that design limitation on the containers themselves is a much more difficult question to answer.

It's cool that you're making the effort here, and I'm glad that you are. But there's no simple answer at this time.

(follow-up: ↓ 25 ) 09/17/07 20:14:24 changed by kris

Amongst others, there's also this concern: for some containers, we expect to provide lock-free variations. Some of those designs depend on being able to reserve null to indicate a partial update to something. There are workarounds, but I wouldn't wish to complicate something that's already fraught with peril (a dependable lock-free implementation).

If you have a set of lock-free containers lying around, which you can port to D, document them, make them support templates, along with null keys, then your proposal would likely have less concerns attached to it :)

(in reply to: ↑ 22 ) 09/18/07 02:19:54 changed by Some1_2

Replying to kris:

Internal consistency within the library is very important to us. Consistency, with edge-case scenarios, against other libraries does not matter too much. Strike that one off :)

In an of itself, it's not a complete argument. However, what it does show is that the use of null keys in hashes is not unusual, and has been specifically allowed in other hashes. What I'm trying to say here is, it's not as though it's 'just me' that would consider using null keys an acceptable (and arguably elegant) solution in some cases. However, for a lot of the work I do specifically, this is not an edge case, but is indeed a norm. Every other hash implementation I've used in any language hasn't put any restriction on the null key. (Indeed, Java is very specific about allowing both null keys and values, http://java.sun.com/javase/6/docs/api/java/util/HashMap.html). Even databases, which one could consider hashes as crude forms of, specifically allow null keys. Why I make mention of this is to simply put forth that this isn't something quirky that I'm asking for just for me, but is, I feel, a valid and reasonable request.

With all that being said, even if you decide to ignore other language implementations (to which I would say, it's not a bad idea to acknowledge them anyway), what about the very language Tango is written in? Should you at least not strive for consitency within that? The associative arrays, which one could consider the built-in hashes in D, have no issue with null keys.

You claim there is no technical reason to avoid nulls? That's a pretty sweeping claim :) Yes, the check is potentially evaluated multiple times instead of one. But perhaps hasn't occurred to you that any of the predicates that people write would have to do null checks, just in case? That is just bad design, and is a technical concern for me.

What I mean by no technical reason is that there is no limitation such that we have to enforce the no-null keys restriction. It's an artificial restriction (which, as I understand, was put there over 10 years ago for an older version of Java which has since been supplanted with a version that does not have this restriction), specifically put in place by checking for null on every key operation. If, instead of checking for null on key operations, we check for null on key comparisons, this restriction is removed.

As for calling checking for nulls 'bad design', I think that you should put 'in my opinion' before that, because I don't think that it is a universal truth. I would argue that it is good, self-defensive programming practise to check for 'edge-cases'.

I'm not sure where you're getting the last thing from? "General-purpose libraries shouldn't enforce individual-opinion constraints on end users" ... did it occur that's what you could be doing here? Enforcing users to deal with nulls when they write predicates or whatever?

I suppose that I would be enforcing that the end-users themselves would have the option to choose (or not) to use null-keys, this much is true.

The feeling that I'm getting from you is that you have relegated this request to /dev/null, because you don't personally feel it's something you would use. If you don't ever want to see a null-key be used, and forbid every programmer you can from doing so, that's perfectly within your perogative. But, I feel, the opposite should also be true. However you want to use hashes in your own projects is up to you, and anyone else using them in their own projects should also be free to choose, not have a specific type of use dictated to them by the general library. Just because you wouldn't personally use it doesn't make it something non-viable for someone else to use, and hence I have tried to write the patch to show it's not a performance concern and also not all that difficult to implement, in an effort to get you to reconsider.

The end user who wasn't putting nulls in their hashes to begin with is not affected by this change, they can continue to use the hashes as they have, with no ill side-effects.

The end user who does want to use null keys however, now can, without artificial limitations in place.

Someone who is not using the hashmaps typically may have to validate their input data, as you point out, but I wouldn't consider this a bad idea anyway, and in any case, wouldn't this be something outside of the typical use-case for the HashMap??

This is one situation where things are not as simple as they might seem. Yeah, the containers could theoretically support nulls (that's never been a question) ... but whether we should we impose that design limitation on the containers themselves is a much more difficult question to answer.

As I mentioned before, my argument here is why _not_ allow them? You said at first that it was because it was non-trivial and not worth the effort. I believe I've shown that it's not difficult, and have expended the effort myself. You implied allowing null keys would reduce effciency, I believe I've shown that this is not the case. So now it's because it might mean that someone working deep within the hashmap might have to validate keys for their particular use?

As for answering the difficult question, I would say that artificial restrictions should only be imposed on the end-user if there is a very good, compelling reason to do so, not the other way around (remove restrictions if there are good, compelling reasons to do so). Innocent until proven guilty. Despite that, I believe that I have provided good, compelling reasons to remove this restriction. It's not really doing any one any good being there, and if the implementations of other languages is any indicator, I will not be the last to request this restriction relaxed.

It's cool that you're making the effort here, and I'm glad that you are. But there's no simple answer at this time.

Though I might be biased ( :) ), it seems to me that there is more weight on the 'pro' to removing this restriction side than there is on the 'con' side. I'm really not trying to be argumentative or rude, but it really seems that this is more of a "Kris doesn't use null keys, so no one else should" discussion, and there's no way I can win the "Because I say so" argument. :)

So, here's to hoping that providing the patch will at least encourage you to reconsider firstly the validity of using null-keys in hashes, and secondly the relative ease of allowing such use. And if nothing else, I at least now have a good understanding of how the collections work, having spent a few days at work on this.

(in reply to: ↑ 23 ) 09/18/07 02:30:59 changed by Some1_2

Replying to kris:

Amongst others, there's also this concern: for some containers, we expect to provide lock-free variations. Some of those designs depend on being able to reserve null to indicate a partial update to something. There are workarounds, but I wouldn't wish to complicate something that's already fraught with peril (a dependable lock-free implementation).

If there are other concerns that haven't been raised, it would be a lot easier for me to address them if I knew what they were. :)

I would have to see what sort of specific implmentation you are refering to, but, if I understand what you're saying correctly, trying to use null as some sort of make-shift mutex in a 'lock-free' implementation sounds like a very bad idea to me right off the bat. Setting any value to null isn't going to be a reliable, guaranteed atomic operation anyway, so I would think one would want to consider some sort of alternative despite if the hash allows null keys or not.

If you have a set of lock-free containers lying around, which you can port to D, document them, make them support templates, along with null keys, then your proposal would likely have less concerns attached to it :)

I'd really have to know a bit more about what you mean by lock-free container. We do have quite a bit of experience dealing with concurrent operations however, and writing something like what I think you're talking about might be possible to look into. I would also be quite willing to help with writing another complete collection implementation, I've heard rumours this could be happening.

(follow-up: ↓ 27 ) 09/18/07 11:35:35 changed by kris

"The feeling that I'm getting from you is that you have relegated this request to /dev/null, because you don't personally feel it's something you would use."

Then you should perhaps consider giving others a little more credit? The notion is not ruled out, and never has been; but it has a long way to go before being accepted.

Give yourself some time to consider how introducing null into the design of these containers would potentially create use issues. Learn about lock-free implementation, and about the difficulties involved (or, at least what the term means?). And, please try to keep an open mind, rather than just assuming some kind of personal 'vendetta' is in place

(in reply to: ↑ 26 ) 09/18/07 12:24:31 changed by Some1_2

Replying to kris:

Then you should perhaps consider giving others a little more credit? The notion is not ruled out, and never has been; but it has a long way to go before being accepted.

Well, you are the only one to respond, and, it's a bug under your control. Who else would I give credit to? I say it seems that way because you bring up a concern, I address it, then you say, 'well yeah but..' and bring up another concern, rinse, repeat. Also, you tend to completely ignore vast parts of what I'm trying to get across.. I don't know if you're trying to or not, but it comes across as completely dismissive. I don't think the same can be said for myself, I've not ignored any point you've brought up.

Give yourself some time to consider how introducing null into the design of these containers would potentially create use issues. Learn about lock-free implementation, and about the difficulties involved (or, at least what the term means?). And, please try to keep an open mind, rather than just assuming some kind of personal 'vendetta' is in place

I don't think it's a personal vendetta against me, but perhaps against null :). We've gone from "some comparison might fail" (which I initially misunderstood, sorry, I come from C, doesn't mean I'm a complete idiot :P) which was resolved, to "It will be difficult to do" (which I think I've shown it wasn't) to "It will degrade performance" (which I've shown is not true). And now we're down to "someone writing predicates might have to take this into consideration" (Oh noes! Also, that same someone might actually also want null keys...).

I do know what a lock-free implementation means... I know you don't know me from anyone but gee, at least give me some credit. What I was asking for was a bit more of an example of what you mean by a lock-free container being affected by allowing nulls in the container keys, because I can't see a valid lock-free implementation assuming that setting a value to null will be an atomic (or even cache-synced) operation.

At some point, you're going to have to ensure syncronization, and you will have to take multiple core systems and non-amd/intel architectures into consideration, and a simple 'set the key to null' is not going to solve all that for you. If you are not using a hardware-backed lock, you are going to be relying on using some assembly instruction which you know operates atomically. Then, you'll have to consider that what might be an atomic operation on one architecture is not on another. Likely, you're going to want to be using a 'move' instruction, so that you can set up your new data in it's own independant area, then atomically 'move' it after creation into it's new position in the hash. You wouldn't want to run through 3 separate, hopefully atomic 'set null', 'insert', 'unset null' instructions, would you? And you aren't going to use some random hash key as a 're-hashing' flag. So how would allowing null keys in a hash affect any reasonable lock-free implementation?

As for keeping an open-mind on the subject... I believe I could say the same thing, seems I enjoy banging my head against this wall.

Let me ask this, do you agree that using null keys in a hash can be a potentially reasonable design decision?

(follow-up: ↓ 29 ) 09/18/07 13:26:50 changed by kris

Assigning a null is an atomic operation on popular CPUs, and for other scenarios perhaps you haven't noticed tango.core.Sync?

And no, we haven't gone from one notion to another: all concerns are still present, including performance-related ones. The list of issues is increasing only because you're apparently not considering them yourself. The point here is that perhaps you're not considering all implications of the suggested change, have dismissed them as irrelevant, or are biasing your perspective. I certainly haven't thought of all of them so, at this time, I do not agree that the change is the right thing to do. There are many changes in Tango that happen in the blink of an eye; this is not one of them I'm afraid.

it's a bug under your control :: no, this is not a bug. Instead, it was a conscious design decision by Doug Lea, and one that we have to respect and consider at length before making any kind of change. This is simply a wishlist request from you. Nothing more. It has questionable implications, making it somewhat slippery. I've noted this already, and I've suggested what you can do to eliminate some of those concerns. You've chosen to ignore that. Lastly: warping a respect for the original design into "because you don't personally feel it's something you would use" is shaky ground at best, and does little to promote your position.

As for your question: Yes, it can sometimes be a reasonable design decision -- given the right circumstances.

(in reply to: ↑ 28 ) 09/18/07 14:39:27 changed by Some1_2

Replying to kris:

Assigning a null is an atomic operation on popular CPUs, and for other scenarios perhaps you haven't noticed tango.core.Sync?

Even if it was a guaranteed atomic operation everywhere, I'm not sure why would you have a need to set a key to null. Wouldn't you be moving some sort of pre-arranged data into the previous location using some sort of move? Why would you 'set null'-'move' when you could simply 'move'? I suppose perhaps you are thinking of the approach in the google video (which I'd say has other issues), but I still don't know why you'd have a need to partially update something and block other accesses to it. Despite this, it's not as if having a null key would disallow it, would it? Couldn't you set the LLPair container itself to null? Not the LLPair key? Wouldn't this achieve the same goal, and perhaps be even arguably simpler? (In effect, performing CAS on the container rather than it's elements).

And no, we haven't gone from one notion to another: all concerns are still present, including performance-related ones. The list of issues is increasing only because you're apparently not considering them yourself. The point here is that perhaps you're not considering all implications of the suggested change, have dismissed them as irrelevant, or are biasing your perspective. I certainly haven't thought of all of them so, at this time, I do not agree that the change is the right thing to do. There are many changes in Tango that happen in the blink of an eye; this is not one of them I'm afraid.

I absolutely agree with you that there may be implications not yet considered. However, is not the argument that 'there may possibly be something we haven't thought of' reasonably thin? My argument is that there doesn't seem to be any compelling reason to disallow the use of null keys, and perhaps I am biased in the sense that I believe that unless there is a very good reason to restrict something, that it shouldn't be restricted.

it's a bug under your control :: no, this is not a bug. Instead, it was a conscious design decision by Doug Lea, and one that we have to respect and consider at length before making any kind of change. This is simply a wishlist request from you. Nothing more. It has questionable implications, making it somewhat slippery. I've noted this already, and I've suggested what you can do to eliminate some of those concerns. You've chosen to ignore that. Lastly: warping a respect for the original design into "because you don't personally feel it's something you would use" is shaky ground at best, and does little to promote your position.

It's not a bug to you because you decided not to consider it so, and downgrading it to 'wishlist' only reflects your personal opinion on the subject, which is fine, but clicking the drop-down does not suddenly make it a universal truth. :). To you, it's nothing more than a wish, fine. To me, it's an restricted implementation limited for questionable reasons, and I consider that defective. So, we have a difference of opinion on the 'type' of ticket, big deal, but there's no 'absolute truth of ticket type' here that I'm going against by considering it non-complete and defective, is there?

In the same vein of respecting Doug Lea's original design decision and having respect for it, does it not also make sense to respect and consider the design decisions behind the hash which replaced it? Would it not also make sense to consider and respect AAs?

Speaking of DL's decision, he makes some mention of why he chose not to allow null keys here: http://gee.cs.oswego.edu/dl/classes/collections/index.html, where he mentions that "One reason that nulls are excluded is that some collection operations intrinsically rely upon operations Object.equals and Object.hashCode, neither of which apply to nulls.", which is true, however has been addressed by the use of null-aware keyCompare and hashOf functions.

Which suggestions from you to eliminate concerns have I chosen to ignore? I think I've verbosely addressed every concern you've brought up, other than to create lock-free versions of every collection, something which doesn't exist right now and, that despite that, I have asked for clarification on why this change would complicate such creation. If I've left something else seemingly ignored, please let me know as it's not my intention.

As for trying to 'warp respect', I can't do more than say that it is simply not the case. If you can't understand why you may have come off as dismissive then that's fine. Quite possibly I mistook your manner, which is why I was careful to say that it 'seems like', and not that it was.

As for your question: Yes, it can sometimes be a reasonable design decision -- given the right circumstances.

Well, good, maybe I'm not crazy after all :)

Let me ask another question then, is it not reasonable that someone using associate arrays may decide to move to using a HashMap?? It would seem to me that this would be a typical thing for someone to do, and that in the interest of consistency within the language, the HashMap? should not impose restrictions that the AA does not.

(follow-up: ↓ 31 ) 09/18/07 14:47:46 changed by kris

this is going absolutely nowhere fast, so no further comment :)

(in reply to: ↑ 30 ; follow-up: ↓ 33 ) 09/18/07 15:23:43 changed by Some1_2

Replying to kris:

this is going absolutely nowhere fast, so no further comment :)

I see... so, I'll just take that as my 'go to hell' (complete with smiley) then. No sense trying to hold a discussion with someone who isn't receptive.

(follow-up: ↓ 34 ) 09/18/07 17:14:43 changed by larsivi

  • cc set to sean.

I'm impressed by the bickering. Some1_2, I've yet to see any "go-to-hell" mentality in Kris, and I've suggested reams of stupid stuff. There are good reasons for why only Kris has replied this far; we're all insanely busy, and any request to change behaviour is a wishlist item however problematic the current behaviour may be to you. And when Kris replies as he does in his last reply, it is because he too is very busy.

Obvious bugs will be prioritized as far as we are able to prioritize anything this month. Personally I think you have made a good start at showing how your wish can be fulfilled, but accusing us of of having non-technical reasons for not applying your patches won't help. :)

Sean wants to look into the collection package too, so I'll CC him in case he has any immediate thoughts on the subject.

As for me, I don't have time to read the full ticket in detail to make an informed opinion just yet.

(in reply to: ↑ 31 ; follow-up: ↓ 35 ) 09/18/07 17:35:14 changed by kris

Replying to Some1_2:

Replying to kris:

this is going absolutely nowhere fast, so no further comment :)

I see... so, I'll just take that as my 'go to hell' (complete with smiley) then. No sense trying to hold a discussion with someone who isn't receptive.

As you choose.

However, if you look at this rationally, I've previously noted what you can do to reduce the current level of concern, and I've said that this requires more time to consider. The direction the thread has recently taken is not exactly conducive to that process, and I honestly don't much care for various misrepresentations being bandied about here. Hence no further comment. I sure hope that's OK with you

(in reply to: ↑ 32 ) 09/18/07 18:20:48 changed by Some1_2

Replying to larsivi:

I'm impressed by the bickering. Some1_2, I've yet to see any "go-to-hell" mentality in Kris, and I've suggested reams of stupid stuff. There are good reasons for why only Kris has replied this far; we're all insanely busy, and any request to change behaviour is a wishlist item however problematic the current behaviour may be to you. And when Kris replies as he does in his last reply, it is because he too is very busy.

I'd prefer to label it 'discussion' as opposed to 'bickering', but ok. :) I have no issues with people being busy, but unless someone says, "I'm too busy to consider this right now", how would I otherwise know? As for the classification, I don't really care, all I was trying to point was that the categorization is really in the eye of the beholder. You are, of course, free to disagree on how I initially classified it :).

Obvious bugs will be prioritized as far as we are able to prioritize anything this month. Personally I think you have made a good start at showing how your wish can be fulfilled, but accusing us of of having non-technical reasons for not applying your patches won't help. :)

Fair enough, I suppose I made a leap of judgement there. My bad. To be fair though, I wasn't trying to declare outright that you were all a bunch of evil dictators bent on the annihilation of nulls throughout the hash world. :) I was just trying to suggest (perhaps too strongly) that _if_ that were the case, that there may be others (other than myself, even) that didn't share the same view, and that it may not be an unreasonable thing to expect.

Sean wants to look into the collection package too, so I'll CC him in case he has any immediate thoughts on the subject. As for me, I don't have time to read the full ticket in detail to make an informed opinion just yet.

Sure. Let me also explain a bit why I seem to care so much about the Tango collection doing what I want instead of simply running out and writing my own and avoiding all this :)

Our little dev studio here has decided on using D for our next project, and as such will be using it for years to come. We were previously a C shop, prefering it to C++, but have been following D for years, and have twice in the past evaluated it for production use to replace C.

It's only recently that we've decided that it's mature enough, and have taken the D plunge. We are indeed newcomers to D, but I'd like to think that we also somewhat know a bit about programming in general. :)

We like the direction the Tango library is heading so far and would very much like to aid in it's development (in fact, I believe some patches are already in, as I recall). That also of course means that the library would have to work for us, and we make very extensive use of hash tables in multiple areas of our product. A well functioning hash is quite critical to us.

We very much would prefer making use of a library hash (reaping the benefits of open source development, and all that) rather than to maintain one on our own (The AAs were very good in terms of use, if they weren't a tad bloated and slow). None of that matters to my argument, of course, but I just thought I'd throw that out there to give some background to 'who is this crazy guy and why does he care so much'.

I've sort of already taken it as a foregone conclusion that there's no way this is going to fly here, and we've discussed it and decided that what we're doing with null keys could only be altered by means of some sort of (in our opinion) a ugly hack, so I have already started porting over some of our own C container code over to D for our own use. If there would also be any interest in adding a set of containers to Tango I'd be happy to explore that too, what I'll be porting from C isn't nearly as 'abstract' or 'javafied' :) as the DL collections port but I do think it's a robust set of in-the-trenches usable containers.

Also, if there is some interest in revisiting this ticket later, I would be happy to provide any help I can.

Yes, I type a lot. Sorry about that.

(in reply to: ↑ 33 ) 09/18/07 18:30:46 changed by Some1_2

Replying to kris:

As you choose.

It wasn't really my choice.

However, if you look at this rationally, I've previously noted what you can do to reduce the current level of concern, and I've said that this requires more time to consider. The direction the thread has recently taken is not exactly conducive to that process, and I honestly don't much care for various misrepresentations being bandied about here. Hence no further comment. I sure hope that's OK with you

I don't think I've been irrational, I'm not sure if perhaps you have some image of me as a fuming angry man, foaming at the mouth, but I'm not. I think I'm having issues communicating with you, which could very well be simply due to misunderstanding due to conversing in type. Perhaps if this were an in-person conversation it would be simpler, but you don't know me and I don't know you, so all we have to go on are assumptions. :)

Anyway, I apologize, but I feel I'm still a bit in the dark as to what you are referring to in regards to 'reducing the level of concern', because unless I've missed out on something, I thought I had addressed all that you had brought up, with the exception of writing a lock-free implementation. Please, if I'm ignorant of something specific, if you could do me the courtesy of letting me know what it is, I would appreciate it.

09/18/07 18:36:23 changed by larsivi

The ticket is in no way closed, so it will be revisited in at least some respect later :)

11/02/07 22:56:42 changed by kris

  • milestone set to 2.0.

02/10/08 15:37:40 changed by svanleent

OK, if the null check is problematic (especially with it not being an object), perhaps the following solution might be getting somewhere:

if a user puts a null key or null value:

instead of using null, use a singleton object, such as NullReference?. It needs a check on whether the given object is null.

When a user gets a null key:

use the singleton NullReference?

When a user gets the NullReference? value:

return null instead

Within comparison routines, let the user use NullReference? (perhaps a bit of discussion is needed here).

05/25/08 11:01:32 changed by larsivi

  • keywords changed from hash map collection null key to hash map collection null key, decide.

05/26/08 15:47:26 changed by kris

  • status changed from assigned to closed.
  • resolution set to wontfix.

The current collection package will not support null as a valid value