Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #1126 (closed defect: fixed)

Opened 3 months ago

Last modified 2 months ago

tango.core.array.unique is wrong

Reported by: mandel Assigned to: sean
Priority: major Milestone: 0.99.7
Component: Core Functionality Version: 0.99.6 Jeff
Keywords: Cc:

Description

tango.core.array.unique doesn't return the "The number of unique elements in buf.".

The following code outputs "bacacab".

import tango.io.Stdout;
import tango.core.Array;

void main(char[][] args)
{
	char[] buf = "bbaacacaab".dup;
	uint tmp = unique(buf);
	Stdout(buf[0..tmp]).newline;
}

http://codepad.org/tpsui0fk

Also, unique is misleading. People would expect to get an unique array with no duplicates at all. But that's not the case with the current definition in the docs.

Since the result may contain duplicates, this function is misleading and can be considered dangerous.

Change History

06/04/08 10:37:33 changed by fawzi

actually unique is correct for sorted array, maybe the name should be changed uniqueSorted (but it would be longer). For sure the documentation should be more clear that if the array is not sorted then there may still be duplicated in it, one should sort it first if that is needed. As discussed in the IRC returning a slice would probably nicer to use.

06/05/08 10:30:39 changed by sean

Returning a slice is not necessary:

char[] buf = "bbaacacaab".dup;
Stdout( "Sub-sequences: " )( ( buf[0 .. buf.unique()] ).newline;

I think the docs explain exactly what the routine does, but I didn't know how best to describe the return value. I'll see about clarifying it.

06/05/08 15:34:18 changed by mandel

I think we need a real unique (maybe also uniqueSorted?) to return a slice. Its the common use case.

The current unique should at least be renamed.

Tango should embrace the principle of least surprise.

06/05/08 17:31:08 changed by sean

I may rename it to "distinct," since it's possibly a bit closer to actual behavior. For what it's worth though, I got the name "unique" from C++ -- there's an identical call in <algorithm>.

06/06/08 05:47:01 changed by mandel

yeah, unique from C++ std behaves the same; maybe it's just me then.

Still, I think a returned slice would improve code usability.

07/10/08 05:51:44 changed by sean

  • status changed from new to assigned.

I decided to change the function name to "distinct" because it seems a bit more meaningful. I've also re-worded the "Returns" comment for clarity. However, the actual behavior will stay as-is because it's consistent with the other algorithms. To get the slice you can do:

char[] buf = "abccceegggdgf";
buf = buf[0 .. buf.distinct()];
assert( buf == "abcegdgf" );

07/10/08 06:08:17 changed by sean

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [3740]) * Renamed 'unique' to 'distinct' * Fixed docs for this routine to more accurately reflect what it does

This closes #1126