[[TOC()]] = The collection classes = Tango provides a number of collection classes, sometimes also called container classes. In tango they are located in the '''tango.util.collection''' package, which is home to a variety of templated containers. Collection classes store ''things'', and do so in different ways. For example, the user can choose between a list and an array to store the same information - each storage mechanism has its own pros and cons. The collection classes are templates, which means they need a type parameter specified for the ''things'' to be stored. It is good practice to define an alias for the type one wants to use {{{ #!d alias LinkSeq!(int) MyIntList; auto list = new MyIntList; list.append(1); }}} == Types of containers == The following table is an overview of the existing collections. The first column list the kind of storage, the column heading list the kind of implementations: || || '''Array''' || '''Link''' || '''Tree''' || '''Hash''' || || Bag || [[docs(ArrayBag,tango.util.collection.ArrayBag.html,)]] || || [[docs(TreeBag,tango.util.collection.TreeBag.html,)]] || || || Set || || || || [[docs(HashSet,tango.util.collection.HashSet.html,)]] || || Sequence || [[docs(ArraySeq,tango.util.collection.ArraySeq.html,)]] || [[docs(LinkSeq,tango.util.collection.LinkSeq.html,)]], [[docs(CircularSeq,tango.util.collection.CircularSeq.html,)]] || || || || Map || || [[docs(LinkMap,tango.util.collection.LinkMap.html,)]] || [[docs(TreeMap,tango.util.collection.TreeMap.html,)]] || [[docs(HashMap,tango.util.collection.HashMap.html,)]] || '''Bag''' This is the simplest container. Throw everything in. The sequence of elements is not defined. '''Set''' In a set there are no duplicates. The distinction between Set and Bag is that the latter may have duplicates. '''Sequence''' A container that will store the elements in the sequence they were added. '''Map''' An associative container. It has a set of keys. Each key is unique and is mapped to a value. Both the key and the value type are template parameters. '''Arrays''' They can access their members very fast and without the need for any special sequence. But inserting an element sometimes requires the array to be resized. Resizing the array is very slow. The array then needs to be reallocated and all data must be copied over. '''Linked lists''' These containers increase their size with constant cost. Whether prepending, inserting, or appending, every element is allocated and linked in constant time. The drawback is that finding an element in the middle of a linked list requires iteration over the list. For big data collections, this is very slow. '''Trees''' Containers that use a binary tree to access their members. The cost for insertion and the retrieval is low. '''Hashes''' Hashes are the fastest container in general, but they need much memory. Also the elements need to support a good '''toHash()''' implementation. == Model and Method Overview == * [http://www.dsource.org/projects/tango/browser/trunk/doc/images/collection.png?format=raw Relationships between the collection classes] * [http://www.dsource.org/projects/tango/browser/trunk/doc/images/methods.png?format=raw Methods exposed by the collection classes] == Create collections == The easiest form is to instantiate directly and use an auto variable to hold the reference. {{{ #!d auto list2 = new LinkSeq!( int ); }}} In most cases a collection type is use more than once. So it is good to have an alias. {{{ #!d alias LinkSeq!( char[] ) StrList; StrList list1 = new StrList; }}} == Add and remove elements == Each container type has it own set of methods. {{{ #!d list1.append( "The first item in list1" ); list2.prepend( 2 ); }}} See the [http://www.dsource.org/projects/tango/browser/trunk/doc/images/methods.png?format=raw Methods exposed by the collection classes] to get a good overview. == Visiting contained elements == The typical, and simplest pattern is {{{ #!d foreach (key, value; map) Stdout.formatln ("Key: {0}, Value: {1}", key, value); }}} One variation is using only 'value' instead of both 'key' and 'value'. Both of these utilize a slightly more efficient mechanism to visit each contained element (or element pair), as they can avoid heap activity when instantiating an '''Iterator'''. Collections also expose iterators, supporting both ''foreach'' and a familiar '''more()'''/'''next()''' pattern. Each container exposes method(s) to instantiate an appropriate '''Iterator''' {{{ #!d auto it = map.keys(); // The call to .more returns true if there are more elements, // and switches the iterator to the next one if available. while(it.more){ char[] key; auto value = it.get(key); Stdout.formatln ("Key: {0}, Value:{1}", key, value); } }}} For associative containers, the iterator's '''get''' method has an out parameter for key retrival. The '''keys()''' method is used to also get a set of the key values. For the 'normal' containers, like '''!LinkSeq''' etc. the '''elements()''' method is used to get a conventional iterator. Each container also supports ''foreach()'' directly. === Asynchronous mutation during iteration === Iterators are sensitive to container changes whilst they operate, and will throw an exception when this occurs. This it to ensure the client sees a consistent view of the container for the lifetime of the iterator. === Caution with iterators === It is currently not possible to modify the container while iterating over it with ''foreach'' or an iterator. To delete certain elements from a container it is necessary to do it in two steps. First find the elements to delete, then do the deletion {{{ #!d // 1. Build a list of elements to delete auto dellist = new LinkSeq!(int); foreach( int i; container ){ if( i >= 3 && i < 7 ){ // container.remove( i ); /* NOT POSSIBLE */ dellist.append( i ); } } // 2. Iterate over this deletion list and // delete the items in the original container. foreach( int i; dellist ){ container.remove( i ); } }}} This may be rectified in the future. == !InterleavingIterator == This more complex ''Iterator'' shows why iterators are still useful, even if we have the ''foreach'' construct. '''[[docs(InterleavingIterators, tango.util.collection.iterator.InterleavingIterator.html,)]]''' allow you to combine the elements of two different enumerations as if they were one enumeration before they are seen by their ''consumers''. This sometimes allows one to avoid use of a collection object, to temporarily combine two sets of Collection '''elements()''' that need to be collected together for common processing. The elements are revealed (via '''get()''') in a purely interleaved fashion, alternating between the first and second enumerations unless one of them has been exhausted, in which case all remaining elements of the other are revealed until it too is exhausted. {{{ #!d import tango.util.collection.ArrayBag; import tango.util.collection.iterator.InterleavingIterator; void testInterleavingIterator(){ int[] result; // Create and fill the containers auto l1 = new LinkSeq!(int); auto l2 = new ArraySeq!(int); for( int i = 0; i < 6; i+=2 ){ l1.append( i ); l2.append( i+1 ); } // define the InterleavingIterator auto it = new InterleavingIterator!(int)(l1.elements, l2.elements); // use the InterleavingIterator to iterate over the container while (it.more) result ~= it.get; // test the result assert(result == [0, 1, 2, 3, 4, 5], "testInterleavingIterator"); } }}} == !FilteringIterator == '''[[docs(FilteringIterators,tango.util.collection.iterator.FilteringIterator.html,)]]''' allow you to filter out elements from other enumerations before they are seen by their ''consumers'' (i.e. the callers of '''get'''). {{{ #!d import tango.util.collection.LinkSeq; import tango.util.collection.ArrayBag; import tango.util.collection.iterator.FilteringIterator; void testFilteringIterator(){ int[] result; alias ArrayBag!(int) IntBag; // Create and fill the container auto ib = new IntBag; for( int i = 0; i < 20; i++ ){ ib.add( i ); } // define the FilteringIterator with a function literal auto it = new FilteringIterator!(int)( ib.elements(), (int i){ return i >= 3 && i < 7; }); // use the FilteringIterator to iterate over the container while (it.more()) result ~= it.get(); // test the result assert( result == [ 3, 4, 5, 6 ], "testFilteringIterator" ); } }}} == Comparators == Sorted containers like trees, need to compare the elements. The algorithm that is used for this comparison can be user specified. E.g. strings can be sorted in different ways. If you want a bag to have its addIf using your compare method, this can be done like this {{{ #!d // Create a bag that is not case sensitive auto nameSet = new TreeBag!(char[])( null, new class() Comparator!(char[]){ int compare( char[] first, char[] second ){ return Ascii.icompare( first, second ); } }); nameSet.addIf( "Alice" ); nameSet.addIf( "Bob" ); nameSet.addIf( "aliCe" ); // will not be added }}} == Screeners == Each container has the possibility to have a predicate as a so called screener. The definition can be found in the '''Collection''' class. {{{ #!d alias bool delegate(T) Predicate; }}} This delegate will be called, each time a new element is to be inserted into the container. If the screener returns false, the container throws an '''!IllegalElementException''' to the caller. {{{ #!d // Create a restricted container auto ratioSamples = new ArraySeq!(float)( (float v){ // restrict element to the range 0.0 .. 0.99999.. return v >= 0.0f && v < 1.0f; }); // try to insert some values try{ ratioSamples.append( 0.0f ); // OK ratioSamples.append( 0.5f ); // OK ratioSamples.append( 0.99999f ); // OK ratioSamples.append( 1.0f ); // Bang // will never get here assert( false ); } catch( IllegalElementException e ){ } }}} To test if an element is allowed for a certain container, use the {{{bool allows( T )}}} method, which is part of the Collection class. == Bag and Set == [[docs(Bags,tango.util.collection.model.Bag.html,)]] are collections supporting multiple occurrences of elements. [[docs(Sets,tango.util.collection.model.Set.html,)]] do only support unique elements. In both, the sequence of elements is not defined. An exception is !TreeBag, there the elements are ordered in their natural way. Or the user supplies a Comparator object. Both '''bags''' support the '''addIf()''' method. So they can be used like '''sets'''. [[docs(HashSet,tango.util.collection.model.HashSet.html,)]] uses a hash algorithm to find potential duplicates very fast, but this algorithm can use a lot of memory. == Sequence, the ordered storage == Sequences are indexed, sequentially ordered collections. Indices are always in the range 0 .. size() -1. All accesses by index are checked, raising exceptions if the index falls out of range. The '''elements()''' enumeration for all sequences is guaranteed to be traversed (via '''more()/get()''') in sequential order. * [[docs(Seq,tango.util.collection.model.Seq.html,)]] is an interface for all sequences * [[docs(ArraySeq,tango.util.collection.ArraySeq.html,)]] implementing class, uses an array. Best used when the storage size is not changed too often, or a fast element access per index is needed. * [[docs(LinkSeq,tango.util.collection.LinkSeq.html,)]] implementing class, uses element nodes which are allocated dynamically and linked with each other. This implementation can easy append elements and remove elements from the head (take()). Making random access to elements per index is slow, because it is neccessary to follow the links from the beginning of the storage. * [[docs(CircularSeq,tango.util.collection.CircularSeq.html,)]] is similar to !LinkSeq. But this class uses double linked cells. So all operations are equal in operation, nevertheless the are started from the beginning or end of the list. == Map, the associative Storage == [[docs(Maps,tango.util.collection.model.Map.html,)]] maintain keyed elements. Any kind of Object or primitive type may serve as a key for an element. Each key is unique and is associated with one value. {{{ #!d import tango.store.HashMap; // ... alias HashMapT!(char[], char[]) StrStrMap; auto map = new StrStrMap(); map.add("key1", "value1"); char[] key = "key1"; Stdout.format ("Key: {0}, Value:{1}", key, map.get(key)).newline; }}} = Hash vs Tree = '''Sets''' and '''Maps''' are available with tree or hash implementation. When should one choose which? Using a tree means it is necessary to navigate a tree structure to find an element. The number of stages to traverse is around O(log2(n)). When using a hash, an array is used to store a given number of buckets. A hash function is used to calculate a hash from the object value. A hash is a n-bit integer value, which should be mostly unique for each object. The same content of an object must always calculate the same hash value. This value is used when the bucket to store or retrieve the element is chosen. In the best case, this algorithm can find the element directly (if the bucket contains exactly one element). That would be access in constant time or O(1). But if the bucket holds many elements, then (depending on the bucket implementation) this can lead to a linear search. In a good hash this should not happen. To ensure the buckets are not too full, the array for them should be chosen big. This means, hash needs much more memory than tree and a good quality hash function for the stored object, to be efficient. = D builtin arrays and hashes vs. !ArraySeq and !HashMap = D already supports builtin dynamic arrays and hashes. Why should one use a template container for these container types? There is nothing against the builtin arrays/hashes, use them if they have all you need. Be aware of the additional features the tango container !ArraySeq/!HashMap has: * A screener function to restrict the inserted elements to a configurable set. * Iterators that can implement various behaviour. * the tango collections are compatible between each other, because they implement common interfaces. So it is theoretical possible to exchange a !ArraySeq container with a !LinkSeq container, if you only use the methods from View. = In Review = Don't reinvent the wheel, use containers where ever possible. Choose the appropriate container class for you needs. == Suggestion for this documentation == * Show the use of procedure == References == * For further readings on hash tables, see this [http://en.wikipedia.org/wiki/Hash_table Wikipedia-article] == Illustrations == [[Image(source:/trunk/doc/images/collection.png)]] [[Image(source:/trunk/doc/images/methods.png)]] == User Comments == [[EmbedReplies(DocComments,ChapterStorage)]]