Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

The collection classes

Tango provides a number of collection classes, sometimes also called container classes. In tango they are located in the tango.util.collection package, which is home to a variety of templated containers.

Collection classes store things, and do so in different ways. For example, the user can choose between a list and an array to store the same information - each storage mechanism has its own pros and cons. The collection classes are templates, which means they need a type parameter specified for the things to be stored.

It is good practice to define an alias for the type one wants to use

alias LinkSeq!(int) MyIntList;

auto list = new MyIntList;
list.append(1);

Types of containers

The following table is an overview of the existing collections. The first column list the kind of storage, the column heading list the kind of implementations:

Array Link Tree Hash
Bag ArrayBag TreeBag
Set HashSet
Sequence ArraySeq LinkSeq, CircularSeq
Map LinkMap TreeMap HashMap

Bag

This is the simplest container. Throw everything in. The sequence of elements is not defined.

Set

In a set there are no duplicates. The distinction between Set and Bag is that the latter may have duplicates.

Sequence

A container that will store the elements in the sequence they were added.

Map

An associative container. It has a set of keys. Each key is unique and is mapped to a value. Both the key and the value type are template parameters.

Arrays

They can access their members very fast and without the need for any special sequence. But inserting an element sometimes requires the array to be resized. Resizing the array is very slow. The array then needs to be reallocated and all data must be copied over.

Linked lists

These containers increase their size with constant cost. Whether prepending, inserting, or appending, every element is allocated and linked in constant time. The drawback is that finding an element in the middle of a linked list requires iteration over the list. For big data collections, this is very slow.

Trees

Containers that use a binary tree to access their members. The cost for insertion and the retrieval is low.

Hashes

Hashes are the fastest container in general, but they need much memory. Also the elements need to support a good toHash() implementation.

Model and Method Overview

Create collections

The easiest form is to instantiate directly and use an auto variable to hold the reference.

    auto list2 = new LinkSeq!( int );

In most cases a collection type is use more than once. So it is good to have an alias.

    alias LinkSeq!( char[] ) StrList;
    StrList list1 = new StrList;

Add and remove elements

Each container type has it own set of methods.

    list1.append( "The first item in list1" );
    list2.prepend( 2 );

See the Methods exposed by the collection classes to get a good overview.

Visiting contained elements

The typical, and simplest pattern is

    foreach (key, value; map)
             Stdout.formatln ("Key: {0}, Value: {1}", key, value);

One variation is using only 'value' instead of both 'key' and 'value'. Both of these utilize a slightly more efficient mechanism to visit each contained element (or element pair), as they can avoid heap activity when instantiating an Iterator.

Collections also expose iterators, supporting both foreach and a familiar more()/next() pattern. Each container exposes method(s) to instantiate an appropriate Iterator

    auto it = map.keys();

    // The call to .more returns true if there are more elements,
    // and switches the iterator to the next one if available.
    while(it.more){
          char[] key; 
          auto value = it.get(key);
          Stdout.formatln ("Key: {0}, Value:{1}", key, value);
    }

For associative containers, the iterator's get method has an out parameter for key retrival.

The keys() method is used to also get a set of the key values. For the 'normal' containers, like LinkSeq etc. the elements() method is used to get a conventional iterator. Each container also supports foreach() directly.

Asynchronous mutation during iteration

Iterators are sensitive to container changes whilst they operate, and will throw an exception when this occurs. This it to ensure the client sees a consistent view of the container for the lifetime of the iterator.

Caution with iterators

It is currently not possible to modify the container while iterating over it with foreach or an iterator. To delete certain elements from a container it is necessary to do it in two steps. First find the elements to delete, then do the deletion

// 1. Build a list of elements to delete
auto dellist = new LinkSeq!(int);
foreach( int i; container ){
  if( i >= 3 && i < 7 ){
    // container.remove( i ); /* NOT POSSIBLE */
    dellist.append( i );
  }
}
// 2. Iterate over this deletion list and
// delete the items in the original container.
foreach( int i; dellist ){
    container.remove( i );
}

This may be rectified in the future.

InterleavingIterator

This more complex Iterator shows why iterators are still useful, even if we have the foreach construct.

InterleavingIterators allow you to combine the elements of two different enumerations as if they were one enumeration before they are seen by their consumers. This sometimes allows one to avoid use of a collection object, to temporarily combine two sets of Collection elements() that need to be collected together for common processing.

The elements are revealed (via get()) in a purely interleaved fashion, alternating between the first and second enumerations unless one of them has been exhausted, in which case all remaining elements of the other are revealed until it too is exhausted.

import tango.util.collection.ArrayBag;
import tango.util.collection.iterator.InterleavingIterator;

void testInterleavingIterator(){
    int[] result;

    // Create and fill the containers
    auto l1 = new LinkSeq!(int);
    auto l2 = new ArraySeq!(int);
    for( int i = 0; i < 6; i+=2 ){
        l1.append( i );
        l2.append( i+1 );
    }

    // define the InterleavingIterator
    auto it = new InterleavingIterator!(int)(l1.elements, l2.elements);

    // use the InterleavingIterator to iterate over the container
    while (it.more)
        result ~= it.get;

    // test the result
    assert(result == [0, 1, 2, 3, 4, 5], "testInterleavingIterator");
}

FilteringIterator

FilteringIterators allow you to filter out elements from other enumerations before they are seen by their consumers (i.e. the callers of get).

import tango.util.collection.LinkSeq;
import tango.util.collection.ArrayBag;
import tango.util.collection.iterator.FilteringIterator;

void testFilteringIterator(){
    int[] result;
    alias ArrayBag!(int) IntBag;

    // Create and fill the container
    auto ib = new IntBag;
    for( int i = 0; i < 20; i++ ){
        ib.add( i );
    }

    // define the FilteringIterator with a function literal
    auto it = new FilteringIterator!(int)( ib.elements(), (int i){ 
        return i >= 3 && i < 7; 
    });

    // use the FilteringIterator to iterate over the container
    while (it.more())
        result ~= it.get();

    // test the result
    assert( result == [ 3, 4, 5, 6 ], "testFilteringIterator" );
}

Comparators

Sorted containers like trees, need to compare the elements. The algorithm that is used for this comparison can be user specified. E.g. strings can be sorted in different ways. If you want a bag to have its addIf using your compare method, this can be done like this

    // Create a bag that is not case sensitive
    auto nameSet = new TreeBag!(char[])( null, 
      new class() Comparator!(char[]){
        int compare( char[] first, char[] second ){
            return Ascii.icompare( first, second );
        }
    });

    nameSet.addIf( "Alice" );
    nameSet.addIf( "Bob" );
    nameSet.addIf( "aliCe" ); // will not be added

Screeners

Each container has the possibility to have a predicate as a so called screener. The definition can be found in the Collection class.

alias bool delegate(T) Predicate;

This delegate will be called, each time a new element is to be inserted into the container. If the screener returns false, the container throws an IllegalElementException to the caller.

    // Create a restricted container
    auto ratioSamples = new ArraySeq!(float)( (float v){
        // restrict element to the range 0.0 .. 0.99999..
        return v >= 0.0f && v < 1.0f;
    });

    // try to insert some values
    try{
        ratioSamples.append( 0.0f );      // OK
        ratioSamples.append( 0.5f );      // OK
        ratioSamples.append( 0.99999f );  // OK
        ratioSamples.append( 1.0f );      // Bang
        // will never get here
        assert( false );
    } catch( IllegalElementException e ){
    }

To test if an element is allowed for a certain container, use the bool allows( T ) method, which is part of the Collection class.

Bag and Set

Bags are collections supporting multiple occurrences of elements. Sets do only support unique elements.

In both, the sequence of elements is not defined. An exception is TreeBag, there the elements are ordered in their natural way. Or the user supplies a Comparator object.

Both bags support the addIf() method. So they can be used like sets.

HashSet uses a hash algorithm to find potential duplicates very fast, but this algorithm can use a lot of memory.

Sequence, the ordered storage

Sequences are indexed, sequentially ordered collections. Indices are always in the range 0 .. size() -1. All accesses by index are checked, raising exceptions if the index falls out of range.

The elements() enumeration for all sequences is guaranteed to be traversed (via more()/get()) in sequential order.

  • Seq is an interface for all sequences
  • ArraySeq implementing class, uses an array. Best used when the storage size is not changed too often, or a fast element access per index is needed.
  • LinkSeq implementing class, uses element nodes which are allocated dynamically and linked with each other. This implementation can easy append elements and remove elements from the head (take()). Making random access to elements per index is slow, because it is neccessary to follow the links from the beginning of the storage.
  • CircularSeq is similar to LinkSeq. But this class uses double linked cells. So all operations are equal in operation, nevertheless the are started from the beginning or end of the list.

Map, the associative Storage

Maps maintain keyed elements. Any kind of Object or primitive type may serve as a key for an element. Each key is unique and is associated with one value.

    import tango.store.HashMap;
    // ...
    alias HashMapT!(char[], char[]) StrStrMap;

    auto map = new StrStrMap();
    map.add("key1", "value1");
    char[] key = "key1";
    Stdout.format ("Key: {0}, Value:{1}", key, map.get(key)).newline;

Hash vs Tree

Sets and Maps are available with tree or hash implementation. When should one choose which?

Using a tree means it is necessary to navigate a tree structure to find an element. The number of stages to traverse is around O(log2(n)).

When using a hash, an array is used to store a given number of buckets. A hash function is used to calculate a hash from the object value. A hash is a n-bit integer value, which should be mostly unique for each object. The same content of an object must always calculate the same hash value. This value is used when the bucket to store or retrieve the element is chosen.

In the best case, this algorithm can find the element directly (if the bucket contains exactly one element). That would be access in constant time or O(1). But if the bucket holds many elements, then (depending on the bucket implementation) this can lead to a linear search. In a good hash this should not happen. To ensure the buckets are not too full, the array for them should be chosen big. This means, hash needs much more memory than tree and a good quality hash function for the stored object, to be efficient.

D builtin arrays and hashes vs. ArraySeq and HashMap

D already supports builtin dynamic arrays and hashes. Why should one use a template container for these container types?

There is nothing against the builtin arrays/hashes, use them if they have all you need. Be aware of the additional features the tango container ArraySeq/HashMap has:

  • A screener function to restrict the inserted elements to a configurable set.
  • Iterators that can implement various behaviour.
  • the tango collections are compatible between each other, because they implement common interfaces. So it is theoretical possible to exchange a ArraySeq container with a LinkSeq container, if you only use the methods from View.

In Review

Don't reinvent the wheel, use containers where ever possible.

Choose the appropriate container class for you needs.

Suggestion for this documentation

  • Show the use of procedure

References

Illustrations

source:/trunk/doc/images/collection.png

source:/trunk/doc/images/methods.png

User Comments

Comments