Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

The container classes

Tango provides a number of container classes. In tango they are located in the tango.util.container package, which is home to a variety of templated containers.

Container classes store things, and do so in different ways. For example, the user can choose between a list and an array to store the same information - each storage mechanism has its own pros and cons. The collection classes are templates, which means they need a type parameter specified for the things to be stored.

Types of containers

The following table is an overview of the existing collections. The first column list the kind of storage, the column heading list the kind of implementations:

Array Link Tree Hash
Bag
Set HashSet
Sequence LinkedList, CircularList
Map SortedMap HashMap

Bag

This is the simplest container. Throw everything in. The sequence of elements is not defined. Duplicates allowed.

Set

Same like Bag, but no duplicates. If the implementation is using 'Hash' and the element type is object, the sequence on iterating over the contain may changes in each run. To avoid this, either have a content based hash for the element type, or use sorted container.

Sequence

A container that will store the elements in the sequence they were added.

Map

An associative container. It has a set of keys. Each key is unique and is mapped to a value. Both the key and the value type are template parameters.

Arrays

They can access their members very fast and without the need for any special sequence. But inserting an element sometimes requires the array to be resized. Resizing the array is very slow. The array then needs to be reallocated and all data must be copied over.

Linked lists

These containers increase their size with constant cost. Whether prepending, inserting, or appending, every element is allocated and linked in constant time. The drawback is that finding an element in the middle of a linked list requires iteration over the list. For big data collections, this is very slow.

LinkedList vs. CircularList: The LinkedList is a singly linked list. That means, that search can only be made in the forward direction. So every method that wants to access the elements at the end, has to iterate over the whole list. On the other hand, the CircularList is a double linked List, that can iterate in both directions. This come with a bit more memory consumption and a bit more overhead for pointer controlling, but some operations can be faster, because they can access the end of the list directly.

Trees

Containers that use a binary tree to access their members. The cost for insertion and the retrieval is low.

Hashes

Hashes are the fastest container in general, but they need much memory. Also the elements need to support a good toHash() implementation.

Create containers

The easiest form is to instantiate directly and use an auto variable to hold the reference.

    auto list = new LinkedList!( int );
    auto map  = new HashMap!( char[], char[] );

In most cases a container type is used more than once. So it is good practice to have an alias.

    alias CircularList!( char[] ) StrList;
    auto list2 = new StrList;

Add and remove elements

Each container type has its own set of methods. All containers support add.

    list.add( 2 );
    map.add( "first", "value one" );

Some containers have specialized methods to add values at a certain index like addAt, append or prepend

    list.prepend( 4 );  // now '4, 2'
    list.addAt( 1, 3 ); // now '4, 3, 2'
    list.append( 1 );   // now '4, 3, 2, 1'
    int oldValue = list.removeHead(); // now '3, 2, 1' and oldValue == 4

There are more methods for adding, removing, searching, retrival, ... See the API doc of the container.

Visiting contained elements

The typical, and simplest pattern is

    foreach (value; list)
             Stdout.formatln ("Value: {}", value);
    foreach (key, value; map)
             Stdout.formatln ("Key: {}, Value: {}", key, value);

For the map, one variation is also using only 'value' instead of both 'key' and 'value'. Both of these utilize a slightly more efficient mechanism to visit each contained element (or element pair), as they can avoid heap activity when instantiating an Iterator.

Containers also expose iterators, supporting both foreach and a familiar bool next(ref V v) pattern.

    V value;
    auto it = list.iterator;
    while(it.next(value))
             Stdout.formatln ("Value: {}", value);

Note: The iterators also support a remove method.

Each container exposes method(s) to instantiate an appropriate Iterator

    auto it = map.iterator();

    char[] key, value;
    while(it.next(key, value){
          Stdout.formatln ("Key: {}, Value:{}", key, value);
    }

Asynchronous mutation during iteration

This section is not completed yet and is a copy of the collection doc

Iterators are sensitive to container changes whilst they operate, and will throw an exception when this occurs. This it to ensure the client sees a consistent view of the container for the lifetime of the iterator.

InterleavingIterator

This section is not completed yet and is a copy of the collection doc

This more complex Iterator shows why iterators are still useful, even if we have the foreach construct.

InterleavingIterators allow you to combine the elements of two different enumerations as if they were one enumeration before they are seen by their consumers. This sometimes allows one to avoid use of a collection object, to temporarily combine two sets of Collection elements() that need to be collected together for common processing.

The elements are revealed (via get()) in a purely interleaved fashion, alternating between the first and second enumerations unless one of them has been exhausted, in which case all remaining elements of the other are revealed until it too is exhausted.

import tango.util.collection.ArrayBag;
import tango.util.collection.iterator.InterleavingIterator;

void testInterleavingIterator(){
    int[] result;

    // Create and fill the containers
    auto l1 = new LinkSeq!(int);
    auto l2 = new ArraySeq!(int);
    for( int i = 0; i < 6; i+=2 ){
        l1.append( i );
        l2.append( i+1 );
    }

    // define the InterleavingIterator
    auto it = new InterleavingIterator!(int)(l1.elements, l2.elements);

    // use the InterleavingIterator to iterate over the container
    while (it.more)
        result ~= it.get;

    // test the result
    assert(result == [0, 1, 2, 3, 4, 5], "testInterleavingIterator");
}

FilteringIterator

This section is not completed yet and is a copy of the collection doc

FilteringIterators allow you to filter out elements from other enumerations before they are seen by their consumers (i.e. the callers of get).

import tango.util.collection.LinkSeq;
import tango.util.collection.ArrayBag;
import tango.util.collection.iterator.FilteringIterator;

void testFilteringIterator(){
    int[] result;
    alias ArrayBag!(int) IntBag;

    // Create and fill the container
    auto ib = new IntBag;
    for( int i = 0; i < 20; i++ ){
        ib.add( i );
    }

    // define the FilteringIterator with a function literal
    auto it = new FilteringIterator!(int)( ib.elements(), (int i){ 
        return i >= 3 && i < 7; 
    });

    // use the FilteringIterator to iterate over the container
    while (it.more())
        result ~= it.get();

    // test the result
    assert( result == [ 3, 4, 5, 6 ], "testFilteringIterator" );
}

Comparators

This section is not completed yet and is a copy of the collection doc

Sorted containers like trees, need to compare the elements. The algorithm that is used for this comparison can be user specified. E.g. strings can be sorted in different ways. If you want a bag to have its addIf using your compare method, this can be done like this

    // Create a bag that is not case sensitive
    auto nameSet = new TreeBag!(char[])( null, 
      new class() Comparator!(char[]){
        int compare( char[] first, char[] second ){
            return Ascii.icompare( first, second );
        }
    });

    nameSet.addIf( "Alice" );
    nameSet.addIf( "Bob" );
    nameSet.addIf( "aliCe" ); // will not be added

Screeners

This section is not completed yet and is a copy of the collection doc

Each container has the possibility to have a predicate as a so called screener. The definition can be found in the Collection class.

alias bool delegate(T) Predicate;

This delegate will be called, each time a new element is to be inserted into the container. If the screener returns false, the container throws an IllegalElementException to the caller.

    // Create a restricted container
    auto ratioSamples = new ArraySeq!(float)( (float v){
        // restrict element to the range 0.0 .. 0.99999..
        return v >= 0.0f && v < 1.0f;
    });

    // try to insert some values
    try{
        ratioSamples.append( 0.0f );      // OK
        ratioSamples.append( 0.5f );      // OK
        ratioSamples.append( 0.99999f );  // OK
        ratioSamples.append( 1.0f );      // Bang
        // will never get here
        assert( false );
    } catch( IllegalElementException e ){
    }

To test if an element is allowed for a certain container, use the bool allows( T ) method, which is part of the Collection class.

Bag and Set

This section is not completed yet and is a copy of the collection doc

Bags are collections supporting multiple occurrences of elements. Sets do only support unique elements.

In both, the sequence of elements is not defined. An exception is TreeBag, there the elements are ordered in their natural way. Or the user supplies a Comparator object.

Both bags support the addIf() method. So they can be used like sets.

HashSet uses a hash algorithm to find potential duplicates very fast, but this algorithm can use a lot of memory.

Sequence, the ordered storage

This section is not completed yet and is a copy of the collection doc
Sequences are indexed, sequentially ordered collections. Indices are always in the range 0 .. size() -1. All accesses by index are checked, raising exceptions if the index falls out of range.

The elements() enumeration for all sequences is guaranteed to be traversed (via more()/get()) in sequential order.

  • Seq is an interface for all sequences
  • ArraySeq implementing class, uses an array. Best used when the storage size is not changed too often, or a fast element access per index is needed.
  • LinkSeq implementing class, uses element nodes which are allocated dynamically and linked with each other. This implementation can easy append elements and remove elements from the head (take()). Making random access to elements per index is slow, because it is neccessary to follow the links from the beginning of the storage.
  • CircularSeq is similar to LinkSeq. But this class uses double linked cells. So all operations are equal in operation, nevertheless the are started from the beginning or end of the list.

Map, the associative Storage

This section is not completed yet and is a copy of the collection doc
Maps maintain keyed elements. Any kind of Object or primitive type may serve as a key for an element. Each key is unique and is associated with one value.

    import tango.store.HashMap;
    // ...
    alias HashMapT!(char[], char[]) StrStrMap;

    auto map = new StrStrMap();
    map.add("key1", "value1");
    char[] key = "key1";
    Stdout.format ("Key: {0}, Value:{1}", key, map.get(key)).newline;

Hash vs Tree

This section is not completed yet and is a copy of the collection doc
Sets and Maps are available with tree or hash implementation. When should one choose which?

Using a tree means it is necessary to navigate a tree structure to find an element. The number of stages to traverse is around O(log2(n)).

When using a hash, an array is used to store a given number of buckets. A hash function is used to calculate a hash from the object value. A hash is a n-bit integer value, which should be mostly unique for each object. The same content of an object must always calculate the same hash value. This value is used when the bucket to store or retrieve the element is chosen.

In the best case, this algorithm can find the element directly (if the bucket contains exactly one element). That would be access in constant time or O(1). But if the bucket holds many elements, then (depending on the bucket implementation) this can lead to a linear search. In a good hash this should not happen. To ensure the buckets are not too full, the array for them should be chosen big. This means, hash needs much more memory than tree and a good quality hash function for the stored object, to be efficient.

D builtin arrays and hashes vs. ArraySeq and HashMap

This section is not completed yet and is a copy of the collection doc

D already supports builtin dynamic arrays and hashes. Why should one use a template container for these container types?

There is nothing against the builtin arrays/hashes, use them if they have all you need. Be aware of the additional features the tango container ArraySeq/HashMap has:

  • A screener function to restrict the inserted elements to a configurable set.
  • Iterators that can implement various behaviour.
  • the tango collections are compatible between each other, because they implement common interfaces. So it is theoretical possible to exchange a ArraySeq container with a LinkSeq container, if you only use the methods from View.

In Review

Don't reinvent the wheel, use containers where ever possible.

Choose the appropriate container class for you needs.

Suggestion for this documentation

  • Show the use of procedure

References

Illustrations

User Comments

Comments