License:
BSD style: see license.txt

Version:
Initial release: December 2005

Author:
Kris



Text is a class for storing and manipulating Unicode characters.

Text maintains a current "selection", controlled via the select() and selectPrior() methods. Each of append(), prepend(), replace() and remove() operates with respect to the selection. The select() methods also operate with respect to the current selection, providing a means of iterating across matched patterns. To set a selection across the entire content, use the select() method with no arguments.
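
As a rough illustration of the selection idiom, the sketch below replaces the first occurrence of a pattern using only methods listed in this document. The helper name replaceFirst is hypothetical, and the explicit select (0, 0) is an assumption made so that the search starts at the beginning of the content wherever the initial selection happens to sit.

```d
import tango.text.Text;

// Hypothetical helper: replace the first occurrence of pattern within src
char[] replaceFirst (char[] src, char[] pattern, char[] substitute)
{
        auto text = new Text!(char)(src);

        // assumption: an empty selection at index 0 makes the search begin there
        text.select (0, 0);

        // select() returns true when the pattern is found, and moves the
        // selection onto the match; replace() then acts on that selection
        if (text.select (pattern))
            text.replace (substitute);

        return text.slice;
}
```

Note that no index arithmetic appears in the caller: the position of the match is carried by the selection rather than by an explicit offset.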

Indexes and lengths of content always count code units, not code points. This is similar to traditional ASCII string handling, yet indexing is rarely used in practice due to the selection idiom: substring indexing is generally implied as opposed to manipulated directly. This allows for a more streamlined model with regard to surrogates.
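
To make the code-unit point concrete, here is a minimal sketch that reports how many units the same input occupies in each encoding. The helper name unitCount is hypothetical; it relies on encode() and length() as described further down this page.

```d
import tango.text.Text;

// Hypothetical helper: number of code units src occupies when stored as T[]
uint unitCount(T) (char[] src)
{
        auto text = new Text!(T);

        // transcode the UTF-8 input into the encoding implied by T
        text.encode (src);

        // length() counts code units of T, not code points
        return text.length;
}
```

For a single code point outside the BMP such as U+1D11E, unitCount!(char) should report 4, unitCount!(wchar) 2 (a surrogate pair) and unitCount!(dchar) 1, since those are the unit counts UTF-8, UTF-16 and UTF-32 require.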

Text supports a range of functionality, from insertion and removal to UTF encoding and decoding. There is also an immutable subset called TextView, intended to simplify life in a multi-threaded environment. However, TextView must expose the raw content as needed, and thus immutability depends to an extent upon the so-called "honour" of a callee. D does not enable immutability enforcement at this time, but this class will be modified to support such a feature when it arrives, via the slice() method.
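
One way to lean on TextView is to declare it as the parameter type of code that should only read. The sketch below assumes nothing beyond the class hierarchy shown on this page; isComment and check are hypothetical names.

```d
import tango.text.Text;

// Hypothetical consumer: only the TextView API is visible here, so the
// mutating methods of Text are not reachable without a cast
bool isComment (TextView!(char) view)
{
        return view.starts ("//");
}

// Hypothetical caller: a Text converts implicitly to its TextView base class
bool check (char[] line)
{
        auto text = new Text!(char)(line);
        return isComment (text);
}
```

This narrows the interface but does not enforce anything: the array returned by slice() can still be written to, which is the "honour" caveat described above.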

The class is templated for use with char[], wchar[], and dchar[], and should migrate across encodings seamlessly. In particular, all functions in tango.text.Util are compatible with Text content in any of the supported encodings. In future, this class will become the principal gateway to the extensive ICU Unicode library.
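
A minimal sketch of moving content across encodings, assuming only the encode() and toString16() methods documented below; the helper name toUtf16 is hypothetical.

```d
import tango.text.Text;

// Hypothetical helper: hold UTF-8 input as UTF-32, then hand it back as UTF-16
wchar[] toUtf16 (char[] src)
{
        auto text = new Text!(dchar);

        // UTF-8 -> UTF-32 on the way in
        text.encode (src);

        // UTF-32 -> UTF-16 on the way out
        return text.toString16 ();
}
```

The internal representation (dchar here) is chosen once via the template argument; the conversions at either edge are handled by the class.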

Note that several common text operations can be constructed by combining tango.text.Text with tango.text.Util, e.g. lines of text can be processed as follows:
        auto source = new Text!(char)("one\ntwo\nthree");

        foreach (line; Util.lines(source.slice))
                {
                // do something with line
                }
Substituting patterns within text can be implemented simply and rather efficiently:
        auto dst = new Text!(char);

        foreach (element; Util.patterns ("all cows eat grass", "eat", "chew"))
                 dst.append (element);
Speaking a bit like Yoda might be accomplished as follows:
        auto dst = new Text!(char);

        foreach (element; Util.delims ("all cows eat grass", " "))
                 dst.prepend (element);
Below is an overview of the API and class hierarchy:
        class Text(T) : TextView!(T)
        {
                // set or reset the content
                Text set (T[] chars, bool mutable=true);
                Text set (TextView other, bool mutable=true);

                // retrieve currently selected text
                T[] selection ();

                // get the index and length of the current selection
                Span selectionSpan ();

                // mark a selection
                Text select (int start=0, int length=int.max);

                // move the selection around
                bool select (T c);
                bool select (T[] pattern);
                bool select (TextView other);
                bool selectPrior (T c);
                bool selectPrior (T[] pattern);
                bool selectPrior (TextView other);

                // append behind current selection
                Text append (TextView other);
                Text append (T[] text);
                Text append (T chr, int count=1);
                Text append (int value, options);
                Text append (long value, options);
                Text append (double value, options);

                // transcode behind current selection
                Text encode (char[]);
                Text encode (wchar[]);
                Text encode (dchar[]);

                // insert before current selection
                Text prepend (T[] text);
                Text prepend (TextView other);
                Text prepend (T chr, int count=1);

                // replace current selection
                Text replace (T chr);
                Text replace (T[] text);
                Text replace (TextView other);

                // remove current selection
                Text remove ();

                // clear content
                Text clear ();

                // trim leading and trailing whitespace
                Text trim ();

                // trim leading and trailing chr instances
                Text strip (T chr);

                // truncate at point, or current selection
                Text truncate (int point = int.max);

                // reserve some space for inserts/additions
                Text reserve (int extra);
        }

        class TextView(T) : UniText
        {
                // hash content
                hash_t toHash ();

                // return length of content
                uint length ();

                // compare content
                bool equals  (T[] text);
                bool equals  (TextView other);
                bool ends    (T[] text);
                bool ends    (TextView other);
                bool starts  (T[] text);
                bool starts  (TextView other);
                int compare  (T[] text);
                int compare  (TextView other);
                int opEquals (Object other);
                int opCmp    (Object other);

                // copy content
                T[] copy (T[] dst);

                // return content
                T[] slice ();

                // return data type
                TypeInfo encoding ();

                // replace the comparison algorithm
                Comparator comparator (Comparator other);
        }

        class UniText
        {
                // convert content
                abstract char[]  toString  (char[]  dst = null);
                abstract wchar[] toString16 (wchar[] dst = null);
                abstract dchar[] toString32 (dchar[] dst = null);
        }


  • class Text (T): TextView!(T);
  • The mutable Text class actually implements the full API, whereas the superclasses are purely abstract (could be interfaces instead).

  • struct Span ;
  • Selection span

  • uint begin ;
  • index of selection point

  • uint length ;
  • length of selection

  • this(uint space = 0);
  • Create an empty Text with the specified available space

    Note:
    A character like 'a' will be implicitly converted to uint and thus will be accepted for this constructor, making it appear like you can initialize a Text instance with a single character, something which is not supported.



  • this(T[] content, bool copy = true);
  • Create a Text upon the provided content. If said content is immutable (read-only) then you might consider setting the 'copy' parameter to false. Doing so will avoid allocating heap-space for the content until it is modified via Text methods. This can be useful when wrapping an array "temporarily" with a stack-based Text

  • this(TextViewT other, bool copy = true);
  • Create a Text via the content of another. If said content is immutable (read-only) then you might consider setting the 'copy' parameter to false. Doing so will avoid allocating heap-space for the content until it is modified via Text methods. This can be useful when wrapping an array temporarily with a stack-based Text
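
    The non-copying form might be used as in the sketch below, which wraps a caller's array just long enough to query it. The helper name endsWithNewline is hypothetical, and the use of scope to place the instance on the stack is an assumption about how "stack-based" is meant here.

    ```d
    import tango.text.Text;

    // Hypothetical helper: query an existing array without duplicating it
    bool endsWithNewline (char[] line)
    {
            // copy = false aliases the caller's array rather than allocating;
            // scope releases the instance when the function returns
            scope view = new Text!(char)(line, false);

            return view.ends ("\n");
    }
    ```

    Since nothing here modifies the content, no heap space should be allocated for it.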

  • Text set (T[] chars, bool copy = true);
  • Set the content to the provided array. Parameter 'copy' specifies whether the given array is likely to change. If not, the array is aliased until such time it is altered via this class. This can be useful when wrapping an array "temporarily" with a stack-based Text

  • Text set (TextViewT other, bool copy = true);
  • Replace the content of this Text. If the new content is immutable (read-only) then you might consider setting the 'copy' parameter to false. Doing so will avoid allocating heap-space for the content until it is modified via one of these methods. This can be useful when wrapping an array "temporarily" with a stack-based Text

  • Text select (int start = 0, int length = (int).max);
  • Explicitly set the current selection

  • T[] selection ();
  • Return the currently selected content

  • Span selectionSpan ();
  • Return the index and length of the current selection

  • bool select (T c);
  • Find and select the next occurrence of a BMP code point in a string. Returns true if found, false otherwise

  • bool select (TextViewT other);
  • Find and select the next substring occurrence. Returns true if found, false otherwise

  • bool select (T[] chars);
  • Find and select the next substring occurrence. Returns true if found, false otherwise

  • bool selectPrior (T c);
  • Find and select a prior occurrence of a BMP code point in a string. Returns true if found, false otherwise

  • bool selectPrior (TextViewT other);
  • Find and select a prior substring occurrence. Returns true if found, false otherwise

  • bool selectPrior (T[] chars);
  • Find and select a prior substring occurrence. Returns true if found, false otherwise

  • Text append (TextViewT other);
  • Append text to this Text

  • Text append (T[] chars);
  • Append text to this Text

  • Text append (T chr, int count = 1);
  • Append a count of characters to this Text

  • Text append (int v, T[] fmt = null);
  • Append an integer to this Text

  • Text append (long v, T[] fmt = null);
  • Append a long to this Text

  • Text append (double v, uint decimals = 2, int e = 10);
  • Append a double to this Text

  • Text prepend (T chr, int count = 1);
  • Insert characters into this Text

  • Text prepend (T[] other);
  • Insert text into this Text

  • Text prepend (TextViewT other);
  • Insert another Text into this Text

  • Text encode (char[] s);
    Text encode (wchar[] s);
    Text encode (dchar[] s);
    Text encode (Object o);
  • Append encoded text at the current selection point. The text is converted as necessary to the appropriate UTF encoding.

  • Text replace (T chr);
  • Replace a section of this Text with the specified character

  • Text replace (T[] chars);
  • Replace a section of this Text with the specified array

  • Text replace (TextViewT other);
  • Replace a section of this Text with another

  • Text remove ();
  • Remove the selection from this Text and reset the selection to zero length

  • Text remove (int start, int count);
  • Remove the specified range from this Text

  • Text truncate (int index = (int).max);
  • Truncate this string at an optional index. Default behaviour is to truncate at the current append point. Current selection is moved to the truncation point, with length 0

  • Text clear ();
  • Clear the string content

  • Text trim ();
  • Remove leading and trailing whitespace from this Text, and reset the selection to the trimmed content

  • Text strip (T matches);
  • Remove leading and trailing matches from this Text, and reset the selection to the stripped content

  • Text reserve (uint extra);
  • Reserve some extra room

  • TypeInfo encoding ();
  • Get the encoding type

  • Comparator comparator (Comparator other);
  • Set the comparator delegate. Where other is null, we behave as a getter only

  • hash_t toHash ();
  • Hash this Text

  • uint length ();
  • Return the length of the valid content

  • bool equals (TextViewT other);
  • Is this Text equal to another?

  • bool equals (T[] other);
  • Is this Text equal to the provided text?

  • bool ends (TextViewT other);
  • Does this Text end with another?

  • bool ends (T[] chars);
  • Does this Text end with the specified string?

  • bool starts (TextViewT other);
  • Does this Text start with another?

  • bool starts (T[] chars);
  • Does this Text start with the specified string?

  • int compare (TextViewT other);
  • Compare this Text with another. Returns 0 if the content matches, less than zero if this Text is "less" than the other, or greater than zero where this Text is "bigger".

  • int compare (T[] chars);
  • Compare this Text with an array. Returns 0 if the content matches, less than zero if this Text is "less" than the other, or greater than zero where this Text is "bigger".

  • T[] copy (T[] dst);
  • Return content from this Text

    A slice of dst is returned, representing a copy of the content. The slice is clipped to the minimum of either the length of the provided array, or the length of the content minus the stipulated start point



  • T[] slice ();
  • Return an alias to the content of this TextView. Note that you are bound by honour to leave this content wholly unmolested. D surely needs some way to enforce immutability upon array references

  • char[] toString (char[] dst = null);
    wchar[] toString16 (wchar[] dst = null);
    dchar[] toString32 (dchar[] dst = null);
  • Convert to the UniText types. The optional argument dst will be resized as required to house the conversion. To minimize heap allocation during subsequent conversions, apply the following pattern:
            Text    string;
            wchar[] buffer;

            wchar[] result = string.toString16 (buffer);
            if (result.length > buffer.length)
                buffer = result;
    You can also provide a buffer from the stack, but the output will be moved to the heap if said buffer is not large enough

  • int opCmp (Object o);
  • Compare this Text to another. We compare against other Strings only. Literals and other objects are not supported

  • int opEquals (Object o);
    int opEquals (T[] s);
  • Is this Text equal to the text of something else?

  • void pinIndex (ref int x);
  • Pin the given index to a valid position.

  • void pinIndices (ref int start, ref int length);
  • Pin the given index and length to a valid position.

  • int simpleComparator (T[] a, T[] b);
  • Compare two arrays. Returns 0 if the content matches, less than zero if A is "less" than B, or greater than zero where A is "bigger". Where the substrings match, the shorter is considered "less".
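
    The ordering rules above can be exercised through compare(), as in this sketch. It assumes the default comparator is the simpleComparator described here; the helper name order is hypothetical.

    ```d
    import tango.text.Text;

    // Hypothetical helper: reduce the result of compare() to -1, 0 or +1
    int order (char[] a, char[] b)
    {
            auto text = new Text!(char)(a);
            auto result = text.compare (b);

            // only the sign of the result is documented, so normalise it
            return result < 0 ? -1 : (result > 0 ? 1 : 0);
    }
    ```

    Under that assumption, order ("abc", "abd") should be negative, order ("abc", "abc") zero, and order ("abc", "ab") positive, because the shorter of two otherwise matching arrays is considered "less".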

  • void expand (uint index, uint count);
  • Make room available to insert or append something

  • Text set (T chr, uint start, uint count);
  • Replace a section of this Text with the specified character

  • void realloc (uint count = 0);
  • Allocate memory due to a change in the content. We handle the distinction between mutable and immutable here.

  • Text append (T* chars, uint count);
  • Internal method to support Text appending

  • class TextView (T): UniText;
  • Immutable string

  • uint length ();
  • Return the length of the valid content

  • bool equals (TextView other);
  • Is this Text equal to another?

  • bool equals (T[] other);
  • Is this Text equal to the provided text?

  • bool ends (TextView other);
  • Does this Text end with another?

  • bool ends (T[] chars);
  • Does this Text end with the specified string?

  • bool starts (TextView other);
  • Does this Text start with another?

  • bool starts (T[] chars);
  • Does this Text start with the specified string?

  • int compare (TextView other);
  • Compare this Text with another. Returns 0 if the content matches, less than zero if this Text is "less" than the other, or greater than zero where this Text is "bigger".

  • int compare (T[] chars);
  • Compare this Text with an array. Returns 0 if the content matches, less than zero if this Text is "less" than the other, or greater than zero where this Text is "bigger".

  • T[] copy (T[] dst);
  • Return content from this Text. A slice of dst is returned, representing a copy of the content. The slice is clipped to the minimum of either the length of the provided array, or the length of the content minus the stipulated start point

  • int opCmp (Object o);
  • Compare this Text to another

  • int opEquals (Object other);
  • Is this Text equal to another?

  • int opEquals (T[] other);
  • Is this Text equal to another?

  • TypeInfo encoding ();
  • Get the encoding type

  • Comparator comparator (Comparator other);
  • Set the comparator delegate

  • hash_t toHash ();
  • Hash this Text

  • T[] slice ();
  • Return an alias to the content of this TextView. Note that you are bound by honour to leave this content wholly unmolested. D surely needs some way to enforce immutability upon array references

  • class UniText ;
  • A string abstraction that converts to anything

    Copyright (c) 2005 Kris Bell. All rights reserved