Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Python Challenge -- Level 3 -- Deterministic Automaton

Code

import tango.io.Stdout;
import tango.text.Util;
import tango.text.convert.Integer;

void main () {
        char[] text = "";

        assert (text.length >= 9);

        int getcase (char letter) {
                if (letter >= 'a' && letter <= 'z')
                        return 3;
                else if (letter >= 'A' && letter <= 'Z')
                        return 2;
                return 0;
        }

        //            a A A A a A A A a
        int mask = 0b111010101110101011;

        foreach (char[] line; lines(text))
        {
                if (line.length >= 9)
                {
                        int current;
                        char lastlet;
                        foreach (char letter; line[0..8])
                                current <<= 2, current |= getcase(letter);
                        for (size_t i=8; lastlet = line[i-4], i<line.length; i++)
                        {
                                current <<= 2, current |= getcase(line[i]);
                                if ((current & 0x3ffff) == mask)
                                        Stdout (""~toUtf8(mask)
                                                ~" "~toUtf8(current & 0x3ffff)
                                                ~" "~lastlet
                                                ~" "~line).newline;
                                        // do not break, maybe there's more than one in a line :)
                        }
                }
        }
        Stdout.flush();
}

Explanation

First take a look at Regular Expression Solution

We represent each character by two bits:

  • 11 - lower case
  • 10 - upper case
  • 00 - none of above
int mask = 0b111010101110101011;

Since input is splitted into lines, we won't bother with the ones that goes through multiple lines e.g:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXX
XxXXXxx...

So we will process input line by line:

foreach (char[] line; lines(text))

Two additional variables:

int current;
char lastlet;

current will hold actually computed hash (bitmask), and lastlet will hold latest letter in the middle.

foreach (char letter; line[0..8])
        current <<= 2, current |= getcase(letter);

Precompute hash for first 8 characters, by simply shifting our variable by two bits, and adding the proper bits.

Later process the rest of the line, remembering current middle letter. Because in our hash we represent every letter as two bits we need to binary-and only the last 18 bits, that's what 0x3ffff is there for.