Lexical
The lexical phase splits the input source text into a stream of tokens. This phase finds and rejects illegal characters and malformed tokens (such as a float literal of "4.5x").
MiniD source text consists of white space, end-of-line markers, comments, and tokens, all followed by the end-of-file marker.
MiniD source text can be encoded as normal 7-bit ASCII (not extended ASCII) or in any Unicode format: UTF-8, or UTF-16 or UTF-32 in either little- or big-endian byte order.
Shebang
MiniD source files may begin with what's called a 'shebang', which is a pound sign immediately followed by an exclamation point: #!. This is commonly used on POSIX systems to associate a script file with the host program that runs it. You can use MDCL as the script host for MiniD scripts.
The shebang must be at the very beginning of the file -- the first and second characters (after any BOMs). All text up to and including the end of the shebang line will be ignored. It counts as a line, but is ignored by the compiler as if it were a comment.
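The shebang rule can be sketched in a few lines of Python. This is only an illustration, not the actual MiniD lexer: the function name `skip_shebang` is hypothetical, and the sketch assumes the text has already been decoded and any BOM removed.

```python
from typing import Tuple


def skip_shebang(source: str) -> Tuple[str, int]:
    """Return (remaining text, line number the remaining text starts on).

    Assumption: `source` is already decoded and has no leading BOM.
    """
    # The shebang only counts if '#!' is the very first two characters.
    if not source.startswith("#!"):
        return source, 1

    # Scan to the end of the shebang line.
    i = 2
    while i < len(source) and source[i] not in "\r\n":
        i += 1

    # Consume the line ending itself ('\r\n', '\r', or '\n').
    if source.startswith("\r\n", i):
        i += 2
    elif i < len(source):
        i += 1

    # The shebang still counts as a line, so lexing resumes on line 2.
    return source[i:], 2
```

Returning the line number alongside the text reflects the rule that the shebang is ignored but still counts as a line, so later error messages would stay accurate.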
Whitespace
WhiteSpace:
Space {Space}
Space:
' '
'\t'
'\v'
'\u000C'
EndOfLine
Comment
EndOfLine:
'\r'
'\n'
'\r\n'
EndOfFile
Whitespace is generally ignored by MiniD. There is one exception. The EndOfLine element is one of the possible statement terminators (see the [LanguageSpec2/Statements Statements] section for information). However, an EndOfLine is not always interpreted as such, and may be ignored, such as if it comes in the middle of an expression or statement.
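As an illustration of the EndOfLine production, the following Python sketch splits text on the three line endings. The helper name `split_lines` is hypothetical, and the sketch assumes that '\r\n' is matched as a single EndOfLine rather than as '\r' followed by '\n'.

```python
import re

# '\r\n' is listed first so that it is consumed as one line ending
# (an assumption about how the alternatives are prioritised).
_END_OF_LINE = re.compile(r"\r\n|\r|\n")


def split_lines(source: str) -> list:
    """Split `source` at every EndOfLine ('\\r', '\\n', or '\\r\\n')."""
    return _END_OF_LINE.split(source)
```

Whether a given EndOfLine actually terminates a statement is decided later by the parser, so this sketch makes no attempt to model that.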
End of File
EndOfFile:
physical end of file
'\0'
The MiniD lexer will stop lexing when it reaches the actual end of the file, or when it hits a null character.
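A minimal Python sketch of this rule follows. The function name `effective_source` is hypothetical; it simply returns the portion of the text that a lexer following this rule would look at.

```python
def effective_source(source: str) -> str:
    """Return the text up to the first null character, or all of it."""
    end = source.find("\0")
    # No null character: the physical end of the text is the end of file.
    if end == -1:
        return source
    # Everything from the null character onwards is never lexed.
    return source[:end]
```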
Comments
Comment:
'/*' {Character} '*/'
'//' {Character} EndOfLine
NestedComment
NestedComment:
'/+' {Character | NestedComment} '+/'
There are three types of comments in MiniD: C-style block comments, C++-style line comments, and D-style nesting comments. All three function the same way as in D. Nesting comments are particularly useful for commenting out blocks of code, where you don't want to have embedded comments affect the commenting. They can be nested arbitrarily deep.
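Nesting comments are the only comment form that needs a depth counter. The Python sketch below shows one way to find the end of a nesting comment; the function name `skip_nested_comment` is hypothetical and this is not taken from the MiniD implementation.

```python
def skip_nested_comment(text: str, pos: int) -> int:
    """Given that text[pos:] starts with '/+', return the index just
    past the matching '+/'. Raises ValueError if there is none."""
    if not text.startswith("/+", pos):
        raise ValueError("not at the start of a nesting comment")

    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("/+", i):
            depth += 1          # entering one more level of nesting
            i += 2
        elif text.startswith("+/", i):
            depth -= 1          # leaving one level
            i += 2
            if depth == 0:
                return i        # the outermost comment is closed
        else:
            i += 1
    raise ValueError("unterminated nesting comment")
```

For example, in `/+ a /+ b +/ c +/x` the first `+/` only closes the inner comment, so the scan continues until the second one and stops just before `x`.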
Tokens
Token:
Identifier
Keyword
CharLiteral
StringLiteral
IntLiteral
FloatLiteral
'+'
'+='
'++'
'-'
'-='
'--'
'~'
'~='
'*'
'*='
'/'
'/='
'%'
'%='
'<'
'<='
'<<'
'<<='
'>'
'>='
'>>'
'>>='
'>>>'
'>>>='
'&'
'&='
'&&'
'|'
'|='
'||'
'^'
'^='
'='
'=='
'?='
'.'
'..'
'!'
'!='
'('
')'
'['
']'
'{'
'}'
':'
','
';'
'#'
'\\'
'->'
'$'
EOF
Identifiers
Identifier:
IdentifierStart {IdentifierChar}
IdentifierStart:
_
Letter
IdentifierChar:
IdentifierStart
DecimalDigit
Identifiers starting with two underscores ("__") are reserved and cannot be used. In fact, the lexical pass will fail if it comes across an identifier that starts with two underscores.
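The identifier rules can be sketched in Python as follows. The function name `check_identifier` is hypothetical, and the grammar does not say which characters count as a Letter, so the use of `str.isalpha()` here is an assumption. The sketch also does not check whether the name is a keyword.

```python
def check_identifier(name: str) -> bool:
    """Return True if `name` matches the Identifier grammar.

    Raises ValueError for names starting with two underscores,
    mirroring the lexical pass failing on reserved identifiers.
    """
    def is_start(ch: str) -> bool:
        # Assumption: 'Letter' is approximated by str.isalpha().
        return ch == "_" or ch.isalpha()

    def is_char(ch: str) -> bool:
        return is_start(ch) or ch in "0123456789"

    if not name or not is_start(name[0]):
        return False
    if not all(is_char(ch) for ch in name[1:]):
        return False
    if name.startswith("__"):
        raise ValueError("identifiers starting with '__' are reserved")
    return True
```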
Keywords
Keyword: 'as' 'break' 'case' 'class' 'catch' 'continue' 'coroutine' 'default' 'do' 'else' 'false' 'finally' 'for' 'foreach' 'function' 'global' 'if' 'import' 'in' 'is' 'local' 'module' 'null' 'return' 'super' 'switch' 'this' 'throw' 'true' 'try' 'vararg' 'while' 'with' 'yield'
Changes from MiniD 1
The 'class' keyword has been dropped, and the 'object' keyword has been added.
Character Literals
CharLiteral: "'" (Character | EscapeSequence) "'"
Character literals allow you to specify a single character instead of a whole string. They are treated as their own distinct type in MiniD.
String Literals
StringLiteral:
RegularString
WysiwygString
AltWysiwygString
RegularString:
'"' {Character | EscapeSequence | EndOfLine} '"'
EscapeSequence:
'\''
'\"'
'\\'
'\a'
'\b'
'\f'
'\n'
'\r'
'\t'
'\v'
'\x' HexDigit HexDigit
'\u' HexDigit HexDigit HexDigit HexDigit
'\U' HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit
'\' DecimalDigit [DecimalDigit [DecimalDigit]]
WysiwygString:
'@"' {Character | EndOfLine | '""'} '"'
AltWysiwygString:
'`' {Character | EndOfLine | '``'} '`'
WYSIWYG string literals are allowed to contain doubled-up versions of their open and close quotes, in order to embed those characters within the string. For example:
@"He said, ""come here!""" // contains "He said, \"come here!\""
`This is what's known as ``something'.` // contains "This is what's known as `something\'."
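The doubled-quote rule can be sketched in Python like this. The function name `read_wysiwyg` is hypothetical; the sketch handles both the `@"..."` and the backtick forms and is not taken from the MiniD implementation.

```python
from typing import Tuple


def read_wysiwyg(text: str, pos: int = 0) -> Tuple[str, int]:
    """Decode a WYSIWYG string literal starting at text[pos].

    Returns (string value, index just past the closing quote).
    """
    if text.startswith('@"', pos):
        quote, i = '"', pos + 2
    elif text.startswith("`", pos):
        quote, i = "`", pos + 1
    else:
        raise ValueError("not at the start of a WYSIWYG string")

    out = []
    while i < len(text):
        ch = text[i]
        if ch == quote:
            if text.startswith(quote, i + 1):
                # A doubled quote stands for one literal quote character.
                out.append(quote)
                i += 2
                continue
            # A single quote closes the string.
            return "".join(out), i + 1
        # Every other character, including line endings, is taken as-is.
        out.append(ch)
        i += 1
    raise ValueError("unterminated WYSIWYG string")
```

No escape sequences are processed, which is the point of a WYSIWYG string: the only special case is the doubled quote.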
Changes from MiniD 1
WYSIWYG strings gained the ability to embed their own quote characters by doubling them, as explained above.
Integer Literals
IntLiteral:
Decimal
Binary
Octal
Hexadecimal
Decimal:
DecimalDigit {DecimalDigit | '_'}
DecimalDigit:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
Binary:
'0' ('b' | 'B') (BinaryDigit | '_') {BinaryDigit | '_'}
BinaryDigit:
'0'
'1'
Octal:
'0' ('c' | 'C') (OctalDigit | '_') {OctalDigit | '_'}
OctalDigit:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
Hexadecimal:
'0' ('x' | 'X') (HexDigit | '_') {HexDigit | '_'}
HexDigit:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
'A'
'a'
'B'
'b'
'C'
'c'
'D'
'd'
'E'
'e'
'F'
'f'
Integer literals are similar to D's, including support for underscores within the digits. The main difference from D is in octal literals: instead of starting with just 0, they start with 0c, to go along with 0x and 0b. Though to tell you the truth, I've never seen octal used.
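The integer grammar translates fairly directly into a regular expression. The Python sketch below is only an illustration; the function name `parse_int_literal` is hypothetical. Read literally, the grammar accepts a prefix followed only by underscores (such as `0x_`); this sketch assumes such input is an error.

```python
import re

# One named group per alternative of IntLiteral.
_INT_LITERAL = re.compile(r"""
    0[bB](?P<bin>[01_]+)
  | 0[cC](?P<oct>[0-7_]+)
  | 0[xX](?P<hex>[0-9A-Fa-f_]+)
  | (?P<dec>[0-9][0-9_]*)
""", re.VERBOSE)

_BASES = (("bin", 2), ("oct", 8), ("hex", 16), ("dec", 10))


def parse_int_literal(text: str) -> int:
    """Convert the text of an integer literal to its value."""
    match = _INT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError("malformed integer literal: %r" % text)
    for group, base in _BASES:
        digits = match.group(group)
        if digits is not None:
            digits = digits.replace("_", "")   # underscores are separators
            if not digits:
                # Assumption: a literal with no actual digits is rejected.
                raise ValueError("no digits in literal: %r" % text)
            return int(digits, base)
    raise ValueError("malformed integer literal: %r" % text)
```

Under the grammar as written, a leading 0 with no prefix letter is simply a decimal digit, so `017` is read as seventeen here rather than as an octal number.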
Floating-Point Literals
FloatLiteral:
[DecimalDigit {DecimalDigit | '_'}] '.' (DecimalDigit | '_') {DecimalDigit | '_'} [Exponent]
DecimalDigit {DecimalDigit | '_'} [Exponent]
Exponent:
('e' | 'E')['+' | '-'] (DecimalDigit | '_') {DecimalDigit | '_'}
Floating-point literals work just like in D, except that there are no hex float literals, which wouldn't be too useful in a scripting language. There are also no imaginary numbers.
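A Python sketch of the float grammar follows; the function name `parse_float_literal` is hypothetical. Read literally, the second alternative of FloatLiteral also matches a plain run of digits, which overlaps with Decimal. This sketch assumes such text is lexed as an integer, so it requires either a '.' or an exponent.

```python
import re

_FLOAT_LITERAL = re.compile(r"""
    (?:
        (?:[0-9][0-9_]*)? \. [0-9_]+ (?:[eE][+-]?[0-9_]+)?   # with '.'
      | [0-9][0-9_]* [eE][+-]?[0-9_]+                        # exponent only
    )
""", re.VERBOSE)


def parse_float_literal(text: str) -> float:
    """Convert the text of a floating-point literal to its value."""
    if _FLOAT_LITERAL.fullmatch(text) is None:
        raise ValueError("malformed float literal: %r" % text)
    # Underscores are only separators; float() raises ValueError itself
    # if nothing but underscores stood where digits were required.
    return float(text.replace("_", ""))
```

With this sketch, malformed text such as `4.5x` and hex float text such as `0x1p3` are both rejected.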
