Compilation
An efficient implementation of any language requires compiling it from a textual form into some kind of easily (and quickly) executable form. MiniD is no different.
Note that this part of the spec is not necessarily a binding contract for all implementations of MiniD. Other compilers may compile the source in a different way, using a single-pass compiler, adding layers of optimization, or even compiling it into a completely different format. There is also nothing precluding MiniD from being a natively compiled language.
Phases of Compilation
The MiniD compiler conceptually has several phases of compilation, but in the interest of speed and lower memory consumption, some of them have been combined.
Here are the phases:
- Lexical Analysis - The source code is read from an input stream and split into tokens. Illegal characters and malformed tokens (such as a float literal reading "4.5n6") are found and rejected in this phase.
- Syntactic Analysis - The token stream is parsed to form syntax trees. This phase determines the overall structure of the program and ensures that consecutive tokens make sense and are not just gibberish (like "local local x, . . . 5").
- Semantic Analysis - The syntax tree of the program is checked for consistency and validity. This checks references to local variables and ensures proper use of some constructs.
- Code Generation - The semantically analyzed tree is traversed again to generate a data structure containing bytecode that the interpreter can run.
These are the conceptual phases. In the reference implementation, the compiler combines lexical and syntactic analysis (phases 1 and 2) into a single "parsing" pass, and semantic analysis and code generation (phases 3 and 4) into a single "codegen" pass.
Phases 1 and 2 run together because lexical analysis partially depends on syntactic analysis, at least as far as whether or not newlines are significant (i.e. whether they end a statement or are just whitespace). There are other ways to implement this, but this method was chosen in part because it eliminates the need to allocate memory for every token in the source.
Phases 3 and 4 are combined because, in a dynamic language such as MiniD, so little semantic analysis can be performed at compile time that a separate semantic phase would do almost nothing. It has simply been subsumed into the code generation phase.
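To make the combined "parsing" pass more concrete, here is a minimal sketch in Python. It is not MiniD's grammar and not the reference compiler's code: the toy language (integers, "+", and parentheses), the class names `Lexer` and `Parser`, and the rule that newlines are ignored inside parentheses are all assumptions made for illustration. What it shows is the general idea described above: the parser pulls one token at a time from the lexer, so no token list is ever allocated, and it tells the lexer on each call whether a newline should count as a statement terminator or as plain whitespace.

```python
# Toy illustration only: this is not MiniD's grammar or its compiler's API.

class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0

    def next_token(self, newlines_significant):
        """Scan and return a single (kind, text) token on demand."""
        src = self.source
        while self.pos < len(src):
            ch = src[self.pos]
            if ch == "\n" and newlines_significant:
                self.pos += 1
                return ("newline", "\n")
            if ch.isspace():
                # Plain whitespace, or a newline the parser does not care about.
                self.pos += 1
                continue
            if ch.isdigit():
                start = self.pos
                while self.pos < len(src) and src[self.pos].isdigit():
                    self.pos += 1
                return ("int", src[start:self.pos])
            if ch in "+()":
                self.pos += 1
                return (ch, ch)
            raise SyntaxError(f"illegal character {ch!r} at offset {self.pos}")
        return ("eof", "")


class Parser:
    def __init__(self, source):
        self.lexer = Lexer(source)
        self.depth = 0  # current parenthesis nesting depth
        self.advance()

    def advance(self):
        # The parser's state decides how the lexer treats the next newline.
        self.tok = self.lexer.next_token(newlines_significant=(self.depth == 0))

    def parse_program(self):
        statements = []
        while self.tok[0] != "eof":
            if self.tok[0] == "newline":  # blank line or end of a statement
                self.advance()
                continue
            statements.append(self.parse_expr())
            if self.tok[0] not in ("newline", "eof"):
                raise SyntaxError(f"unexpected token {self.tok[1]!r}")
        return statements

    def parse_expr(self):
        node = self.parse_term()
        while self.tok[0] == "+":
            self.advance()
            node = ("add", node, self.parse_term())
        return node

    def parse_term(self):
        kind, text = self.tok
        if kind == "int":
            self.advance()
            return int(text)
        if kind == "(":
            self.depth += 1  # newlines become whitespace from here on
            self.advance()
            node = self.parse_expr()
            if self.tok[0] != ")":
                raise SyntaxError("expected ')'")
            self.depth -= 1  # restore newline handling before scanning on
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {text!r}")


trees = Parser("1 + 2\n(3 +\n4)\n").parse_program()
```

In this sketch the same newline character produces a token after `1 + 2` but is silently skipped inside the parenthesized expression, which is why the lexer cannot run as an independent pass ahead of the parser.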
The compiler allows you to "intercept" the compilation between the parsing and codegen phases, giving you access to the abstract syntax tree (AST) of the code. This can be used to analyze the code for things like lint tools or IDE integration. The AST can also be manipulated, or a new AST can be created entirely from scratch. Finally, the AST can be passed to the codegen phase, which outputs the bytecode described above.
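The workflow of parsing, inspecting or rewriting the tree, and then generating code can be illustrated by analogy. The sketch below does not use MiniD's compiler API, which is not described here. Instead it uses Python's standard `ast` module and the built-in `compile`, which expose the same kind of interception point in Python's own compiler. The sample source, the lint rule (names assigned but never read), and the `FoldAdd` transformer are assumptions chosen for illustration.

```python
import ast

source = "x = 1 + 2\nunused = 5\nresult = x * 10\n"

# Stop after parsing: we now hold the syntax tree, not bytecode.
tree = ast.parse(source)

# Analysis (lint-style): collect names that are assigned but never read.
assigned, read = set(), set()
for node in ast.walk(tree):
    if isinstance(node, ast.Name):
        if isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
        else:
            read.add(node.id)
never_read = sorted(assigned - read)


# Manipulation: fold additions of two integer constants into one constant.
class FoldAdd(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested operands first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


tree = ast.fix_missing_locations(FoldAdd().visit(tree))

# Code generation: turn the (modified) tree into bytecode and run it.
code = compile(tree, "<sketch>", "exec")
namespace = {}
exec(code, namespace)
```

After this runs, `never_read` holds the names a lint tool might flag, the first statement of the tree assigns a single folded constant rather than an addition, and `namespace` contains the variables produced by executing the generated bytecode.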
Once the bytecode has been generated, it is either run directly from memory or saved to a module file for later use. For information on how the bytecode is executed, see Execution.
