% Description of the software: Summary of the JAVA classes you implemented; for instance, for symbol
% table management, type checking, code generation, error handling, etc. In your description, rely
% on the concepts and terminology you learned during the course, such as synthesised and inherited
% attributes, tree listeners and visitors.

The compiler chain is written mostly in Java, with two preambles written in ILOC (\emph{memlib} and \emph{stdlib}) that are prepended to compiled programs. The following sections describe the Java classes and the ILOC preambles.

\section{Toolchain helper}
The toolchain helper \emph{pp.s1184725.boppi.ToolChain} contains various helper methods for compiling and executing programs. Notably, nearly all methods require a \verb|Logger| object: the stages of the compiler process the source code on a best-effort basis and report warnings and errors via this logger.\\
Moreover, the helper contains a method to print the abstract syntax tree (\emph{AST}) of a Boppi program as a Graphviz graph. The AST can be printed at any point in the compilation process. After the checking and generating phases, the AST is annotated with the types, variables and registers used.\\
The helper also provides a method to modify a \verb|Logger| object so that it collects a list of errors rather than printing them to the standard output. This is useful for displaying problems in a window or for test automation.

\section{Checker}
The correctness checker \emph{pp.s1184725.boppi.BoppiChecker} performs type checking, binds identifiers to variables, checks that constants are assigned only once and checks that variables are assigned before they are used. It operates on a bare parse tree of a Boppi program.\\
The checker is implemented as a tree visitor, since a visitor can change its state between visiting different children of a node. This is advantageous for keeping, for example, the if-then expression concise.
With a visitor, a scope can be opened between the test and the conditional code (see \cref{conditionals}) while using a single, action-less ANTLR rule. With a listener, this would require either an action in the ANTLR rule or a sub-rule for opening a scope.\\
The only inherited attributes during checking are the booleans \verb|inLhs| and \verb|inType|, which are implemented as local variables rather than rule attributes. \verb|inLhs| tracks whether a variable is being assigned or is used in an expression. This information is used to decide at compile time whether a constant is assigned a value twice and whether a variable is used before being initialized. \verb|inType| tracks whether a variable is used in a type-level expression, in which it may be used regardless of whether it is initialized.\\
The synthesised attributes during checking are the type of a node (\verb|Annotations::types|) and, where applicable, the variable belonging to an identifier (\verb|Annotations::variables|) and the local variables of a function (\verb|Annotations::function|). The latter are only used in the generating phase.\\
The checker tries to check a whole program on a best-effort basis. When a problem is encountered, e.g. an illegal redefinition of a variable, the problem is reported to the logger and the offending expression is ignored if possible. When ignoring it is not an option, e.g. when an undefined variable is used in an expression, the type is set to \verb|void|, which may lead to a chain of follow-up errors.\\
All errors and warnings are reported to a \verb|Logger| that is provided to the checker.

\section{Generator}
The machine code generator \emph{pp.s1184725.boppi.BoppiGenerator} builds an ILOC program from a checked and annotated parse tree. Like the checker, it is implemented as a tree visitor. This gives fine-grained control over the order in which instructions are generated and may lead to fewer jumps and registers.
Because a generator object is stateful, the generator only publicly exposes a static method for generating a program.\\
The generator passes the result register as a synthesised attribute. The remaining attributes are global and include the \verb|Annotations|, the \verb|RegisterPool| and the \verb|Program| produced so far.\\
While building a program, the generator reserves and uses registers drawn from a \verb|RegisterPool|. For example, for a sum, the left operand is evaluated first, then its result register is blocked while the right operand is visited, then that register is freed again, and finally the addition is generated using both results. This is illustrated in \cref{generator-sum}.\\
\begin{figure}
\caption{Java code for generating an addition expression, and the resulting ILOC code for 32+10.}
\label{generator-sum}
\begin{subfigure}{0.7\textwidth}
\begin{minted}{java}
@Override
public Reg visitInfix2(Infix2Context ctx) {
    Reg lhs = visit(ctx.lhs);
    Reg rhs = regPool.blockReg(lhs, () -> visit(ctx.rhs));
    emit(ctx.getChild(1).getText(), ops.get(ctx.op.getType()), lhs, rhs, lhs);
    return lhs;
}
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.2\textwidth}
\begin{minted}{iloc}
loadI 32 => r
loadI 10 => g
add r, g => r
\end{minted}
\end{subfigure}
\end{figure}
The generator has a number of helper methods that generate calls to \emph{memlib} functions. These methods take a couple of registers and produce a sequence of ILOC instructions. Moreover, because of the chosen approach to closures (see \cref{problem-closures}), it has a few helper methods to increment and decrement AR references.\\
There are a few scenarios in which the generator produces a \verb|haltI| instruction with the appropriate \verb|ERROR_x| value. Such instructions are generated either when an incorrect program is forced to compile or to signal a runtime error.
Currently, negative array sizes and out-of-bounds array accesses are the only runtime errors that lead to a \verb|haltI| instruction.\\
Like the checker, the generator logs errors to the provided \verb|Logger| rather than throwing exceptions.\\
Lastly, the generator prepends \emph{memlib} and \emph{stdlib} to the program, so that the generated code has access to the basic functions it relies on.

\section{Symbol table}
The symbol table \emph{pp.s1184725.boppi.symboltable.CachingSymbolTable} keeps track of existing symbols (variables) while the checker traverses a Boppi program. It is generic over the type system, although the project includes only one simple type system. The symbol table also manages lexical scope objects (\emph{pp.s1184725.boppi.symboltable.FunctionScope}) to decide variable offsets and local data sizes.\\
The symbol table has three methods for variable symbols that act like those of a dictionary: \verb|get|, \verb|put| and \verb|has|. Furthermore, there are six methods for opening and closing scopes, two of which are ``safe'' because they both open and close a scope. \verb|withFunctionScope| opens a lexical scope for a function, runs the provided function and then closes the scope. \verb|withScope| also opens a scope, runs the provided function and closes the scope; however, the variables declared in it are produced by the enclosing function scope.\\
For example, in \cref{symbol-table-scopes}, the outer x and y as well as the nested x and z are all given offsets in the same \verb|FunctionScope|. However, the nested x and z are defined in deeper lexical scopes, so they only exist within those scopes, and a nested name may shadow a variable of the same name declared in an enclosing lexical scope of the same function. Moreover, since the nested x and z live in unrelated (sibling) scopes, they may share the same offset in the function.
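
The scoping rules above can be illustrated with a minimal Java sketch. It is not the project's \emph{CachingSymbolTable}. The class \verb|SymbolTableSketch| and its nested \verb|Variable| and \verb|FunctionScope| types are hypothetical, a type is reduced to its size in bytes, and only \verb|get|, \verb|put|, \verb|has|, \verb|withFunctionScope| and \verb|withScope| are modelled. As an assumption, \verb|withScope| releases the offsets of a nested scope when it closes. This is one way to let the nested x and z of \cref{symbol-table-scopes} share an offset.

\begin{minted}{java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a scoped symbol table (not the project's CachingSymbolTable). */
public class SymbolTableSketch {
    /** A variable: lexical depth of its function and byte offset in the local data. */
    public record Variable(String name, int depth, int offset) {}

    /** Hands out offsets for all variables of one function, whatever their lexical scope. */
    public static class FunctionScope {
        private final int depth;
        private int offset = 0;        // next free offset
        private int localDataSize = 0; // largest offset ever in use

        public FunctionScope(int depth) { this.depth = depth; }

        public Variable addVariable(String name, int size) {
            Variable v = new Variable(name, depth, offset);
            offset += size;
            localDataSize = Math.max(localDataSize, offset);
            return v;
        }

        public int getDepth() { return depth; }
        public int getLocalDataSize() { return localDataSize; }
    }

    private final Deque<Map<String, Variable>> scopes = new ArrayDeque<>(); // innermost first
    private final Deque<FunctionScope> functions = new ArrayDeque<>();      // innermost first

    /** Looks a name up from the innermost to the outermost lexical scope. */
    public Variable get(String name) {
        for (Map<String, Variable> scope : scopes) { // ArrayDeque iterates from the last push
            Variable v = scope.get(name);
            if (v != null) return v;
        }
        return null;
    }

    public boolean has(String name) { return get(name) != null; }

    /** Declares a name in the innermost lexical scope; the offset comes from the function scope. */
    public Variable put(String name, int size) {
        Variable v = functions.peek().addVariable(name, size);
        scopes.peek().put(name, v);
        return v;
    }

    /** Opens a function scope and its lexical scope, runs the body, then closes both. */
    public FunctionScope withFunctionScope(Runnable body) {
        FunctionScope function = new FunctionScope(functions.size());
        functions.push(function);
        scopes.push(new HashMap<>());
        try {
            body.run();
        } finally {
            scopes.pop();
            functions.pop();
        }
        return function;
    }

    /** Opens a nested lexical scope whose offsets are released again when it closes. */
    public void withScope(Runnable body) {
        FunctionScope function = functions.peek();
        int mark = function.offset;
        scopes.push(new HashMap<>());
        try {
            body.run();
        } finally {
            scopes.pop();
            function.offset = mark; // sibling scopes reuse the same offsets
        }
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        Variable[] vars = new Variable[4];
        FunctionScope fn = table.withFunctionScope(() -> {
            vars[0] = table.put("x", 4);
            vars[1] = table.put("y", 4);
            table.withScope(() -> vars[2] = table.put("x", 4)); // shadows the outer x
            table.withScope(() -> vars[3] = table.put("z", 4));
            System.out.println(table.get("x") == vars[0]);      // outer x visible again: true
        });
        System.out.println(vars[2].offset() + " " + vars[3].offset() + " " + fn.getLocalDataSize());
    }
}
\end{minted}

Running \verb|main| prints \verb|true| and then \verb|8 8 12|. The nested x and z share offset 8, so the function needs only 12 bytes of local data instead of 16.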
\begin{figure}
\caption{Scope example in Boppi.}
\label{symbol-table-scopes}
\begin{minted}{boppi}
function main() {
    var int x;
    var int y;
    {
        var int x;
        x := 1;
    };
    {
        var int z;
        z := 1;
    };
}
\end{minted}
\end{figure}

\section{FunctionScope}
The lexical scope class \emph{pp.s1184725.boppi.symboltable.FunctionScope} records the local variables of a function. An object is created with a given lexical depth, which can be retrieved at any time.\\
The \verb|FunctionScope::addVariable| method produces a variable of the provided type at the function scope's lexical depth and current offset. This variable is both recorded in the object and returned. The symbol table uses this method to produce a variable for each symbol.\\
During compilation, the generator uses the function scope to determine the size of a function's local data area and to allocate and deallocate referenced objects.

\section{Annotations}
The class \emph{pp.s1184725.boppi.Annotations} associates types, variables, functions and registers with nodes of the AST. An annotations object is first created by the checker and then passed on to and used by the generator.

\section{CommandLineInterface and InteractivePrompt}
The classes in \emph{pp.s1184725.boppi.util.*} provide a simple interface to the toolchain using the \emph{Apache Commons} CLI library. The CommandLineInterface is both a command line compiler for Boppi programs and a runner for ILOC code, while the InteractivePrompt is a rudimentary REPL that compiles and runs lines of code.\\
The \verb|CommandLineInterface::main| method is also the default main method when building a .jar file for the project.

\section{memlib}
\label{memlib}
\emph{memlib} is the memory allocator module for Boppi programs. Akin to the C dynamic memory allocator (\emph{malloc}, \emph{free}), it maintains a list of the free blocks on the heap in between allocated objects.
\emph{memlib} currently has three subroutines: \verb|memalloc|, \verb|memaddref| and \verb|memfree|. \verb|memalloc| allocates a piece of free memory for an object and sets its reference count to 1. \verb|memaddref| increments the reference count of an object by 1. \verb|memfree| decrements the reference count by 1 and deallocates the object if the count drops to zero. To improve the correctness of programs, allocated memory is always zeroed.\\
Both objects and free blocks have a header of eight bytes (\verb|2*INT_SIZE|), which always contains the size of the block excluding the header. Additionally, free blocks contain a pointer to the next free block, whereas objects contain a reference count. See \cref{memlib-objects} for the general structure of allocated and free blocks.\\
At the start of a program, \emph{memlib} builds a single free block of \verb|brk-3*INT_SIZE| bytes, where \verb|brk| is a special-purpose register pointing to the end of the heap space. The free block starts at address \verb|3*INT_SIZE| (excluding the header) and is pointed to by address zero, as can be seen in \cref{memlib-start}. After some allocations and deallocations, the memory may end up looking like \cref{memlib-runtime}.\\
When \verb|memalloc| is called, the algorithm iterates over the free blocks in search of a block of exactly the requested size or one that is at least 8 bytes (\verb|2*INT_SIZE|) larger. The algorithm keeps track of the previous \emph{next free block} pointer so that it can be reassigned once a suitable free block is found. If the current free block has exactly the right size, the previous free block is linked to the next free block and the current block is turned into the object.
If the suitable free block is larger, the object is allocated at the start of the block, a new free block is created from the remaining space at its end, and the free list is linked up again.\\
For example, when allocating a single object of 8 bytes at the beginning of a program, the object and its header occupy bytes 4--20, while the free block is moved and resized to bytes 20--80. Compare \cref{memlib-start} and \cref{memlib-onealloc}.\\
The address of the first data byte of the object is returned, or \emph{null} if no space could be found or the requested size is \verb|0|.\\
This routine runs in $O(n)$, where $n$ is the number of free blocks, i.e. the degree of fragmentation of the memory.\\
The \verb|memaddref| routine simply increments the reference count of an object. It does not check whether the address is part of a free block or aligned to the start of an object.\\
This routine runs in $O(1)$.\\
The \verb|memfree| routine decrements the reference count of an object. If the count drops to zero, the object is deallocated. During deallocation, the algorithm iterates over the free blocks to find blocks immediately adjacent to the object, if any. If a free block lies immediately in front of the object, that free block is resized to cover the object as well; otherwise, the object is turned into a free block. If another free block lies immediately after the resulting free block, the resulting block is resized to cover it too.\\
This routine also runs in $O(n)$, where $n$ is the number of free blocks.

\subsection{Calling convention}
\label{memlib-calling}
The calling convention for \emph{memlib} and other internal subroutines is as follows. First, the return address is pushed onto the stack. Any arguments must then be pushed onto the stack in order, followed by a jump to the start of the subroutine. The subroutine must consume the arguments and the return address.
If a subroutine has a return value, it must push that value onto the stack.\\
For example, \verb|memalloc| requires one argument, the requested size, and returns the allocated address. First, the return address is pushed; it is either loaded from a label or calculated from the current address. Second, the size is pushed onto the stack. Third, the jump to \verb|memalloc| is made and the subroutine is executed. After \verb|memalloc| has executed, the object's address is popped from the stack.

\begin{figure}
\caption{Structure of objects and free blocks with \emph{memlib}. A free block points either to the next free block or to \emph{null}.}
\label{memlib-objects}
\includegraphics{memlib-objects}
\end{figure}

\begin{figure}
\caption{Structure of the memory after initialisation by \emph{memlib}.}
\label{memlib-start}
\includegraphics[width=\textwidth]{memlib-start}
\end{figure}

\begin{figure}
\caption{Structure of the memory after allocating a single object of size 8 with \emph{memlib}.}
\label{memlib-onealloc}
\includegraphics[width=\textwidth]{memlib-onealloc}
\end{figure}

\begin{figure}
\caption{Structure of the memory after allocating objects of sizes 8, 4 and 12, respectively, and deallocating the object of size 4.}
\label{memlib-runtime}
\includegraphics[width=\textwidth]{memlib-runtime}
\end{figure}
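
The following Java sketch simulates the free-list manipulation of \verb|memalloc|, \verb|memaddref| and \verb|memfree| on a small heap, to make it concrete. It is a model, not the ILOC implementation, and the class \verb|MemlibSketch| is hypothetical. It makes several assumptions:
\begin{itemize}
\item \verb|INT_SIZE| $= 4$ and object sizes are multiples of \verb|INT_SIZE|.
\item The header stores the size at \verb|addr-8| and the next pointer or reference count at \verb|addr-4|.
\item The free list is kept sorted by address, so the neighbours of a freed object are found in a single pass.
\item A block is split only if the remainder can hold a new header.
\end{itemize}

\begin{minted}{java}
/** Hypothetical Java model of memlib's free-list allocator (not the ILOC code). */
public class MemlibSketch {
    static final int INT_SIZE = 4;
    static final int HEADER = 2 * INT_SIZE; // size word + (next pointer | reference count)
    static final int NULL = 0;

    private final int[] words;              // byte address a is stored in words[a / INT_SIZE]

    /** Builds one free block from 3*INT_SIZE up to brk; address 0 points to it. */
    public MemlibSketch(int brk) {
        words = new int[brk / INT_SIZE];
        int first = 3 * INT_SIZE;
        store(0, first);
        setSize(first, brk - first);
        setLink(first, NULL);
    }

    private int load(int addr) { return words[addr / INT_SIZE]; }
    private void store(int addr, int value) { words[addr / INT_SIZE] = value; }

    // Assumed header layout: size at block-8, next pointer or reference count at block-4.
    public int size(int block) { return load(block - HEADER); }
    private void setSize(int block, int size) { store(block - HEADER, size); }
    private int link(int block) { return load(block - INT_SIZE); }
    private void setLink(int block, int value) { store(block - INT_SIZE, value); }
    public int refCount(int obj) { return link(obj); }
    public int firstFree() { return load(0); }

    /** First fit: take an exact match, or split a block with room for an extra header. */
    public int memalloc(int size) {
        if (size <= 0) return NULL;                // size 0 yields null; negative treated alike
        int prevLink = 0;                          // address of the pointer that refers to cur
        for (int cur = load(0); cur != NULL; prevLink = cur - INT_SIZE, cur = link(cur)) {
            int free = size(cur);
            if (free == size) {
                store(prevLink, link(cur));        // unlink the whole block
            } else if (free >= size + HEADER) {
                int rest = cur + size + HEADER;    // data start of the remaining free block
                setSize(rest, free - size - HEADER);
                setLink(rest, link(cur));
                store(prevLink, rest);
                setSize(cur, size);
            } else {
                continue;                          // too small: try the next free block
            }
            setLink(cur, 1);                       // reference count starts at 1
            for (int a = cur; a < cur + size; a += INT_SIZE) store(a, 0); // zero the object
            return cur;
        }
        return NULL;                               // no suitable free block
    }

    /** Increments the reference count without any validity check. */
    public void memaddref(int obj) { setLink(obj, refCount(obj) + 1); }

    /** Drops one reference; at zero, returns the object to the free list and coalesces. */
    public void memfree(int obj) {
        setLink(obj, refCount(obj) - 1);
        if (refCount(obj) > 0) return;
        int prevLink = 0, prev = NULL, cur = load(0);
        while (cur != NULL && cur < obj) {         // find the free blocks around obj
            prev = cur;
            prevLink = cur - INT_SIZE;
            cur = link(cur);
        }
        int block;
        if (prev != NULL && prev + size(prev) == obj - HEADER) {
            setSize(prev, size(prev) + HEADER + size(obj)); // grow the preceding free block
            block = prev;
        } else {
            setLink(obj, cur);                     // obj itself becomes a free block
            store(prevLink, obj);
            block = obj;
        }
        if (cur != NULL && block + size(block) == cur - HEADER) {
            setSize(block, size(block) + HEADER + size(cur)); // absorb the following block
            setLink(block, link(cur));
        }
    }

    public static void main(String[] args) {
        MemlibSketch heap = new MemlibSketch(80);
        int a = heap.memalloc(8), b = heap.memalloc(4), c = heap.memalloc(12);
        System.out.println(a + " " + b + " " + c);                                 // 12 28 40
        heap.memfree(b);
        System.out.println(heap.firstFree() + " " + heap.size(heap.firstFree())); // 28 4
        heap.memfree(a);
        heap.memfree(c);
        System.out.println(heap.firstFree() + " " + heap.size(heap.firstFree())); // 12 68
    }
}
\end{minted}

With \verb|brk| $= 80$, the first allocation matches the example above: a data start of 12 and a free block from byte 20 onwards. Freeing all objects again coalesces the heap back into a single free block of 68 bytes. Keeping the free list sorted is a design choice of this sketch: it costs an $O(n)$ walk on every \verb|memfree|, which matches the complexity stated above.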