diff --git a/README.md b/README.md index cacf5c4..8ea1ba2 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,6 @@ # Prerequisites Boppi requires JDK 1.8 or higher and [Ant](https://ant.apache.org/). Both must be available in the environment. +Ant may be provided by your Java IDE, installed via a package manager or downloaded from the website. See https://ant.apache.org/manual/index.html for manual installation instructions. # Installation @@ -10,18 +11,26 @@ Run `ant build-all` to do the above and generate _javadoc_ documentation, run _J To see all possible targets, run `ant -verbose -projecthelp`. +# Troubleshooting +If the generated ANTLR files end up in the wrong directory, please uncomment occurrences of +> +> +in the Ant file. + + # Command line use After building a _JAR_ file, a command `boppi` becomes available in the `dist/` folder. This command can be used to compile and run files or perform an interactive session. See `boppi --help` and `boppi interactive --help` for more information. 
# Directory structure - `bin` contains compiled java code and required text files (after `ant do-build`) -- `dist` contains a runnable JAR, script files and a copy of the libraries required to run the JAR (after `ant do-build-jar`) -- `doc` contains a report of the project and attached example files - - `doc/javadoc` contains _javadoc_ documentation of the project (after `ant do-javadoc`) - - `doc/junit` contains a report of _JUnit_ tests (after `ant do-junit` and `ant do-junit-report`) +- `dist` contains a runnable JAR, script files and a copy of the libraries required to run the JAR (generated by `ant do-build-jar`) +- `doc` contains both the assignment description and a report of the project and attached example files + - `doc/javadoc` contains _javadoc_ documentation of the project (generated by `ant do-javadoc`) + - `doc/junit` contains a report of _JUnit_ tests (generated by `ant do-junit` and `ant do-junit-report`) - `lib` contains Java libraries required for the project, excluding those in the JDK 1.8 - `src` - `src/pp/iloc` contains java code for a slightly modified ILOC virtual machine - `src/pp/s1184725/boppi` contains java code for the _Boppi_ language - `util` contains [Pygments](http://pygments.org/) lexers for both ILOC and Boppi and scripts to run the Boppi command line interface (used for the JAR build). + diff --git a/doc/pp-student-final-CC.pdf b/doc/pp-student-final-CC.pdf new file mode 100644 index 0000000..bcefa5d Binary files /dev/null and b/doc/pp-student-final-CC.pdf differ diff --git a/doc/report-description.tex b/doc/report-description.tex index 5d69182..e420c4b 100644 --- a/doc/report-description.tex +++ b/doc/report-description.tex @@ -6,7 +6,7 @@ % – Code generation: what kind of target code is generated for the feature? % You may make use of your ANTLR grammar as a basis for this description, but note that not every % rule necessarily corresponds to a language feature. -This section describes the language features of Boppi. 
The tokens used in the syntax listings and explanation can be found in \cref{report-grammar}. Comments and whitespace is discarded by the lexer, so they are not listed here. +This section describes the language features of Boppi. The tokens used in the syntax listings and explanation can be found in \cref{report-grammar}. Comments and whitespace are discarded by the lexer, so they are not listed here. \section{Basic expressions} At the heart of the language are basic arithmetic, logic and compound expressions and literals. These support basic operations for boolean, integer and character types. @@ -51,13 +51,13 @@ See \cref{basic-examples} for a few examples of basic expressions. Since input a \caption{Basic expressions in Boppi} \label{basic-examples} \begin{minted}{boppi} -5; //an integer literal as a statement -'c';;;; //a character literal followed by empty statements -4+(2*3/-1); //an arithmetic expression (result: -2) +5; //an integer literal as a statement +'c';;;; //a character literal followed by empty statements +4+(2*3/-1); //an arithmetic expression (result: -2) { true && false; //an expression with two boolean literals (result: false) - 3 //the last expression in a block is the value of the block (result: 3) -}+4; //another arithmetic expression (result: 7) + 3 //the last expression in a block is the value of the block (result: 3) +}+4; //another arithmetic expression (result: 7) \end{minted} \end{figure} @@ -514,7 +514,7 @@ while_e2: nop // end target \section{Functions} \label{functions} \paragraph{Syntax} -Functions introduce one new mode of declaration and one new mode of expression. The declaration of a function takes a name, an optional sequence of parameters and an optional return value. These parameters each have a type and a name. A function call requires a variable (the name of the function) followed by a sequence of expressions between parentheses. 
Moreover, the feature introduces the arrow and tuple at the type level, so function types can be constructed. The arrow denotes a function from the (tuple) type on the left to the type on the right, whereas the tuple is a sequence of types. The ANTLR rules can be seen in \cref{functions-syntax}. +Functions introduce one new mode of declaration and one new mode of expression. The declaration of a function takes a name, an optional sequence of parameters and an optional return value. These parameters each have a type and a name. A function call requires a variable (the name of the function) followed by a sequence of expressions between parentheses. Moreover, the feature introduces the arrow and tuple at the type level, so function types can be constructed. The arrow denotes a function \emph{from} the (tuple) type on the left \emph{to} the type on the right, whereas the tuple is a sequence of types. The ANTLR rules can be seen in \cref{functions-syntax}. \begin{figure} \caption{ANTLR4 code for functions in Boppi.} @@ -585,7 +585,7 @@ A function type must always have an arrow at the top level and a tuple on its le \paragraph{Semantics} A function declaration creates a variable in the current scope with the name of the function. The type is constructed as a tuple of the parameter types and the return type. The return type will be \verb|void| if not present. The function variable is marked assigned. The body of the function is not evaluated.\\ -A function call first retrieves the function reference and constructs an activation record (\emph{AR}). Then each of the parameters is evaluated and their results are stored within the AR. Finally, the program jumps to the body of the function.\\ +Functions in Boppi are call-by-value. A function call first retrieves the function reference and constructs an activation record (\emph{AR}). Then the parameters are evaluated left-to-right and their results are stored in the AR. 
Finally, the program jumps to the body of the function.\\ The body of a function declaration links all the formal parameters to places within the AR, which are assigned a value by the function call. During a call, first the body of the function is evaluated. Next, if there is a result, it will be stored in the AR. Then, the AR will be dereferenced and, if it is marked for deletion, all local reference variables will be dereferenced. Finally, the control flow is moved back to where the function call was made.\\ At the end of a function call, the result is loaded, if there is any. diff --git a/doc/report-problems.tex b/doc/report-problems.tex index a45ebd1..9c82fcb 100644 --- a/doc/report-problems.tex +++ b/doc/report-problems.tex @@ -5,15 +5,15 @@ Expressions of the form $((a \star b) \star (c \star d)) \star ((e \star f) \sta \item increase the number of registers to fit all sub-expression results \item treat expressions as functions so the implementation of function calls will store the sub-expressions \end{enumerate} -A combination of solutions is also possible. For example, store results in registers until all registers are in use, then start pushing results to the stack, or always push results to the stack and remove redundant push-pop instructions during an optimization pass. +A combination of solutions is also possible. For example, store sub-expression results in registers until all registers are in use, then start pushing results to the stack. Or alternatively, always push sub-expression results to the stack and remove redundant push-pop instructions during an optimization pass. \paragraph{Solution} -The second solution is chosen for its simplicity, performance and the fact that the ILOC VM supports infinite registers. +The second solution is chosen for its simplicity (it cannot interfere with stack allocations), performance (no extra instructions for pushing and popping) and the fact that the ILOC VM supports an infinite number of registers. 
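The register-based solution described above can be illustrated with a minimal Java sketch. This is not the Boppi generator: the class name, the expression classes and the \verb|r_n| register naming are hypothetical, and only ILOC's \verb|loadI| and \verb|add| instructions are assumed. Every sub-expression result is given a fresh virtual register, so no push or pop instructions are needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the Boppi generator: one fresh virtual register per sub-expression. */
final class FreshRegisterSketch {
    /** Minimal expression tree: either an integer literal or a binary addition. */
    static abstract class Expr {}

    static final class Lit extends Expr {
        final int value;
        Lit(int value) { this.value = value; }
    }

    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    private final List<String> code = new ArrayList<>();
    private int nextRegister = 0;

    /** Assumes an unbounded supply of virtual registers, as the ILOC VM provides. */
    private String freshRegister() { return "r_" + nextRegister++; }

    /** Emits ILOC-like code for e and returns the register holding its result. */
    String gen(Expr e) {
        if (e instanceof Lit) {
            String target = freshRegister();
            code.add("loadI " + ((Lit) e).value + " => " + target);
            return target;
        }
        Add add = (Add) e;
        String left = gen(add.left);   // stays live in its own register
        String right = gen(add.right); // cannot clobber the left result
        String target = freshRegister();
        code.add("add " + left + ", " + right + " => " + target);
        return target;
    }

    List<String> code() { return code; }

    public static void main(String[] args) {
        FreshRegisterSketch g = new FreshRegisterSketch();
        Expr e = new Add(new Add(new Lit(1), new Lit(2)), new Add(new Lit(3), new Lit(4)));
        String result = g.gen(e);
        for (String line : g.code()) System.out.println(line);
        System.out.println("result in " + result);
    }
}
```

Because the left operand keeps its own register while the right operand is evaluated, balanced expressions of any depth need no stack traffic; the cost is that the number of registers in use grows with the size of the expression.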
\section{Function calls within parameters} -Each function call requires an Activation Record (\emph{AR}). A straightforward way of allocating new ARs is allocation at a static offset relative to the current AR. This may lead to problems with function calls of the form \verb|f(g(h))| or \verb|f(a, g(h))|: the AR of \verb|f| must not be overwritten during the call to \verb|g|.\\ +Each function call requires an Activation Record (\emph{AR} or stack frame). A straightforward way of allocating new ARs is allocation at a static offset relative to the current AR. This may lead to problems with function calls of the form \verb|f(g(h))| or \verb|f(a, g(h))|: the AR of \verb|f| must not be overwritten during the call to \verb|g|.\\ There are multiple solutions to this problem: \begin{enumerate} \item dynamically allocating the AR: this requires a dynamic allocator @@ -63,9 +63,10 @@ Keeping track of active ARs is solved by incrementing and decrementing the whole Neither the type, scope nor entry point of a function specify the local data size of a function. This poses a problem when a function call is made to an assigned/reassigned function. This can be solved in two ways: \begin{enumerate} - \item resize the AR during the prologue of a call + \item resize the AR during the prologue of a call or allocate another block of memory for local data \item store the required size of the AR in the function reference \end{enumerate} \paragraph{Solution} -The second solution was chosen because it was easier to implement. +The second solution was chosen because it was the easiest to implement and the most performant. Resizing the AR would require a reallocate function in ILOC, while putting local data in another memory block would require an extra level of indirection whenever a local variable is used. 
+ diff --git a/doc/report-software.tex b/doc/report-software.tex index 35ad99f..19c9eac 100644 --- a/doc/report-software.tex +++ b/doc/report-software.tex @@ -3,7 +3,7 @@ % on the concepts and terminology you learned during the course, such as synthesised and inherited % attributes, tree listeners and visitors. -The compiler chain is written in Java mostly, with a preamble (\emph{memlib}) written in ILOC. The following sections describe the Java classes and the ILOC preamble. +The compiler chain is written in Java mostly, with two preambles (\emph{memlib} and \emph{stdlib}) for compiled programs written in ILOC. The following sections describe the Java classes and the ILOC preamble. @@ -29,7 +29,7 @@ The only inherited attributes during checking are the booleans \verb|inLhs| and The synthesised attributes during checking are the type of a node (\verb|Annotations::types|) and, when applicable, the variable belonging to an identifier (\verb|Annotations::variables|) and the local variables of a function (\verb|Annotations::function|). The latter are only used in the generating phase.\\ -The checker tries to check a whole program best-effort. When a problem is encountered, e.g. an illegal redefinition of a variable, the problem is reported and the expression is ignored when possible. When ignoring is not an option, e.g. using an undefined variable in an expression, the type is set to \verb|void| and a chain of errors may be reported.\\ +The checker tries to check a whole program best-effort. When a problem is encountered, e.g. an illegal redefinition of a variable, the problem is reported to the logger and the expression is ignored if possible. When ignoring is not an option, e.g. using an undefined variable in an expression, the type is set to \verb|void|, which may lead to a chain of errors further on.\\ All errors and warnings are reported to a \verb|Logger| that is provided to the checker. 
@@ -71,20 +71,21 @@ add r, g => r \end{figure} -The generator has a number of helper methods to generate calls to \emph{memlib} functions. These methods take a number of registers and produce a sequence of ILOC instructions. Moreover, it has a few helper methods to increment and decrement AR references because of the solution to closures (\cref{problem-closures}).\\ +The generator has a number of helper methods to generate calls to \emph{memlib} functions. These methods take a couple of registers and produce a sequence of ILOC instructions. Moreover, it has a few helper methods to increment and decrement AR references, because of the chosen approach to closures (see \cref{problem-closures}).\\ The generator has a few scenarios for which it produces \verb|haltI| instructions with the appropriate \verb|ERROR_x| value. They may either be generated due to forcing an incorrect program to compile or due to a runtime error. Currently negative array sizes and array out-of-bounds errors are the only runtime errors that lead to a \verb|haltI| instruction.\\ +Like the type checker, the generator logs errors to the provided \verb|Logger| rather than throwing exceptions.\\ -Lastly, the generator prepends a program with \emph{memlib} and \emph{stdlib} to have access to basic functions. +Lastly, the generator prepends a program with \emph{memlib} and \emph{stdlib} to have access to basic functions on which the generated code relies. \section{Symbol table} -The symbol table \emph{pp.s1184725.boppi.CachingSymbolTable} keeps track of existing symbols (variables) while the checker traverses a Boppi program. It is generic for the type system. The symbol table also manages lexical scope objects (\emph{pp.s1184725.boppi.FunctionScope.java}) to decide variable offsets and local data sizes.\\ +The symbol table \emph{pp.s1184725.boppi.symboltable.CachingSymbolTable} keeps track of existing symbols (variables) while the checker traverses a Boppi program. 
It is generic for the type system, although the project includes only one simple type system. The symbol table also manages lexical scope objects (\emph{pp.s1184725.boppi.symboltable.FunctionScope.java}) to decide variable offsets and local data sizes.\\ -The symbol table has three methods for variable symbols and act analogous to dictionaries: \verb|get|, \verb|put| and \verb|has|. Furthermore, there are six methods for opening and closing scopes, of which two are ``safe'' as they both open and close a scope. \verb|withFunctionScope| opens a lexical scope for a function, runs the provided function and then closes the scope. \verb|withScope| also opens a scope, runs the provided function and closes the scope, however variables will be produced by an enclosing function scope.\\ +The symbol table has three methods for variable symbols that act similarly to dictionary operations: \verb|get|, \verb|put| and \verb|has|. Furthermore, there are six methods for opening and closing scopes, of which two are ``safe'' as they both open and close a scope. \verb|withFunctionScope| opens a lexical scope for a function, runs the provided function and then closes the scope. \verb|withScope| also opens a scope, runs the provided function and closes the scope; however, variables will be produced by an enclosing function scope.\\ For example, in \cref{symbol-table-scopes} x, y, nested x and z are all given offsets in the same \verb|FunctionScope|. However, the nested x and z are defined in a deeper lexical scope, so they only exist within those scopes and their name may override a variable name in the same function scope (but a higher lexical scope). Moreover, since nested x and z are in unrelated scopes, they may have the same offset in the function. \begin{figure} @@ -111,17 +112,22 @@ function main() { \section{FunctionScope} -The lexical scope class \emph{pp.s1184725.boppi.FunctionScope} contains local variables within a function. 
An object is created with a given lexical depth, which can be retrieved at any time.\\ +The lexical scope class \emph{pp.s1184725.boppi.symboltable.FunctionScope} is used for recording local variables within a function. An object is created with a given lexical depth, which can be retrieved at any time.\\ The \verb|FunctionScope::addVariable| method produces a variable of the provided type at the FunctionScope's lexical depth and current offset. This variable is both recorded in the object and returned. This method is used by the symbol table to produce a variable for each symbol.\\ -The generator uses the function scope to determine how large the local data size for a function has to be and to allocate and deallocate objects where applicable. +During compilation, the generator uses the function scope to determine how large the local data size for a function has to be and to allocate and deallocate referenced objects. -\section{FunctionScope} -The lexical scope class \emph{pp.s1184725.boppi.FunctionScope} contains local variables within a function. An object is created with a given lexical depth, which can be retrieved at any time.\\ -The \verb|FunctionScope::addVariable| method produces a variable of the provided type at the FunctionScope's lexical depth and current offset. This variable is both recorded in the object and returned. This method is used by the symbol table to produce a variable for each symbol.\\ -The generator uses the function scope to determine how large the local data size for a function has to be and to allocate and deallocate objects where applicable. +\section{Annotations} +The class \emph{pp.s1184725.boppi.Annotations} is used for associating types, variables, functions and registers to the AST. An annotations object is first created by the checker and then passed on to and used by the generator. 
+ + + + +\section{CommandLineInterface and InteractivePrompt} +The classes in \emph{pp.s1184725.boppi.util.*} provide a simple interface for the toolchain using the \emph{Apache Commons} CLI library. The CommandLineInterface is both a command line compiler for Boppi programs and a runner for ILOC code, while the InteractivePrompt is a rudimentary REPL that compiles and runs lines of code.\\ +The \verb|CommandLineInterface::main| method is also the default main method when building a .jar file for the project. diff --git a/doc/report-summary.tex b/doc/report-summary.tex index 98241c0..ba489c4 100644 --- a/doc/report-summary.tex +++ b/doc/report-summary.tex @@ -1,4 +1,4 @@ -This report describes the programming language and implementation \emph{Boppi}. First a summary of the language is provided, listing the kind of features in the language. Next is a description of language features including examples and semantics. This is followed by a description of Java classes used to compile the language and a listing of changes to the ILOC virtual machine. Then the obstacles encountered during development and solutions to them are covered. Lastly, a description of test programs and concluding words about the project. +This report describes the programming language and implementation \emph{Boppi}. First a summary of the language is provided, listing the kind of features in the language. Next is a description of language features including examples and semantics. This is followed by a description of Java classes used to compile the language and a listing of changes to the ILOC virtual machine. Then the obstacles encountered during development and solutions to them are covered. Lastly, a description of test programs and concluding words about the project are provided. 
\section{Language overview} diff --git a/doc/report-test-program.tex b/doc/report-test-program.tex index af56d2b..ff0e7d0 100644 --- a/doc/report-test-program.tex +++ b/doc/report-test-program.tex @@ -2,7 +2,7 @@ % target code for that program and one or more example executions showing the correct functioning of % the generated code. -As an example of a test program, we will look at a memoizing recursive fibonacci program. This program is well-suited because it contains I/O, variables, loops, functions, arrays, function references and closures. See \cref{test-example} for the source code and the text file \verb|doc/fibonacciRecursiveExample.iloc| for the generated ILOC. The compiled form is quite long, mostly because it heavily increments and decrements reference counts and does so in-line and without optimizations.\\ +As an example of a test program, we will look at a memoizing recursive fibonacci program. This program is well-suited because it contains I/O, variables, loops, functions, arrays, function references and closures. See \cref{test-example} for the source code and the text file \verb|doc/fibonacciRecursiveExample.iloc.txt| for the generated ILOC. The compiled form is quite long, mostly because it heavily increments and decrements reference counts and does so in-line and without optimizations.\\ The program works by repeatedly asking for a number. If the user provides a positive number, it returns the value of the Fibonacci sequence at that position. If the user provides a zero or negative number, the program terminates. See \cref{test-example-runs} for a few runs of the program. \begin{figure} diff --git a/doc/report-tests.tex b/doc/report-tests.tex index e2ab52a..d919764 100644 --- a/doc/report-tests.tex +++ b/doc/report-tests.tex @@ -7,6 +7,8 @@ Testing the Boppi language is done with fully automated ANTLR4 tests. The test suite is designed to quickly add new syntactic, semantic and runtime tests and check them for errors. 
For this purpose, the \emph{pp.s1184725.boppi.test.BoppiTests} class contains various helper methods that capture errors and pass input and output.\\ Each feature of the Boppi language is tested for correctness with a set of automated tests. Each set of tests checks for both correct and incorrect syntax, semantics and runtime evaluation. Moreover, the dynamic allocator \cref{memlib} is tested for correctness in a separate test suite.\\ +The test report can be found in \emph{doc/junit/index.html}. Error messages and ILOC code are hidden in the \emph{System.out} link of each test case, since it is impossible to have JUnit provide a message on success.\\ + \section{Basic expressions} The basic expressions are tested for: \begin{itemize}