% Detailed language description. A systematic description of the features of your language, for each
% feature specifying
% – Syntax, including one or more examples;
% – Usage: how should the feature be used? Are there any typing or other restrictions?
% – Semantics: what does the feature do? How will it be executed?
% – Code generation: what kind of target code is generated for the feature?
% You may make use of your ANTLR grammar as a basis for this description, but note that not every
% rule necessarily corresponds to a language feature.

This section describes the language features of Boppi. The tokens used in the syntax listings and explanations can be found in \cref{report-grammar}. Comments and whitespace are discarded by the lexer, so they are not listed here.

\section{Basic expressions}
At the heart of the language are basic arithmetic, logic and compound expressions and literals. These support basic operations for boolean, integer and character types.

\paragraph{Syntax}
The basic expression language consists of a \verb|program| that contains a sequence of statements separated by semicolons, \verb|stats|. A single \verb|stat|ement in the basic language can only be an expression, which can be an arithmetic or logical expression, an integer, a character, a boolean or another sequence of statements. The ANTLR grammar rules can be seen in \cref{basic-syntax}.

\begin{figure}
\caption{ANTLR4 code of the basic expression language of Boppi}
\label{basic-syntax}
\begin{minted}{antlr}
program: stats EOF;

stats
    : stat (COMPOUND stat?)*
    ;

stat
    : ...
    | expr
    ;

expr
    : ...
    | PAROPEN stats PARCLOSE #parens
    | BRAOPEN stats BRACLOSE #block
    | op=(PLUS|MINUS|NOT) expr #prefix1
    | lhs=expr op=(MULTIPLY|DIVIDE) rhs=expr #infix1
    | lhs=expr op=(PLUS|MINUS) rhs=expr #infix2
    | lhs=expr op=(LT|LEQ|GTE|GT|EQ|NEQ) rhs=expr #infix3
    | lhs=expr AND rhs=expr #infix4
    | lhs=expr OR rhs=expr #infix5
    | LITERAL10 #literalInteger
    | CHAR #literalCharacter
    | (TRUE|FALSE) #literalBoolean
    ;
\end{minted}
\end{figure}

\paragraph{Examples}
See \cref{basic-examples} for a few examples of basic expressions. Since input and output are only introduced in \cref{io-section}, the result of each expression is simply written in the comments.

\begin{figure}
\caption{Basic expressions in Boppi}
\label{basic-examples}
\begin{minted}{boppi}
5; //an integer literal as a statement
'c';;;; //a character literal followed by empty statements
4+(2*3/-1); //an arithmetic expression (result: -2)
{
    true && false; //an expression with two boolean literals (result: false)
    3 //the last expression in a block is the value of the block (result: 3)
}+4; //another arithmetic expression (result: 7)
\end{minted}
\end{figure}

\paragraph{Use}
\verb|#prefix1|, \verb|#infix1| and \verb|#infix2| restrict their operand types and result type to integers. \verb|#infix3| restricts its operands to integers and sets the result type to boolean. However, when the operator is \verb|EQ| or \verb|NEQ|, the operands can be of any type as long as both have the same type. The resulting type will still be a boolean. \verb|#infix4| and \verb|#infix5| only allow booleans as operands and, again, return a boolean.

\paragraph{Semantics}
A program consists of one compound expression. \verb|COMPOUND| separates single expressions and evaluates them left to right. The result of the rightmost expression is the return value of the compound expression; the other return values are discarded.
Both \verb|#parens| and \verb|#block| contain a compound expression. However, \verb|#block| introduces a deeper scope for the expression within, as clarified in \cref{variables-section}.

\paragraph{Code generation}
The literals generate a \verb|loadI| instruction. For character literals, the \verb|loadI| is followed by \verb|i2c| to make sure the loaded character is within the character range of the ILOC VM. Operators first generate their operands, locking the result register, then generate their operator instruction and free the registers. The compound expression simply generates its inner expressions in order. See \cref{basic-codegen} for an example of generated code.

\begin{figure}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
5;
'c';;;;
4+(2*3/-1);
{
    true && false;
    3
}+4;
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{boppi}
loadI 5 => __1 // 5
loadI 99 => __1 // 'c'
i2c __1 => __1 // 'c'
loadI 4 => __1 // 4
loadI 2 => __2 // 2
loadI 3 => __3 // 3
mult __2,__3 => __2 // *
loadI 1 => __3 // 1
rsubI __3,0 => __3 // unary -
div __2,__3 => __2 // /
add __1,__2 => __1 // +
loadI 1 => __2 // true
loadI 0 => __1 // false
and __2,__1 => __2 // &&
loadI 3 => __1 // 3
loadI 4 => __2 // 4
add __1,__2 => __1 // +
\end{minted}
\end{subfigure}
\caption{Generated code for basic expressions in Boppi.}
\label{basic-codegen}
\end{figure}

\section{Variables}
\label{variables-section}
\paragraph{Syntax}
Variables introduce two more \verb|stat|ement types: declarations and assignments. A declaration introduces a variable name and restricts it to a certain type. The type can be either one of the three built-in types (boolean, character, integer) or the type of a previously declared variable, referenced by its identifier. The optional \verb|CONSTANT| keyword (\verb|const| in the examples) can be used to declare a variable constant, so it may only be assigned once. Secondly, an assignment sets the value of a variable to the result of an expression.
Lastly, a variable can be used in an expression if it has been assigned. See \cref{variables-syntax} for the additional ANTLR grammar rules.

\begin{figure}
\caption{ANTLR4 code for variables in Boppi.}
\label{variables-syntax}
\begin{minted}{antlr}
stat
    : ...
    | declareStat
    | assignStat
    ;

declareStat
    : ...
    | DECLARE CONSTANT? type IDENTIFIER #declare
    ;

assignStat
    : variable ASSIGN (assignStat | expr) #assign
    ;

expr
    : ...
    | variable #getVariable
    ;

type
    : ...
    | staticType=(INTTYPE | BOOLTYPE | CHARTYPE) #typeSimple
    | variable #typeVariable
    ;

variable
    : ...
    | IDENTIFIER #variableSimple
    ;
\end{minted}
\end{figure}

\paragraph{Examples}
See \cref{variables-example} for some examples of variable usage. Again, input and output are introduced in \cref{io-section}, so the result of each expression is listed in the comments.

\begin{figure}
\caption{Example code with variables in Boppi.}
\label{variables-example}
\begin{minted}{boppi}
var int myInt; //declares integer myInt (result: void)
myInt := 4; //myInt is now 4 (result: 4)
var myInt otherInt; //declares otherInt with the same type as myInt
otherInt := 4+(myInt := 2); //myInt is now 2 and otherInt is 6 (result: 6)
var bool aBool; //declares boolean aBool
aBool := {
    var const int otherInt; //declares a constant integer otherInt inside this block
    otherInt := 12; //otherInt is now 12 (result: 12)
    myInt > otherInt //(result: false)
}; //aBool is now false
\end{minted}
\end{figure}

\paragraph{Use}
A variable must be declared before it can be used and its type must be given in the declaration. Moreover, a variable can only be used within an expression once it has been assigned a value. The result of an expression can only be assigned to a variable if the types match.\\
Regarding lexical scopes, a variable only exists in the scope in which it is declared and in deeper scopes. A variable can be redeclared in a deeper scope, which hides the original variable and links the name to a new variable.
The type of the redeclared variable does not have to match the original type.

\paragraph{Semantics}
\verb|#declare| introduces a new variable to the current scope. It does not perform any action during runtime, but the identifier and its type are recorded in a symbol table during compilation. Moreover, the compiler allocates a space for the variable in the local data segment of the main AR. The identifier and its allocated space will be freed once the current scope is closed.\\
\verb|#assign| first evaluates the expression on the right hand side and then stores the result at the space allocated for that variable.\\
\verb|#getVariable| retrieves the value of a variable from the allocated space.

\paragraph{Code generation}
Declarations do not generate any code. An assignment generates \verb|addI r_arp,k => r_a| followed by \verb|store r_v => r_a| (\verb|cstore r_v => r_a| for characters), where \verb|r_v| holds the value to store and \verb|r_a| is a register for the address. The local offset, \verb|k|, is decided by the symbol table. The address is calculated separately and a plain \verb|store| is used instead of \verb|storeAI|, because the address is calculated differently for different kinds of variables. Likewise, a \emph{use} of a simple variable generates \verb|addI r_arp,k => r| followed by \verb|load r => r| (\verb|cload r => r| for characters) for some register \verb|r|.\\
An example of (chained) assignments, uses and offsets can be seen in \cref{variables-code}. As can be seen in the generated ILOC code, the variables x, y, unused, b and c have the respective offsets of 0, 4, 8, 9 and 13.
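
The offsets listed above are consistent with handing out space in declaration order. The following is a minimal sketch of such an allocation, not the compiler's actual symbol table: the function \verb|assign_offsets| is hypothetical, and the sizes (4 bytes for integers and booleans, 1 byte for characters) are an assumption inferred from the offsets 0, 4, 8, 9 and 13.

\begin{minted}{python}
# Assumed sizes in bytes, inferred from the offsets 0, 4, 8, 9 and 13.
SIZES = {"int": 4, "bool": 4, "char": 1}


def assign_offsets(declarations):
    """Give each declared variable the next free offset, in declaration order.

    Hypothetical helper for illustration; not part of the Boppi compiler.
    """
    offsets = {}
    next_free = 0
    for name, type_name in declarations:
        offsets[name] = next_free
        next_free += SIZES[type_name]
    return offsets


# The declarations of the example program, in source order.
decls = [("x", "int"), ("y", "int"), ("unused", "char"),
         ("b", "bool"), ("c", "bool")]
offsets = assign_offsets(decls)
print(offsets)
\end{minted}

Under these assumptions, the one-byte \verb|unused| character is the reason that \verb|b| lands on the unaligned offset 9.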
\begin{figure}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int x;
var int y;
x := 4;
y := 3+x;
var char unused;
var bool b;
var bool c;
c := b := x < y;
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{boppi}
loadI 4 => __1 // 4
addI r_arp,0 => __2 // add offset
store __1 => __2 // to x
loadI 3 => __1 // 3
addI r_arp,0 => __2 // add offset
load __2 => __2 // load address
add __1,__2 => __1 // +
addI r_arp,4 => __2 // add offset
store __1 => __2 // to y
addI r_arp,0 => __1 // add offset
load __1 => __1 // load address
addI r_arp,4 => __2 // add offset
load __2 => __2 // load address
cmp_LT __1,__2 => __1 // <
addI r_arp,9 => __2 // add offset
store __1 => __2 // to b
addI r_arp,13 => __2 // add offset
store __1 => __2 // to c
\end{minted}
\end{subfigure}
\caption{Generated code for basic variable use in Boppi.}
\label{variables-code}
\end{figure}

\section{Input/Output}
\label{io-section}
\paragraph{Syntax}
I/O introduces two expression types: input and output. In an input expression, a sequence of variables is provided and in an output expression a sequence of expressions can be provided. The ANTLR rules can be seen in \cref{io-syntax}.

\begin{figure}
\caption{ANTLR4 code for I/O in Boppi.}
\label{io-syntax}
\begin{minted}{antlr}
singleExpr
    : ...
    | IN PAROPEN variable (LISTDELIM variable)* PARCLOSE #read
    | OUT PAROPEN expr (LISTDELIM expr)* PARCLOSE #write
    ;
\end{minted}
\end{figure}

\paragraph{Examples}
See \cref{io-example} for the basic use of input and output expressions.
\begin{figure}
\begin{subfigure}{0.5\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int anInt;
read(anInt);
print('a');
print(read(anInt)+4);
var bool aBool;
aBool := true;
print(aBool);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.4\textwidth}
\caption{Input and output}
\begin{minted}{text}
> 4
<<< a
> 8
<<< 12
<<< true
\end{minted}
\end{subfigure}
\caption{Example code for I/O in Boppi.}
\label{io-example}
\end{figure}

\paragraph{Use}
Input expressions can only contain simple variables as arguments. If exactly one variable is present, the expression has the type and value of that variable. Otherwise the result of the expression is \verb|void|.\\
Output expressions can only contain non-void expressions as arguments. Analogous to the input expression, when exactly one argument is provided, the expression has the type and value of that argument; otherwise the result is \verb|void|.

\paragraph{Semantics}
The input expression stores a value in every variable argument by reading each value from the standard input. When exactly one variable is present, the result of the expression is that value, otherwise it is void.\\
The output expression prints the result of each expression to the standard output. Analogous to the input expression, when exactly one argument is given, this will be the result of the expression.\\
If a read or print action is undefined for a type, it will halt the machine with the status \emph{ERROR\_INPUT\_UNKNOWN\_TYPE} or \emph{ERROR\_OUTPUT\_UNKNOWN\_TYPE}, respectively.

\paragraph{Code generation}
When printing expressions, the generator evaluates each expression and then prints it to the standard output. For printing an integer, the generator simply produces \verb|out "", r| where \verb|r| is the register holding the value of the current expression. For printing a character, the character is pushed onto the stack as a string and then printed using \verb|cout|, see \cref{character-output}.
For printing a boolean, the generator calls a subroutine to print either \emph{true} or \emph{false} to the standard output. The subroutine can be seen in \cref{boolean-output}. For more information about the subroutine calling convention, see \cref{memlib-calling}.\\
When reading values to variables, the generator first reads from the standard input and then stores the value similarly to the assign statement. In case of booleans and integers, the generator simply produces \verb|in "" => r|. In case of a character, the generator calls a subroutine for reading a line and extracting exactly one character. The subroutine can be seen in \cref{character-input}. It reads a whole line at a time as per the \verb|cin| instruction. Empty lines are discarded and the first character of a non-empty line is returned. For the calling convention, again see \cref{memlib-calling}.

\begin{figure}
\caption{ILOC for printing a single character stored in register r.}
\label{character-output}
\begin{minted}{boppi}
cpush r
loadI 1 => r_t
push r_t
cout ""
\end{minted}
\end{figure}

\begin{figure}
\caption{\emph{stdlib} ILOC for writing a boolean.}
\label{boolean-output}
\begin{minted}{boppi}
// write a boolean to output
// stack: [return address, bool] -> []
stdbout: pop => m_1 // get boolean
         loadI 0 => m_2 // load zero-length string
         push m_2
         cbr m_1 -> sbout_t,sbout_f
sbout_t: cout "true"
         jumpI -> sbout_e
sbout_f: cout "false"
sbout_e: pop => m_1 // load return address
         jump -> m_1
\end{minted}
\end{figure}

\begin{figure}
\caption{\emph{stdlib} ILOC for reading a single character.}
\label{character-input}
\begin{minted}{boppi}
// read a character from input
// stack: [return address] -> [char]
stdcin:  cin "" // get line
         pop => m_1 // get length
         cbr m_1 -> scin_t,stdcin // repeat until at least one character
scin_t:  cpop => m_2 // save character
scin_lc: subI m_1, 1 => m_1 // decrement char count
         cbr m_1 -> scin_ll,scin_le
scin_ll: cpop => m_0 // discard character
         jumpI -> scin_lc // repeat
scin_le: loadI 0 => m_0 // reset zero register
         pop => m_1 // get return address
         cpush m_2 // push result character
         jump -> m_1
\end{minted}
\end{figure}

\section{Conditional code}
\label{conditionals}
\paragraph{Syntax}
Conditionals extend the Boppi language with two expression types: \emph{if} and \emph{while}. The \verb|#if| expression has an optional \verb|ELSE| clause. The extra syntax can be seen in \cref{conditional-syntax}.

\begin{figure}
\caption{ANTLR4 code for conditionals in Boppi.}
\label{conditional-syntax}
\begin{minted}{antlr}
expr
    : ...
    | IFOPEN cond=stats IFTRUE onTrue=stats (IFFALSE onFalse=stats)? IFCLOSE #if
    | WHILEOPEN cond=stats WHILETRUE onTrue=stats WHILECLOSE #while
\end{minted}
\end{figure}

\paragraph{Examples}
A few examples of \emph{if} and \emph{while} expressions can be seen in \cref{conditional-example}. The first \emph{if} construction shows how it can be used as an expression with a result, in this case either the character \verb|T| or \verb|F|. Next is an \emph{if} construction with only a consequent and no alternative. Lastly, two \emph{while} constructions are presented to show the use of scoped variables and looping.

\begin{figure}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int x;
print(if read(x) > 4 then 'T' else 'F' fi);
if x == 8 then print('H','i') fi;
while
    var bool cont;
    read(cont)
do
    var int y;
    y := x;
    x := print(y+x);
od;
var int i;
var int n;
i := 1;
n := 0;
while i < x do
    n := n+i;
    i := i+1;
od;
print(n);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{Input and output}
\begin{minted}{text}
> 8
<<< T
<<< H
<<< i
> 1
<<< 16
> 1
<<< 32
> 0
<<< 496
\end{minted}
\end{subfigure}
\caption{Example of conditionals in Boppi.}
\label{conditional-example}
\end{figure}

\paragraph{Use}
The \emph{if} expression can freely be used with and without an alternative (\verb|ELSE|), and the consequent and the alternative can each have any result type.
The condition has to result in a boolean type and the whole expression will generally return \emph{void}. However, when the consequent and alternative result in the same type, the \emph{if} expression will have this return type. Then, when executed, the expression will return the result of the branch taken.\\
The \emph{while} expression has one form that also requires the condition to have a boolean type and allows the body to have any type. The expression will always return \verb|void|.

\paragraph{Semantics}
The \emph{if} expression first executes the condition and executes the consequent only if the condition is \verb|true|. If the condition is \verb|false| and there is an alternative, the alternative will be executed.\\
The \emph{while} expression executes the condition and executes the body if the condition is \verb|true|. After executing the body, it repeats the whole \emph{while} expression, so the loop ends as soon as the condition is \verb|false|.

\paragraph{Code generation}
The \emph{if} expression generates two or three jump targets, depending on whether an alternative clause is present. In both cases the condition is visited, followed by a \verb|cbr| instruction based on the value of the condition.\\
When no alternative is present, the \verb|cbr| jumps to either the \verb|if_t| or \verb|if_e| target. An \verb|if_t: nop| is produced immediately afterwards, after which the consequent is visited. Lastly an \verb|if_e: nop| is produced. Effectively the consequent is skipped when the condition is false.\\
When an alternative is present, the \verb|cbr| jumps to either the \verb|if_t| or \verb|if_f| target, after which an \verb|if_t: nop| is produced and the consequent is visited. If the expression is to return a value, an \verb|i2i r1 => r2| is produced to copy the result of the branch to the result register of the expression. Then a \verb|jumpI -> if_e| is produced to skip the alternative. Next, for the alternative, an \verb|if_f: nop| is produced and the alternative is visited.
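
The emission order described above can be sketched as follows. This is a hedged illustration rather than the actual generator: the function \verb|gen_if| and its (instructions, register) pairs are hypothetical, labels are left unnumbered, and it assumes that the \verb|jumpI -> if_e| and a closing \verb|if_e: nop| are always emitted when an alternative is present.

\begin{minted}{python}
def gen_if(cond, then, alt=None, result_reg=None):
    """Return ILOC lines for an if expression (hypothetical helper).

    cond, then and alt are (instructions, register) pairs; alt may be None.
    result_reg is the register for the value of the whole expression, or
    None if the expression is void.
    """
    cond_code, cond_reg = cond
    then_code, then_reg = then
    if alt is None:
        # Two targets: the consequent is skipped when the condition is false.
        return (cond_code
                + [f"cbr {cond_reg} -> if_t,if_e", "if_t: nop"]
                + then_code
                + ["if_e: nop"])
    alt_code, alt_reg = alt
    code = cond_code + [f"cbr {cond_reg} -> if_t,if_f", "if_t: nop"] + then_code
    if result_reg is not None:
        code.append(f"i2i {then_reg} => {result_reg}")
    # Assumption: the jump over the alternative is emitted unconditionally.
    code += ["jumpI -> if_e", "if_f: nop"] + alt_code
    if result_reg is not None:
        code.append(f"i2i {alt_reg} => {result_reg}")
    code.append("if_e: nop")
    return code


# if 2 > 1 then 'T' else 'F' fi
cond = (["loadI 2 => __1", "loadI 1 => __2", "cmp_GT __1,__2 => __1"], "__1")
on_true = (["loadI 84 => __2", "i2c __2 => __2"], "__2")
on_false = (["loadI 70 => __2", "i2c __2 => __2"], "__2")
iloc = gen_if(cond, on_true, on_false, result_reg="__1")
print("\n".join(iloc))
\end{minted}

Passing the already generated sub-expressions as lists keeps the sketch independent of how registers are allocated, which the text above does not specify.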
\cref{if-code} is an example of an \emph{if} expression with three targets and a return value.\\
The \emph{while} loop has three jump targets: one for evaluating the condition, one for executing the body of the loop and one for breaking out of the loop. The compiler first produces a \verb|jumpI -> while_f| to jump to the condition. Then a \verb|while_t: nop| is produced as a jump target, followed by the loop body. Next, a jump target for the condition, \verb|while_f: nop|, the condition itself and \verb|cbr r_k -> while_t,while_e| are produced, where \verb|r_k| is the register that holds the result of the condition. Finally, the breaking jump target \verb|while_e: nop| is produced. See \cref{while-code} for an example.

\begin{figure}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
if 2 > 1 then
    'T'
else
    'F'
fi
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{boppi}
loadI 2 => __1 // 2
loadI 1 => __2 // 1
cmp_GT __1,__2 => __1 // >
cbr __1 -> if_t0,if_f1 //
if_t0: nop //
loadI 84 => __2 // 'T'
i2c __2 => __2 // 'T'
i2i __2 => __1 // result
jumpI -> if_e2 //
if_f1: nop //
loadI 70 => __2 // 'F'
i2c __2 => __2 // 'F'
i2i __2 => __1 // result
if_e2: nop //
\end{minted}
\end{subfigure}
\caption{Generated code for an if expression in Boppi.}
\label{if-code}
\end{figure}

\begin{figure}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
while true do
    1
od
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{boppi}
jumpI -> while_f1 // to condition
while_t0: nop // loop target
loadI 1 => __1 // 1
while_f1: nop // condition target
loadI 1 => __1 // true
cbr __1 -> while_t0,while_e2 //
while_e2: nop // end target
\end{minted}
\end{subfigure}
\caption{Generated code for a while expression in Boppi.}
\label{while-code}
\end{figure}

\section{Functions}
\label{functions}
\paragraph{Syntax}
Functions introduce one new mode of
declaration and one new mode of expression. The declaration of a function takes a name, an optional sequence of parameters and an optional return type. These parameters each have a type and a name. A function call requires a variable (the name of the function) followed by a sequence of expressions between parentheses. Moreover, the feature introduces the arrow and tuple at the type level, so function types can be constructed. The arrow denotes a function from the (tuple) type on the left to the type on the right, whereas the tuple is a sequence of types. The ANTLR rules can be seen in \cref{functions-syntax}.

\begin{figure}
\caption{ANTLR4 code for functions in Boppi.}
\label{functions-syntax}
\begin{minted}{antlr}
declareStat
    : ...
    | FUNCTION (result=type)? name=IDENTIFIER PAROPEN parameters? PARCLOSE body=expr #declareFunction

expr
    : ...
    | variable PAROPEN (expr (LISTDELIM expr)*)? PARCLOSE #call

parameters
    : ...
    | type IDENTIFIER (LISTDELIM type IDENTIFIER)*

type
    : ...
    | type ARROW type #typeFunction
    | PAROPEN (type (LISTDELIM type)*)? PARCLOSE #typeTuple
\end{minted}
\end{figure}

\paragraph{Examples}
Functions can be used in many ways in Boppi. They can have any number of arguments and a return value is optional. Since Boppi allows for side effects, functions without a return value have their use. One example is shown in \cref{functions-example}: \verb|logPrint| counts the number of times the function has been called by incrementing the non-local variable \verb|numCalls|. A function with a return value can be used in an expression, as can be seen in the same example with the function \verb|add|.
\begin{figure}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int numCalls;
numCalls := 0;
function logPrint(int n) {
    numCalls := numCalls+1;
    print(n);
};
logPrint(1);
logPrint(2);
print(numCalls);
function int add(int a, int b) a+b;
print(10*add(3, 4));
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{Input and output}
\begin{minted}{text}
<<< 1
<<< 2
<<< 2
<<< 70
\end{minted}
\end{subfigure}
\caption{Example of functions in Boppi.}
\label{functions-example}
\end{figure}

\paragraph{Use}
The function declaration has an optional return type, a name, a sequence of parameters in parentheses and an expression that is the body of the function. The sequence of parameters may be empty and each parameter consists of a type and a name. After the declaration, the function variable is marked \emph{constant} and \emph{assigned}. Since the function declaration enters a new scope, the parameter names may shadow variable names outside the function. Inside the body of the function, the parameters are marked \emph{assigned}. If the function has a return type, the function body must return this type. Because the function declaration as a whole is a declare statement, it has no result type and must not be the last statement of a block.\\
The function call consists of a name followed by a sequence of expressions in parentheses. The function name must match a variable that has a function type. The number of expressions must match the number of formal parameters of the function. Each expression must be of the same type as the corresponding formal parameter. If the function has a return type, the function call results in this type, otherwise the result type is \verb|void|.\\
A function type must always have an arrow at the top level and a tuple on its left side.
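
The rules for a function call stated above can be summarised in a short sketch. It is an illustration only and does not reflect the structure of the actual type checker: \verb|FunctionType| and \verb|check_call| are hypothetical names, and types are represented as plain strings for brevity.

\begin{minted}{python}
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FunctionType:
    """An arrow type: a tuple of parameter types and a result type."""
    params: Tuple[str, ...]
    result: str = "void"  # a function without a return type yields void


def check_call(callee_type, arg_types: Sequence[str]) -> str:
    """Return the result type of a call, or raise TypeError (hypothetical)."""
    if not isinstance(callee_type, FunctionType):
        raise TypeError("callee does not have a function type")
    if len(arg_types) != len(callee_type.params):
        raise TypeError(
            f"expected {len(callee_type.params)} arguments, got {len(arg_types)}")
    for index, (actual, formal) in enumerate(zip(arg_types, callee_type.params)):
        if actual != formal:
            raise TypeError(f"argument {index}: expected {formal}, got {actual}")
    return callee_type.result


# The two functions of the example program.
add = FunctionType(("int", "int"), "int")
log_print = FunctionType(("int",))

print(check_call(add, ["int", "int"]))
print(check_call(log_print, ["int"]))
\end{minted}

A call such as \verb|add(3)| or \verb|add(3, 'c')| would raise an error in this sketch, mirroring the arity and type restrictions above.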
A function variable is not initialized at declaration and there is no run-time check during a function call, so the user should heed any warning that a variable may not be assigned. Lastly, there is currently no way to construct a function type with no return value, so variables and parameters can only have these types by copying the type of another function.

\paragraph{Semantics}
A function declaration creates a variable in the current scope with the name of the function. The type is constructed as an arrow from the tuple of the parameter types to the return type. The return type will be \verb|void| if not present. The function variable is marked assigned. The body of the function is not evaluated.\\
A function call first retrieves the function reference and constructs an activation record (\emph{AR}). Then each of the parameters is evaluated and their results are stored within the AR. Finally, the program jumps to the body of the function.\\
The body of a function declaration links all the formal parameters to places within the AR, which are assigned a value by the function call. During a call, first the body of the function is evaluated. Next, if there is a result, it will be stored in the AR. Then, the AR will be dereferenced and, if it is marked for deletion, all local reference variables will be dereferenced. Finally, the control flow is moved back to where the function call was made.\\
At the end of a function call, the result is loaded, if there is any.

\paragraph{Code generation}
In a function declaration, the generator performs the following steps:
\begin{enumerate}
\item Jump over the function body.
\item Visit the inner expression.
\item If the function is to return a value, generate a \verb|storeAI r_res => r_arp, OFFSET_RETURN|, with \verb|r_res| the register that holds the result of the inner expression.
\item Check whether there is only one reference left to the current AR. If so, free (decrement the reference count of) all local variables where applicable.
\item Generate a \verb|loadAI r_arp, OFFSET_RETURN_ADDRESS => r_temp| and \verb|jump -> r_temp| to return to the call site.
\item Allocate a tuple for the target address, current AR and desired AR size.
\item Store the relevant values in that tuple, with the target address and AR size decided at compile-time, while the current AR is decided at run-time.
\item Store the address of the tuple in the function variable's offset.
\item Increment the reference count of the current AR and its parents.
\end{enumerate}
The tuple is required for two reasons. First, the current AR is needed in case the function uses non-local variables and is used outside the current function. Second, the AR size is needed because of \cref{problems-reassigned-closures}. The tuple is referenced rather than stored in place because it is easier to move around a single integer.\\
The references to the AR's parents are incremented because of \cref{problem-closures}.\\
Freeing local reference variables at the end of the function is done as a simple mechanism to clean up memory. A better approach would be to generate cleanup procedures, as discussed in \cref{future-work}.\\
In a function call, the generator performs the following steps:
\begin{enumerate}
\item Retrieve the function variable.
\item Load the function tuple using \verb|load r_var => r_tuple| with \verb|r_var| the register holding the address of the function variable.
\item Allocate the AR using the size specified in the tuple, \verb|loadAI r_tuple, OFFSET_AR_SIZE => r_narp|.
\item Shift the pointer to the callee AR by \verb|addI r_narp, AR_BASE_SIZE => r_narp| so AR properties have negative offsets and local variables (including parameters) start at an offset of 0.
\item Evaluate each parameter of the function and copy the result to the callee AR using \verb|storeAI r_res => r_narp, c_offset|, with \verb|c_offset| the offset of the formal parameter.
\item Save all registers that are in use by pushing them to the stack.
\item Save the current AR in the callee AR, \verb|storeAI r_arp => r_narp, OFFSET_CALLER_ARP|, to retrieve it after the call.
\item Copy the parent AR that belongs to the function tuple to the callee AR.
\item Increment the reference count to the parent ARs of the callee.
\item Save the return address to the callee AR.
\item Switch to the callee AR.
\item Jump to the target address stored in the function tuple.
\item Decrement the reference count to the callee AR and its parents.
\item Restore all the registers that were in use by popping them from the stack.
\item Load the result of the function, if any.
\item Restore the AR.
\end{enumerate}
Note there is no particular reason why the registers are saved to the stack, while the caller's ARP and return address are saved to the callee's AR.\\
For a simple example of a function declaration and call, see \cref{functions-code}. Because the generated ILOC spans more than 100 lines, it is provided as a separate file.

\begin{figure}
\begin{subfigure}{0.5\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
function int successor(int n) n+1;
var int x;
read(x);
print(successor(x));
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.5\textwidth}
\caption{Generated ILOC}
See \emph{doc/successor.iloc.txt}. Instructions 178-193 form the body of \verb|successor| with 180-183 generated by the expression in the body, 194-217 form the allocation of \verb|successor| as a variable, 222-270 form a call to \verb|successor| and 272-293 form the deallocation of \verb|successor|.
\end{subfigure}
\caption{Generated code for functions in Boppi.}
\label{functions-code}
\end{figure}

\section{Arrays}
\label{arrays}
\paragraph{Syntax}
Arrays add new syntax in three places in the language. They introduce a way to construct an array type from any type and two ways to construct an array. The first way to construct an array is to provide an array literal: \verb|[ element1, element2, ... ]|.
The second way is to provide the element type and the number of elements: \verb|array( type, length )| where length can be any integer expression. The choice was made to require the element type, because it allows the type checking to only use a synthesized attribute. For the same reason, an array literal must contain at least one item. Lastly, arrays introduce two variable constructions: the array element accessor and the property accessor. The ANTLR rules can be seen in \cref{arrays-syntax}.\\
The way arrays are declared and defined is contrary to the assignment. While arrays were defined as fixed-size vectors in a previous iteration of the language, this was considered too restrictive in practice.

\begin{figure}
\caption{ANTLR4 code for arrays in Boppi.}
\label{arrays-syntax}
\begin{minted}{antlr}
expr
    : ...
    | ARRAY PAROPEN type LISTDELIM size=expr PARCLOSE #defineArray
    | ARROPEN expr (LISTDELIM expr)* ARRCLOSE #literalArray

type
    : ...
    | type ARROPEN ARRCLOSE #typeArray

variable
    : ...
    | variable ARROPEN expr ARRCLOSE #variableArray
    | variable PROP IDENTIFIER #variableProperty
\end{minted}
\end{figure}

\paragraph{Examples}
An example of using arrays can be seen in \cref{arrays-example}.

\begin{figure}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int[] fibs;
fibs := [1,1,2,3,5,8];
var int i;
read(i);
while i < fibs.length do
    print(fibs[i]);
    i := i+1;
od
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{Input and output}
\begin{minted}{text}
> 2
<<< 2
<<< 3
<<< 5
<<< 8
\end{minted}
\end{subfigure}
\caption{Example of arrays in Boppi.}
\label{arrays-example}
\end{figure}

\paragraph{Use}
Array variables are not assigned at declaration. As such, the user should heed any warning that a variable may not be assigned, since an unassigned array variable may point anywhere.
Moreover, an array may contain undefined elements, which neither the compiler nor the run-time will detect.\\ Array variables have exactly one named property, their \verb|length|. This is always non-negative for assigned arrays and undefined otherwise. All other types up to here have no properties.\\ An array literal may contain any positive number of elements, which must all have the same type. The type of the resulting array is, naturally, an array of the elements' type and the length is equal to the number of expressions in the literal.\\ An array constructor comprises a type, which will be the type of the elements, and a non-negative number of items. Note that the elements will be undefined.\\ An array accessor may only be used on an array type. The result type will be the element type of the array.\\ Arrays can be compared with each other for equality if they have the same element type.\\ Lastly, an array of characters can be printed to standard output and can be read from standard input.\\ Note that, while an array variable may be defined constant, its elements can still be changed. \paragraph{Semantics} An array is a finite sequence of items of a single type whose values can be retrieved through a zero-based index. An array literal creates an array exactly large enough to hold all the expressions inside, then evaluates the expressions left-to-right and puts the results in the corresponding array index. An array constructor simply evaluates the requested length and allocates an array of that length, or halts the machine if the length is negative.\\ Assigning an array to a variable means that variable will point to the array from that point. 
This means an expression like \verb|array1 := array2;| will result in both variables pointing to the same array, so changing an element of \verb|array1| also changes that element of \verb|array2|.\\
An array accessor evaluates the index expression and then returns the element at that index, or halts the machine if that index is out of bounds.\\
An equality check between two arrays compares their lengths and their corresponding elements. Note that, for nested arrays, this will compare the addresses of the inner arrays, rather than the length and values within those arrays.

\paragraph{Code generation}
The array constructor first evaluates the length expression, then generates a check whether the array size is valid and either \verb|halt|s the machine or allocates the array.\\
An array literal generates the allocation of the array, then, for each expression, evaluates it and puts the result in the array using \verb|storeAI r_res => r_array, c_offset|, with \verb|r_res| the result of the expression, \verb|r_array| the base address of the array and \verb|c_offset| the offset of the particular element calculated at compile-time.\\
An array access generates the following steps (illustrated in \cref{arrays-access-snippet}):
\begin{enumerate}
\item Visit the array variable.
\item Load the array's address.
\item Visit the index expression.
\item Load the array's memory size and divide it by the element size to obtain the number of elements.
\item Check whether the calculated index is less than the number of elements and not negative. Halt if this is not the case.
\item Multiply the index by the element size to get the offset and add it to the array's base address to get the address of the element.
\end{enumerate}
Retrieving the length of an array requires a few steps because the length is only stored implicitly. The generator first retrieves the array variable.
Then it produces a \verb|load r_temp => r_temp| instruction to get the array's base address, followed by an \verb|addI r_temp, OFFSET_OBJECT_SIZE => r_temp| and a \verb|load r_temp => r_temp| to retrieve the memory size. Lastly, it produces \verb|divI r_temp, c_element_size => r_temp| to convert the size to the number of elements.

\begin{figure}
\caption{Array access snippet from \cref{arrays-code}.}
\label{arrays-access-snippet}
\begin{minted}{boppi}
addI r_arp,0 => __2 // add offset
load __2 => __2 // get array object
loadI 0 => __3 // 0
loadAI __2,-4 => __1 // check array index
divI __1,4 => __1 // check array index
cmp_LT __3,__1 => __1 // check array index
cmp_GE __3,r_nul => __4 // check array index
and __1,__4 => __4 // check array index
cbr __4 -> nob5,oob4 // check array index
oob4: haltI 1634692962 // array index out of bounds
nob5: multI __3,4 => __3 // multiply index by size
add __2,__3 => __2 // get array index address
\end{minted}
\end{figure}

See \cref{arrays-code} for a simple example of a nested array. Because the generated ILOC spans around 100 lines, it is provided as a separate file.

\begin{figure}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int[][] matrix;
matrix := [ [1,2], [3,4] ];
print(matrix[0][1]);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
See \emph{doc/nestedArray.iloc.txt}. Lines 178--183 allocate the top-level array, whereas lines 184--189 and 196--200 allocate the second-level arrays. Lines 206--215 try to decrement the reference count of the array currently held by \verb|matrix| if it is assigned, which it is not. Lines 217--229 increment and decrement the reference count of the outer array because of the assignment expression and the statement ending. Lines 230--253 retrieve element (0,1) of the matrix, which is 2.
\end{subfigure}
\caption{Generated code for arrays in Boppi.}
\label{arrays-code}
\end{figure}

\section{Reference types}
This is an addendum to function variables and arrays. Both of these are reference types, meaning that the value of such a variable is merely a pointer to a heap-allocated segment of data. In order to garbage-collect these objects automatically, the compiler needs to update the reference count (\emph{RC}) whenever a reference type is used.\\
When a reference type is used in an expression, its RC is incremented. When the result of an expression is discarded, e.g. at the end of a statement or in an output expression with multiple arguments, a reference type will have its RC decremented. In an assignment with a reference type, the expression is evaluated first, then the RC of the old value of the variable is decremented and finally the RC of the new value is incremented.
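
The rules above can be illustrated with a small sketch. The variable names are hypothetical, and the element assignment \verb|b[0] := 9| assumes that an array element accessor may appear on the left-hand side of an assignment. The comments only trace the net change in RC that follows from the rules as stated here, not the exact instructions the compiler emits.

\begin{minted}{boppi}
var int[] a;
var int[] b;
a := [1,2,3];   //a now points to a freshly allocated array
b := a;         //a is used in an expression (RC+1), stored in b (RC+1)
                //and the statement result is discarded (RC-1): net RC+1
b[0] := 9;      //a and b point to the same array, so a sees this change
print(a[0]);    //expected to print 9 under the semantics described above
\end{minted}

After the second assignment both variables hold a counted reference to the same array, so the array should presumably only become eligible for collection once neither variable refers to it any more.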