% Detailed language description. A systematic description of the features of your language, for each
% feature specifying
% Syntax, including one or more examples;
% Usage: how should the feature be used? Are there any typing or other restrictions?
% Semantics: what does the feature do? How will it be executed?
% Code generation: what kind of target code is generated for the feature?
% You may make use of your ANTLR grammar as a basis for this description, but note that not every
% rule necessarily corresponds to a language feature.
This section describes the language features of Boppi. The tokens used in the syntax listings and explanations can be found in \cref{report-grammar}. Comments and whitespace are discarded by the lexer, so they are not listed here.
\section{Basic expressions}
At the heart of the language are basic arithmetic, logic and compound expressions and literals. These support basic operations for boolean, integer and character types.
\paragraph{Syntax}
The basic expression language consists of a \verb|program| that contains a sequence of statements, \verb|stats|, separated by semicolons. A single \verb|stat|ement in the basic language can only be an expression, which can be an arithmetic or logical expression, an integer, a character, a boolean or another sequence of statements. The ANTLR rules can be seen in \cref{basic-syntax}.
\begin{figure}
\caption{ANTLR4 code of the basic expression language of Boppi}
\label{basic-syntax}
\begin{minted}{antlr}
program: stats EOF;
stats
: stat (COMPOUND stat?)*
;
stat
: ...
| expr
;
expr
: ...
| PAROPEN stats PARCLOSE #parens
| BRAOPEN stats BRACLOSE #block
| op=(PLUS|MINUS|NOT) expr #prefix1
| lhs=expr op=(MULTIPLY|DIVIDE) rhs=expr #infix1
| lhs=expr op=(PLUS|MINUS) rhs=expr #infix2
| lhs=expr op=(LT|LEQ|GTE|GT|EQ|NEQ) rhs=expr #infix3
| lhs=expr AND rhs=expr #infix4
| lhs=expr OR rhs=expr #infix5
| LITERAL10 #literalInteger
| CHAR #literalCharacter
| (TRUE|FALSE) #literalBoolean
;
\end{minted}
\end{figure}
\paragraph{Examples}
See \cref{basic-examples} for a few examples of basic expressions. Since input and output are only introduced in \cref{io-section}, the result of each expression is simply written in the comments.
\begin{figure}
\caption{Basic expressions in Boppi}
\label{basic-examples}
\begin{minted}{boppi}
5; //an integer literal as a statement
'c';;;; //a character literal followed by empty statements
4+(2*3/-1); //an arithmetic expression (result: -2)
{
true && false; //an expression with two boolean literals (result: false)
3 //the last expression in a block is the value of the block (result: 3)
}+4; //another arithmetic expression (result: 7)
\end{minted}
\end{figure}
\paragraph{Use}
\verb|#prefix1|, \verb|#infix1| and \verb|#infix2| (negation, multiplication and division, addition and subtraction) restrict their operand types and result type to integers, except for the \verb|#prefix1| operator \emph{NOT}, which only allows a boolean as its operand and results in a boolean. \verb|#infix3| (comparison) restricts its operands to integers and sets the result type to boolean. However, when the operator is \verb|EQ| or \verb|NEQ|, the operands can be of any type as long as both have the same type. The result type is still boolean. \verb|#infix4| (logical and) and \verb|#infix5| (logical or) only allow booleans as operands and, again, return a boolean.
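The typing rules above can be summarised in a short Python sketch. It is only an illustration under stated assumptions, not the type checker of the Boppi compiler: the function names \verb|prefix_type| and \verb|infix_type| and the use of plain strings for types are made up for this example, while the operator names are the token names from \cref{basic-syntax}.
\begin{minted}{python}
# Illustrative model of the typing rules; type names are plain strings
# (an assumption of this sketch, not the compiler's representation).
INT, BOOL, CHAR = "int", "bool", "char"


def prefix_type(op, operand):
    """Result type of a #prefix1 expression, or raise TypeError."""
    if op in ("PLUS", "MINUS") and operand == INT:
        return INT
    if op == "NOT" and operand == BOOL:
        return BOOL
    raise TypeError(f"{op} cannot be applied to {operand}")


def infix_type(op, lhs, rhs):
    """Result type of an #infix1 to #infix5 expression, or raise TypeError."""
    if op in ("MULTIPLY", "DIVIDE", "PLUS", "MINUS") and lhs == rhs == INT:
        return INT
    if op in ("LT", "LEQ", "GTE", "GT") and lhs == rhs == INT:
        return BOOL
    if op in ("EQ", "NEQ") and lhs == rhs:
        return BOOL  # equality accepts any type, as long as both sides agree
    if op in ("AND", "OR") and lhs == rhs == BOOL:
        return BOOL
    raise TypeError(f"{op} cannot be applied to {lhs} and {rhs}")
\end{minted}
For instance, \verb|infix_type("EQ", CHAR, CHAR)| yields \verb|"bool"|, whereas \verb|infix_type("LT", CHAR, CHAR)| raises a \verb|TypeError|.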
\paragraph{Semantics}
A program consists of one compound expression.
\verb|COMPOUND| separates single expressions, which are evaluated from left to right. The value of the rightmost expression is the result of the compound expression. The values of all other expressions are discarded.
Both \verb|#parens| and \verb|#block| contain a compound expression. However, \verb|#block| introduces a deeper scope for the expression within, as clarified in \cref{variables-section}.
\paragraph{Code generation}
The literals generate a \verb|loadI| instruction. For character literals, the \verb|loadI| is followed by \verb|i2c| to make sure the loaded character is within the character range of the ILOC VM.
Operators first generate their operands, locking the result register, then generate their operator instruction and free the registers.
The compound expression simply generates its inner expressions in order. See \cref{basic-codegen} for an example of generated code.
\begin{figure}
\caption{Generated code for basic expressions in Boppi.}
\label{basic-codegen}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
5;
'c';;;;
4+(2*3/-1);
{
true && false;
3
}+4;
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{iloc}
loadI 5 => r_1 // 5
loadI 99 => r_1 // 'c'
i2c r_1 => r_1 // 'c'
loadI 4 => r_1 // 4
loadI 2 => r_2 // 2
loadI 3 => r_3 // 3
mult r_2,r_3 => r_2 // *
loadI 1 => r_3 // 1
rsubI r_3,0 => r_3 // unary -
div r_2,r_3 => r_2 // /
add r_1,r_2 => r_1 // +
loadI 1 => r_2 // true
loadI 0 => r_1 // false
and r_2,r_1 => r_2 // &&
loadI 3 => r_1 // 3
loadI 4 => r_2 // 4
add r_1,r_2 => r_1 // +
\end{minted}
\end{subfigure}
\end{figure}
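How operands are generated while a result register stays locked can be sketched in Python. The sketch is a model under stated assumptions, not the Boppi code generator: the class \verb|Generator|, the tuple-based syntax tree and the policy of always taking the lowest-numbered free register are hypothetical. With that policy it reproduces the instructions that \cref{basic-codegen} lists for \verb|4+(2*3/-1)|. The register choice for \verb|true && false| in the same figure suggests that the actual allocator picks free registers in a different order.
\begin{minted}{python}
class Generator:
    """Toy ILOC generator for integer expressions (illustrative only)."""

    OPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}

    def __init__(self):
        self.in_use = set()  # registers that are currently locked
        self.code = []       # emitted instructions

    def lock(self):
        """Lock and return the lowest-numbered free register (assumed policy)."""
        n = 1
        while n in self.in_use:
            n += 1
        self.in_use.add(n)
        return n

    def free(self, reg):
        self.in_use.discard(reg)

    def gen(self, node):
        """Emit code for node and return its (still locked) result register."""
        if isinstance(node, int):              # integer literal
            reg = self.lock()
            self.code.append(f"loadI {node} => r_{reg}")
            return reg
        if node[0] == "neg":                   # unary minus
            reg = self.gen(node[1])
            self.code.append(f"rsubI r_{reg},0 => r_{reg}")
            return reg
        op, lhs, rhs = node                    # binary operator
        left = self.gen(lhs)                   # stays locked ...
        right = self.gen(rhs)                  # ... while the rhs is generated
        self.code.append(f"{self.OPS[op]} r_{left},r_{right} => r_{left}")
        self.free(right)                       # only the result stays locked
        return left


g = Generator()
g.gen(("+", 4, ("/", ("*", 2, 3), ("neg", 1))))   # 4+(2*3/-1)
print("\n".join(g.code))
\end{minted}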
\section{Variables}
\label{variables-section}
\paragraph{Syntax}
Variables introduce two more \verb|stat|ement types: declarations and assignments. A declaration introduces a variable name and restricts it to a certain type. The type can be either one of the three built-in types (boolean, character, integer) or the type of a previously declared identifier. The optional \verb|CONSTANT| keyword can be used to declare a variable constant, so that it may only be assigned once. An assignment sets the value of a variable to the result of an expression. Lastly, a variable can be used in an expression once it has been assigned. See \cref{variables-syntax} for the additional ANTLR rules.
\begin{figure}
\caption{ANTLR4 code for variables in Boppi.}
\label{variables-syntax}
\begin{minted}{antlr}
stat
: ...
| declareStat
| assignStat
;
declareStat
: ...
| DECLARE CONSTANT? type IDENTIFIER #declare
;
assignStat
: variable ASSIGN (assignStat | expr) #assign
;
expr
: ...
| variable #getVariable
;
type
: ...
| staticType=(INTTYPE | BOOLTYPE | CHARTYPE) #typeSimple
| variable #typeVariable
;
variable
: ...
| IDENTIFIER #variableSimple
;
\end{minted}
\end{figure}
\paragraph{Examples}
See \cref{variables-example} for some examples of variable usage. Again, input and output are introduced in \cref{io-section}, so the result of each expression is listed in the comments.
\begin{figure}
\caption{Example code with variables in Boppi.}
\label{variables-example}
\begin{minted}{boppi}
var int myInt; //declares integer myInt (result: void)
myInt := 4; //myInt is now 4 (result: 4)
var myInt otherInt; //declares otherInt with the same type as myInt
otherInt := 4+(myInt := 2); //myInt is now 2 and otherInt is 6 (result: 6)
var bool aBool; //declares boolean aBool
aBool := {
var const int otherInt; //declares a constant integer otherInt inside this block
otherInt := 12; //otherInt is now 12 (result: 12)
myInt > otherInt //(result: false)
}; //aBool is now false
\end{minted}
\end{figure}
\paragraph{Use}
A variable must be declared before it can be used and its type must be given in the declaration. Moreover, a variable can only be used within an expression once it has been assigned a value. The result of an expression can only be assigned to a variable if the types match.\\
Regarding lexical scopes, a variable only exists in the scope in which it is declared and deeper scopes. A variable can be redeclared, effectively hiding the original variable, in a deeper scope, linking the name to a new variable. The type of the redeclared variable does not have to match the original type.
\paragraph{Semantics}
\verb|#declare| introduces a new variable to the current scope. It does not perform any action at run-time, but the identifier and its type are recorded and the variable can be assigned and used from this point onwards.\\
\verb|#assign| first evaluates the expression on the right-hand side, then stores the result in the variable on the left-hand side and finally passes the result on.\\
\verb|#getVariable| retrieves the value of a variable.
\paragraph{Code generation}
Declarations do not generate any code.\\
A \verb|#variableSimple| calculates the address of a simple variable using \verb|addI r_arp,c_offset => r_address|, where \verb|c_offset| is decided by the symbol table. Note that, with the addition of functions (see \cref{functions}), \verb|r_arp| is first copied to a temporary register with \verb|i2i r_arp => r_temp|. This is followed by a \verb|loadAI r_temp,OFFSET_CALLER_ARP => r_temp| for each lexical scope between the current scope and the scope in which the variable is declared.\\
An assignment (\verb|#assign|) first visits the expression on the right-hand side and keeps track of its result in an \verb|r_result| register. Then the variable address is retrieved as above. Finally, it generates a \verb|store r_result => r_address| (\verb|cstore r_result => r_address| for characters) to store the result.\\
Likewise, a use of a variable (\verb|#getVariable|) first retrieves the address as above, and then loads the variable using \verb|load r_address => r_result| (\verb|cload r_address => r_result| for characters).\\
An example of (chained) assignments, uses and offsets can be seen in \cref{variables-code}. As can be seen in the generated ILOC, the variables \verb|x|, \verb|y|, \verb|unused|, \verb|b| and \verb|c| have offsets 0, 4, 8, 9 and 13, respectively.
\begin{figure}
\caption{Generated code for basic variable use in Boppi.}
\label{variables-code}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int x;
var int y;
x := 4;
y := 3+x;
var char unused;
var bool b;
var bool c;
c := b := x < y;
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{iloc}
loadI 4 => r_1 // 4
addI r_arp,0 => r_2 // add offset
store r_1 => r_2 // to x
loadI 3 => r_1 // 3
addI r_arp,0 => r_2 // add offset
load r_2 => r_2 // load address
add r_1,r_2 => r_1 // +
addI r_arp,4 => r_2 // add offset
store r_1 => r_2 // to y
addI r_arp,0 => r_1 // add offset
load r_1 => r_1 // load address
addI r_arp,4 => r_2 // add offset
load r_2 => r_2 // load address
cmp_LT r_1,r_2 => r_1 // <
addI r_arp,9 => r_2 // add offset
store r_1 => r_2 // to b
addI r_arp,13 => r_2 // add offset
store r_1 => r_2 // to c
\end{minted}
\end{subfigure}
\end{figure}
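The offsets in \cref{variables-code} are consistent with variables being laid out in declaration order, with four bytes for an integer or a boolean and one byte for a character. The following Python sketch makes that layout explicit. The sizes are inferred from the example rather than taken from the compiler, and the name \verb|assign_offsets| is hypothetical.
\begin{minted}{python}
# Sizes in bytes, inferred from the offsets in the example (an assumption).
SIZES = {"int": 4, "bool": 4, "char": 1}


def assign_offsets(declarations):
    """Map each declared name to its offset from r_arp, in declaration order."""
    offsets, next_free = {}, 0
    for type_name, name in declarations:
        offsets[name] = next_free
        next_free += SIZES[type_name]
    return offsets


decls = [("int", "x"), ("int", "y"), ("char", "unused"),
         ("bool", "b"), ("bool", "c")]
print(assign_offsets(decls))
# {'x': 0, 'y': 4, 'unused': 8, 'b': 9, 'c': 13}
\end{minted}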
\section{Input/Output}
\label{io-section}
\paragraph{Syntax}
I/O introduces two expression types: input and output. In an input expression, a sequence of variables is provided and in an output expression a sequence of expressions can be provided. The ANTLR rules can be seen in \cref{io-syntax}.
\begin{figure}
\caption{ANTLR4 code for I/O in Boppi.}
\label{io-syntax}
\begin{minted}{antlr}
singleExpr
: ...
| IN PAROPEN variable (LISTDELIM variable)* PARCLOSE #read
| OUT PAROPEN expr (LISTDELIM expr)* PARCLOSE #write
;
\end{minted}
\end{figure}
\paragraph{Examples}
See \cref{io-example} for the basic use of input and output expressions.
\begin{figure}
\caption{Example code for I/O in Boppi.}
\label{io-example}
\begin{subfigure}{0.5\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int anInt;
read(anInt);
print('a');
print(read(anInt)+4);
var bool aBool;
aBool := true;
print(aBool);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.4\textwidth}
\caption{input and output}
\begin{minted}{text}
> 4
<<< a
> 8
<<< 12
<<< true
\end{minted}
\end{subfigure}
\end{figure}
\paragraph{Use}
Input expressions can only contain simple variables as arguments. If exactly one variable is present, the type and value of that variable are passed out of the expression. Otherwise the result of the expression is \verb|void|.\\
Output expressions can only contain non-void expressions as arguments. Analogous to the input expression, when exactly one argument is provided, its type and value are passed out of the expression. Otherwise the result is \verb|void|.
\paragraph{Semantics}
The input expression stores a value in every variable argument by reading each value from the standard input. When exactly one variable is present, the result of the expression is that value, otherwise it is void.\\
The output expression prints the result of each expression to the standard output. Analogous to the input expression, when exactly one argument is given, this will be the result of the expression.\\
If a read or print action is undefined for a type, it will halt the machine with the status \emph{ERROR\_INPUT\_UNKNOWN\_TYPE} or \emph{ERROR\_OUTPUT\_UNKNOWN\_TYPE}, respectively.
\paragraph{Code generation}
When printing expressions, the generator evaluates each expression and then prints it to the standard output. For printing an integer, the generator simply produces \verb|out "", r| where \verb|r| is the register holding the value of the current expression. For printing a character, the character is pushed onto the stack as a string and then printed using \verb|cout|, see \cref{character-output}. For printing a boolean, the generator calls a subroutine to print either \emph{true} or \emph{false} to the standard output. The subroutine can be seen in \cref{boolean-output}. For more information about the subroutine calling convention, see \cref{memlib-calling}.\\
When reading values to variables, the generator first reads from the standard input and then stores the value similarly to the assign statement. In case of booleans and integers, the generator simply produces \verb|in "" => r|. In case of a character, the generator calls a subroutine for reading a line and extracting exactly one character. The subroutine can be seen in \cref{character-input}. It reads a whole line at a time as per the \verb|cin| instruction. Empty lines are discarded and the first character of a non-empty line is returned. For the calling convention, again see \cref{memlib-calling}.
\begin{figure}
\caption{ILOC for printing a single character stored in register r.}
\label{character-output}
\begin{minted}{iloc}
cpush r
loadI 1 => r_t
push r_t
cout ""
\end{minted}
\end{figure}
\begin{figure}
\caption{\emph{stdlib} ILOC for writing a boolean.}
\label{boolean-output}
\begin{minted}{iloc}
// write a boolean to output
// stack: [return address, bool] -> []
stdbout: pop => m_1 // get boolean
loadI 0 => m_2 // load zero-length string
push m_2
cbr m_1 -> sbout_t,sbout_f
sbout_t: cout "true"
jumpI -> sbout_e
sbout_f: cout "false"
sbout_e: pop => m_1 // load return address
jump -> m_1
\end{minted}
\end{figure}
\begin{figure}
\caption{\emph{stdlib} ILOC for reading a single character.}
\label{character-input}
\begin{minted}{iloc}
// read a character from input
// stack: [return address] -> [char]
stdcin: cin "" // get line
pop => m_1 // get length
cbr m_1 -> scin_t,stdcin // repeat until at least one character
scin_t: cpop => m_2 // save character
scin_lc: subI m_1, 1 => m_1 // decrement char count
cbr m_1 -> scin_ll,scin_le
scin_ll: cpop => m_0 // discard character
jumpI -> scin_lc // repeat
scin_le: loadI 0 => m_0 // reset zero register
pop => m_1 // get return address
cpush m_2 // push result character
jump -> m_1
\end{minted}
\end{figure}
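At a higher level, the behaviour of the character input subroutine in \cref{character-input} can be paraphrased in Python. This is a sketch of the described behaviour only: the name \verb|read_char| is hypothetical, the input is modelled as an iterator over lines, and raising \verb|EOFError| when the input runs out is an assumption of the sketch, since the ILOC subroutine simply keeps reading.
\begin{minted}{python}
def read_char(lines):
    """Return the first character of the next non-empty line.

    `lines` is an iterator over input lines without their line terminator.
    """
    for line in lines:
        if line:              # empty lines are discarded
            return line[0]    # the rest of the line is dropped
    # Assumption of this sketch: the ILOC routine has no such exit.
    raise EOFError("no non-empty line available")


stdin = iter(["", "", "hello", "x"])
print(read_char(stdin))  # h
print(read_char(stdin))  # x
\end{minted}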
\section{Conditional code}
\label{conditionals}
\paragraph{Syntax}
Conditionals extend the Boppi language with two expression types. The \verb|#if| expression has an optional \verb|ELSE| clause. The extra syntax can be seen in \cref{conditional-syntax}.
\begin{figure}
\caption{ANTLR4 code for conditionals in Boppi.}
\label{conditional-syntax}
\begin{minted}{antlr}
expr
: ...
| IFOPEN cond=stats IFTRUE onTrue=stats (IFFALSE onFalse=stats)? IFCLOSE #if
| WHILEOPEN cond=stats WHILETRUE onTrue=stats WHILECLOSE #while
\end{minted}
\end{figure}
\paragraph{Examples}
A few examples of \emph{if} and \emph{while} expressions can be seen in \cref{conditional-example}. The first \emph{if} construction shows how it can be used as an expression with a result, in this case either the character \verb|T| or \verb|F|. Next is an \emph{if} construction with only a consequent and no alternative. Lastly two \emph{while} constructions are presented to show the use of scoped variables and looping.
\begin{figure}
\caption{Example of conditionals in Boppi.}
\label{conditional-example}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int x;
print(if read(x) > 4 then 'T' else 'F' fi);
if x == 8 then
print('H','i')
fi;
while var bool cont; read(cont) do
var int y;
y := x;
x := print(y+x);
od;
var int i;
var int n;
i := 1;
n := 0;
while i < x do
n := n+i;
i := i+1;
od;
print(n);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{input and output}
\begin{minted}{text}
> 8
<<< T
<<< H
<<< i
> 1
<<< 16
> 1
<<< 32
> 0
<<< 496
\end{minted}
\end{subfigure}
\end{figure}
\paragraph{Use}
The \emph{if} expression can freely be used with and without an alternative (\verb|ELSE|) and the result types of the consequent and the alternative can be of any type. The condition has to result in a boolean type and the whole expression will generally return \emph{void}. However, when the consequent and alternative result in the same type, the \emph{if} expression will have this return type. Then, when executed, the expression will return the result of the branch taken.\\
The \emph{while} expression has one form that also requires the condition to have a boolean type and allows the body to have any type. The expression will always return \verb|void|.
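The result types described above can be written down as two small Python functions. They are an illustrative sketch, not part of the compiler: the names \verb|if_type| and \verb|while_type| and the string type names are assumptions, and a missing alternative is modelled as \verb|None|.
\begin{minted}{python}
def if_type(cond, consequent, alternative=None):
    """Result type of an if expression; None means there is no else clause."""
    if cond != "bool":
        raise TypeError("the condition must be a boolean")
    if alternative is not None and consequent == alternative:
        return consequent     # both branches agree: the if yields that type
    return "void"             # no alternative, or branch types differ


def while_type(cond, body):
    """Result type of a while expression: always void, whatever the body."""
    if cond != "bool":
        raise TypeError("the condition must be a boolean")
    return "void"


print(if_type("bool", "char", "char"))  # char
print(if_type("bool", "int", "char"))   # void
print(if_type("bool", "int"))           # void
\end{minted}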
\paragraph{Semantics}
The \emph{if} expression first executes the condition and executes the consequent only if the condition is \verb|true|. If the condition is \emph{false} and there is an alternative, the alternative will be executed.\\
The \emph{while} expression executes the condition and executes the body if the condition is \verb|true|. Also, if the condition is \verb|true|, it will then repeat the while expression.
\paragraph{Code generation}
The \emph{if} expression generates two or three jump targets, depending on whether an alternative clause is present. In both cases the condition is visited, followed by a \verb|cbr| instruction based on the value of the condition.\\
When no alternative is present, the \verb|cbr| jumps to either the \verb|if_t| or the \verb|if_e| target. An \verb|if_t: nop| is produced immediately afterwards, after which the consequent is visited. Lastly an \verb|if_e: nop| is produced. Effectively, the consequent is skipped when the condition is false.\\
When an alternative is present, the \verb|cbr| jumps to either the \verb|if_t| or the \verb|if_f| target, after which an \verb|if_t: nop| is produced and the consequent is visited. If the expression is to return a value, an \verb|i2i r1 => r2| is produced to copy the result of the branch to the result register. Then a \verb|jumpI -> if_e| is produced to skip the alternative. Next, for the alternative, an \verb|if_f: nop| is produced and the alternative is visited. \cref{if-code} is an example of an \emph{if} expression with three targets and a return value.\\
The \emph{while} loop has three jump targets, one for evaluating the condition, one for executing the body of the loop and one for breaking out of the loop. The compiler first produces a \verb|jumpI -> while_f| to jump to the condition. Then a \verb|while_t: nop| is produced as a jump target followed by the loop body. Next, a jump target for the condition, \verb|while_f: nop|, and the condition and \verb|cbr r_k -> while_t,while_e| are produced, where \verb|r_k| is the register that holds the result of the condition. Finally, the breaking jump target \verb|while_e: nop| is produced. See \cref{while-code} for an example.
\begin{figure}
\caption{Generated code for an if expression in Boppi.}
\label{if-code}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
if 2 > 1 then
'T'
else
'F'
fi
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{iloc}
loadI 2 => r_1 // 2
loadI 1 => r_2 // 1
cmp_GT r_1,r_2 => r_1 // >
cbr r_1 -> if_t0,if_f1 //
if_t0: nop //
loadI 84 => r_2 // 'T'
i2c r_2 => r_2 // 'T'
i2i r_2 => r_1 // result
jumpI -> if_e2 //
if_f1: nop //
loadI 70 => r_2 // 'F'
i2c r_2 => r_2 // 'F'
i2i r_2 => r_1 // result
if_e2: nop //
\end{minted}
\end{subfigure}
\end{figure}
\begin{figure}
\caption{Generated code for a while expression in Boppi.}
\label{while-code}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
while true do
1
od
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
\begin{minted}{iloc}
jumpI -> while_f1 // to condition
while_t0: nop // loop target
loadI 1 => r_1 // 1
while_f1: nop // condition target
loadI 1 => r_1 // true
cbr r_1 -> while_t0,while_e2 //
while_e2: nop // end target
\end{minted}
\end{subfigure}
\end{figure}
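The layout of a \emph{while} loop, with the condition placed after the body, can be sketched as a Python function that assembles the instruction sequence from already generated code for the condition and the body. The function \verb|gen_while| and its parameters are hypothetical. The single counter that numbers all jump targets is an assumption suggested by the labels in \cref{while-code}.
\begin{minted}{python}
import itertools


def gen_while(cond_code, cond_reg, body_code, labels):
    """Return the ILOC lines for a while loop.

    cond_code and body_code are lists of instructions, cond_reg is the
    register holding the condition, labels yields fresh label numbers.
    """
    l_body, l_cond, l_end = (f"while_{kind}{next(labels)}" for kind in "tfe")
    return [
        f"jumpI -> {l_cond}",                     # first go to the condition
        f"{l_body}: nop",                         # loop target
        *body_code,
        f"{l_cond}: nop",                         # condition target
        *cond_code,
        f"cbr {cond_reg} -> {l_body},{l_end}",    # repeat or leave the loop
        f"{l_end}: nop",                          # end target
    ]


# while true do 1 od
code = gen_while(["loadI 1 => r_1"], "r_1", ["loadI 1 => r_1"],
                 itertools.count())
print("\n".join(code))
\end{minted}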
\section{Functions}
\label{functions}
\paragraph{Syntax}
Functions introduce one new mode of declaration and one new mode of expression. The declaration of a function takes a name, an optional sequence of parameters and an optional return value. These parameters each have a type and a name. A function call requires a variable (the name of the function) followed by a sequence of expressions between parentheses. Moreover, the feature introduces the arrow and tuple at the type level, so function types can be constructed. The arrow denotes a function from the (tuple) type on the left to the type on the right, whereas the tuple is a sequence of types. The ANTLR rules can be seen in \cref{functions-syntax}.
\begin{figure}
\caption{ANTLR4 code for functions in Boppi.}
\label{functions-syntax}
\begin{minted}{antlr}
declareStat
: ...
| FUNCTION (result=type)? name=IDENTIFIER PAROPEN parameters? PARCLOSE body=expr #declareFunction
expr
: ...
| variable PAROPEN (expr (LISTDELIM expr)*)? PARCLOSE #call
parameters
: ...
| type IDENTIFIER (LISTDELIM type IDENTIFIER)*
type
: ...
| type ARROW type #typeFunction
| PAROPEN (type (LISTDELIM type)*)? PARCLOSE #typeTuple
\end{minted}
\end{figure}
\paragraph{Examples}
Functions can be used in many ways in Boppi. They can have any number of arguments and a return value is optional. Since Boppi allows for side effects, functions without a return value have their use. One example is shown in \cref{functions-example}: \verb|logPrint| counts the number of times the function has been called by incrementing the non-local variable \verb|numCalls|. A function with a return value can be used in an expression, as can be seen in the same example with the function \verb|add|.
\begin{figure}
\caption{Example of functions in Boppi.}
\label{functions-example}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int numCalls;
numCalls := 0;
function logPrint(int n) {
numCalls := numCalls+1;
print(n);
};
logPrint(1);
logPrint(2);
print(numCalls);
function int add(int a, int b) a+b;
print(10*add(3, 4));
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{input and output}
\begin{minted}{text}
<<< 1
<<< 2
<<< 2
<<< 70
\end{minted}
\end{subfigure}
\end{figure}
\paragraph{Use}
The function declaration has an optional return type, a name, a sequence of parameters in parentheses and an expression that is the body of the function. The sequence of parameters may be empty and each parameter consists of a type and a name. After the declaration, the function variable is marked \emph{constant} and \emph{assigned}. Since the function declaration enters a new scope, the parameter names may override variable names outside the function. Inside the body of the function, the parameters will have a value \emph{assigned}. If the function has a return type, the function body must return this type. Because the function declaration as a whole is a declare statement, it has no result type and must not be the last statement of a block.\\
The function call consists of a name followed by a sequence of expressions in parentheses. The function name must match a variable that has a function type. The number of expressions must match the number of formal parameters of the function. Each expression must be of the same type as the corresponding formal parameter. If the function has a return type, the function call results in this type, otherwise the result type is \verb|void|.\\
A function type must always have an arrow at the top level and a tuple on its left side. A function variable is not initialized at declaration and there is no run-time check during a function call, so the user should heed any warning that a variable may not be assigned. Lastly, there is currently no way to construct a function type with no return value, so variables and parameters can only have these types by copying the type of another function.
\paragraph{Semantics}
A function declaration creates a variable in the current scope with the name of the function. Its type is an arrow from the tuple of the parameter types to the return type. The return type will be \verb|void| if not present. The function variable is marked assigned. The body of the function is not evaluated.\\
A function call first retrieves the function reference and constructs an activation record (\emph{AR}). Then each of the parameters is evaluated and their results are stored within the AR. Finally, the program jumps to the body of the function.\\
The body of a function declaration links all the formal parameters to places within the AR, which are assigned a value by the function call. During a call, first the body of the function is evaluated. Next, if there is a result, it will be stored in the AR. Then, the AR will be dereferenced and, if it is marked for deletion, all local reference variables will be dereferenced. Finally, the control flow is moved back to where the function call was made.\\
At the end of a function call, the result is loaded, if there is any.
\paragraph{Code generation}
In a function declaration, the generator performs the following steps:
\begin{enumerate}
\item Jump over the function body.
\item Visit the inner expression.
\item If the function is to return a value, generate a \verb|storeAI r_res => r_arp, OFFSET_RETURN|, with \verb|r_res| the register that holds the result of the inner expression.
\item Check whether there is only one reference left to the current AR. If so, free (decrement the reference count of) all local variables where applicable.
\item Generate a \verb|loadAI r_arp, OFFSET_RETURN_ADDRESS => r_temp| and \verb|jump -> r_temp| to return to the call site.
\item Allocate a tuple for the target address, current AR and desired AR size.
\item Store the relevant values in that tuple, with the target address and AR size decided at compile-time, while the current AR is decided at run-time.
\item Store the address of the tuple in the function variable's offset.
\item Increment the reference count of the current AR and its parents.
\end{enumerate}
The tuple is required for two reasons. Firstly, the current AR is relevant in case the function uses non-local variables and is used outside the current function. Secondly, the AR size is relevant due to the way ARs are constructed, see \cref{problems-reassigned-closures}. The tuple is referenced rather than stored in place because it is easier to move around a single integer.\\
The references to the AR's parents are incremented as one solution to implementing closures, see \cref{problem-closures}.\\
Freeing local reference variables at the end of the function is done as a simple mechanism to clean up memory. A better approach would be to generate cleanup procedures, as discussed in \cref{future-work}.\\
In a function call, the generator performs a number of steps:
\begin{enumerate}
\item Retrieve the function variable.
\item Load the function tuple using \verb|load r_var => r_tuple| with \verb|r_var| the register holding the address of the function variable.
\item Allocate the AR using the size specified in the tuple, \verb|loadAI r_tuple, OFFSET_AR_SIZE => r_narp|.
\item Shift the pointer to the callee AR by \verb|addI r_narp, AR_BASE_SIZE => r_narp| so AR properties have negative offsets and local variables (including parameters) start at an offset of 0.
\item Evaluate each parameter of the function and copy the result to the callee AR using \verb|storeAI r_res => r_narp, c_offset|, with \verb|c_offset| the offset of the formal parameter.
\item Save all registers that are in use by pushing them to the stack.
\item Save the current AR in the callee AR, \verb|storeAI r_arp => r_narp, OFFSET_CALLER_ARP|, to retrieve it after the call.
\item Copy the parent AR that belongs to the function tuple to the callee AR.
\item Increment the reference count to the parent ARs of the callee.
\item Save the return address to the callee AR.
\item Switch to the callee AR.
\item Jump to the target address stored in the function tuple.
\item Decrement the reference count to the callee AR and its parents.
\item Restore all the registers that were in use by popping them from the stack.
\item Load the result of the function, if any.
\item Restore the AR.
\end{enumerate}
Note there is no particular reason why the registers are saved to the stack, while the caller's ARP and return address are saved to the callee's AR.\\
For a simple example of a function declaration and call, see \cref{functions-code}. Because the generated ILOC spans more than 100 lines, it can be found as a separate file.
\begin{figure}
\caption{Generated code for functions in Boppi.}
\label{functions-code}
\begin{subfigure}{0.5\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
function int successor(int n)
n+1;
var int x;
read(x);
print(successor(x));
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.5\textwidth}
\caption{Generated ILOC}
See \emph{doc/successor.iloc.txt}. Instructions 178-193 form the body of \verb|successor| with 180-183 generated by the expression in the body, 194-217 form the allocation of \verb|successor| as a variable, 222-270 form a call to \verb|successor| and 272-293 form the deallocation of \verb|successor|.
\end{subfigure}
\end{figure}
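The activation record (AR) layout that results from the call sequence above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the code of the Boppi generator: the values of \verb|AR_BASE_SIZE| and the property offsets, the helper names \verb|Memory| and \verb|prepare_call|, and the one-word parameter size are all hypothetical. It only shows why shifting the AR pointer gives the AR properties negative offsets while parameters and locals start at offset 0.

```python
# Hypothetical layout constants; the real values are defined by the Boppi generator.
WORD = 4
AR_BASE_SIZE = 3 * WORD          # assumed: caller ARP, parent AR, return address
OFFSET_CALLER_ARP = -1 * WORD    # AR properties live at negative offsets
OFFSET_PARENT_AR = -2 * WORD
OFFSET_RETURN_ADDR = -3 * WORD


class Memory:
    """A toy byte-addressed heap that hands out consecutive blocks."""

    def __init__(self):
        self.cells = {}
        self.next_free = 0

    def alloc(self, size):
        base = self.next_free
        self.next_free += size
        return base

    def store(self, base, offset, value):
        self.cells[base + offset] = value

    def load(self, base, offset):
        return self.cells[base + offset]


def prepare_call(mem, caller_arp, parent_ar, locals_size, args, return_addr):
    """Build a callee AR and return its shifted pointer (r_narp in the text)."""
    block = mem.alloc(AR_BASE_SIZE + locals_size)
    narp = block + AR_BASE_SIZE              # shift so that locals start at offset 0
    for i, value in enumerate(args):         # parameters are the first locals (assumed one word each)
        mem.store(narp, i * WORD, value)
    mem.store(narp, OFFSET_CALLER_ARP, caller_arp)
    mem.store(narp, OFFSET_PARENT_AR, parent_ar)
    mem.store(narp, OFFSET_RETURN_ADDR, return_addr)
    return narp
```

In this model, the callee reads its first parameter with \verb|mem.load(narp, 0)| and the caller's ARP with \verb|mem.load(narp, OFFSET_CALLER_ARP)|, mirroring the \verb|storeAI| instructions in the steps above.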
\section{Arrays}
\label{arrays}
\paragraph{Syntax}
Arrays add new syntax in three places in the language. They introduce a way to construct an array type from any type and two ways to construct an array. The first way to construct an array is to provide an array literal: \verb|[ element1, element2, ... ]|. The second way is to provide the element type, the number of elements and an offset: \verb|array( type, length, offset )|, where length and offset can be any integer expression. The element type is required because it allows the type checker to use only a synthesized attribute. For the same reason, an array literal must contain at least one item. Lastly, arrays introduce two variable constructions: the array element accessor and the property accessor. The ANTLR rules can be seen in \cref{arrays-syntax}.\\
The way arrays are declared and defined deviates from the project assignment. While arrays were defined as fixed-size vectors in a previous iteration of the language, this was considered too restrictive in practice.
\begin{figure}
\caption{ANTLR4 code for arrays in Boppi.}
\label{arrays-syntax}
\begin{minted}{antlr}
expr
: ...
| ARRAY PAROPEN type LISTDELIM size=expr LISTDELIM offset=expr PARCLOSE
| ARROPEN expr (LISTDELIM expr)* ARRCLOSE #literalArray

type
: ...
| type ARROPEN ARRCLOSE #typeArray

variable
: ...
| variable ARROPEN expr ARRCLOSE #variableArray
| variable PROP IDENTIFIER #variableProperty
\end{minted}
\end{figure}
\paragraph{Examples}
An example of using arrays can be seen in \cref{arrays-example}.
\begin{figure}
\caption{Example of arrays in Boppi.}
\label{arrays-example}
\begin{subfigure}{0.6\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int[] fibs; fibs := [1,1,2,3,5,8];
var int i; read(i);
while i < fibs.length do
print(fibs[i]);
i := i+1;
od
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.3\textwidth}
\caption{Input and output}
\begin{minted}{text}
> 2
<<< 2
<<< 3
<<< 5
<<< 8
\end{minted}
\end{subfigure}
\end{figure}
\paragraph{Use}
Array variables are not assigned at declaration. The user should therefore heed any warning that a variable may not be assigned, since the value of an unassigned array variable may point anywhere. Moreover, an array may contain undefined elements, which neither the compiler nor the run-time will detect.\\
Array variables have exactly two named properties: \verb|length| and \verb|offset|. The length is always non-negative for assigned arrays and undefined otherwise. The offset is always zero for an array literal and undefined for an unassigned array. None of the other types introduced so far have properties.\\
An array literal may contain any positive number of elements, which must all have the same type. The type of the resulting array is, naturally, an array of the elements' type. Its length is equal to the number of expressions in the literal and its offset is zero.\\
An array constructor comprises a type, which will be the type of the elements, a non-negative number of items and an offset. Note that the elements will be undefined.\\
An array accessor may only be used on an array type. The result type will be the element type of the array.\\
Arrays can be compared with each other for equality if they have the same element type.\\
Lastly, an array of characters can be printed to standard output and can be read from standard input.\\
Note that, while an array variable may be defined constant, its elements can still be changed.
\paragraph{Semantics}
An array is a finite sequence of items of a single type whose values can be retrieved through an index. An array literal creates an array exactly large enough to hold all the expressions inside, then evaluates the expressions left-to-right and puts each result in the respective array index. An array constructor simply evaluates the requested length and offset and allocates an array of that length, or halts the machine if the length is negative.\\
Assigning an array to a variable means that variable will point to the array from that moment. This means an expression like \verb|array1 := array2;| will result in both variables pointing to the same array, so changing an element of \verb|array1| will change the element for \verb|array2|.\\
An array accessor evaluates the index expression and then returns the element at that index, or halts the machine if that index is out of bounds.\\
An equality check between two arrays compares their lengths and then each pair of corresponding elements. Note that, for nested arrays, this will compare the addresses of the inner arrays, rather than the lengths and values within those arrays.
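
The reference semantics of assignment and the shallow equality check can be mimicked in Python, whose objects are also handled by reference. The class \verb|BoppiArray| and the function \verb|boppi_equals| below are hypothetical names introduced for illustration only; the sketch models the behaviour described in this paragraph and says nothing about how the compiler implements it.

```python
class BoppiArray:
    """Hypothetical stand-in for a heap-allocated Boppi array."""

    def __init__(self, elements, offset=0):
        self.elements = list(elements)
        self.offset = offset

    @property
    def length(self):
        return len(self.elements)


def boppi_equals(a, b):
    """Compare lengths and elements; nested arrays are compared by address."""
    if a.length != b.length:
        return False
    for x, y in zip(a.elements, b.elements):
        if isinstance(x, BoppiArray) or isinstance(y, BoppiArray):
            if x is not y:          # identity, not contents
                return False
        elif x != y:
            return False
    return True


# array1 := array2 makes both names refer to the same array.
array2 = BoppiArray([1, 2, 3])
array1 = array2
array1.elements[0] = 9              # also visible through array2

# Two distinct inner arrays with equal contents are not equal when nested.
inner_a = BoppiArray([1])
inner_b = BoppiArray([1])
nested_equal = boppi_equals(BoppiArray([inner_a]), BoppiArray([inner_b]))
```

After running the sketch, \verb|array2.elements[0]| is 9 and \verb|nested_equal| is \verb|False|, whereas comparing two flat arrays with the same integers yields \verb|True|.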
\paragraph{Code generation}
An array is stored as a contiguous block of data. It comprises a header of two integers (length and offset) followed by a body containing the array elements.\\
The array constructor (\verb|array(type, length, offset)|) first evaluates the length expression, then generates a check whether the array size is valid and either \verb|halt|s the machine or allocates the array. After allocating the array, the offset is evaluated and both the length and offset are stored in the array header.\\
An array literal (\verb|[element, element, ...]|) generates the allocation of the array, then, for each expression, evaluates it and puts the result in the array using \verb|storeAI r_res => r_array, c_offset|. Here, \verb|r_res| is the result of the expression, \verb|r_array| the base address of the array and \verb|c_offset| the offset of the particular element calculated at compile-time. This offset starts at \verb|2*INT_SIZE| to skip the array header, and is incremented by the element size for each subsequent element.\\
An array access (\verb|arr[index]|) generates the following steps (illustrated in \cref{arrays-access-snippet}):
\begin{enumerate}
\item Visit the array variable
\item Load the array's address
\item Visit the index expression
\item Load the array's offset and subtract it from the calculated index
\item Load the array's length and check whether the new index is between zero (inclusive) and the length (exclusive). Halt if this is not the case.
\item Multiply the new index by the element size to get the offset and add it to the array's body address to get the address of the element.
\end{enumerate}
Retrieving the length or offset of an array is achieved by first retrieving the array's address and then adding the respective offset of the length (\verb|OFFSET_ARRAY_LENGTH|) or index offset (\verb|OFFSET_ARRAY_OFFSET|) to it.
\begin{figure}
\caption{Array access snippet from \cref{arrays-code}. The array reference is stored at \texttt{r\_arp,0} and the requested index is \texttt{7}.}
\label{arrays-access-snippet}
\begin{minted}{iloc}
addI r_arp,0 => r_2 // add offset
load r_2 => r_2 // get array object
loadI 7 => r_3 // 7
loadAI r_2,4 => r_1 // load array offset
sub r_3,r_1 => r_3 // subtract array offset
loadAI r_2,0 => r_1 // load array length
cmp_LT r_3,r_1 => r_1 // check array index
cmp_GE r_3,r_nul => r_4 // check array index
and r_1,r_4 => r_4 // check array index
cbr r_4 -> nob5,oob4 // check array index
oob4: haltI 1634692962 // array index out of bounds
nob5: multI r_3,4 => r_3 // multiply index by size
addI r_3,8 => r_3 // point to array body
add r_2,r_3 => r_2 // get array index address
\end{minted}
\end{figure}
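The address calculation in the access steps can also be written out as a small Python simulation. This is a sketch, not the generator's code: the header offsets (length at 0, offset at 4, body at 8) and the element size of 4 are read off the ILOC snippet, while the word-list memory model, the exception \verb|MachineHalt| and the helper names are assumptions made for illustration.

```python
INT_SIZE = 4
OFFSET_ARRAY_LENGTH = 0               # as in the snippet: loadAI r_2,0
OFFSET_ARRAY_OFFSET = 4               # as in the snippet: loadAI r_2,4
ARRAY_HEADER_SIZE = 2 * INT_SIZE      # body starts after length and offset


class MachineHalt(Exception):
    """Hypothetical stand-in for the machine executing a halt instruction."""


def load(memory, address):
    return memory[address // INT_SIZE]


def store(memory, address, value):
    memory[address // INT_SIZE] = value


def alloc_array(memory, length, offset):
    """Allocate header plus body in a list of words and return the base address."""
    if length < 0:
        raise MachineHalt("negative array length")
    base = len(memory) * INT_SIZE
    # Elements are undefined in Boppi; zero is used here only as a placeholder.
    memory.extend([0] * (2 + length))
    store(memory, base + OFFSET_ARRAY_LENGTH, length)
    store(memory, base + OFFSET_ARRAY_OFFSET, offset)
    return base


def element_address(memory, base, index, element_size=INT_SIZE):
    """Subtract the array offset, bounds-check, then scale and add the header size."""
    relative = index - load(memory, base + OFFSET_ARRAY_OFFSET)
    length = load(memory, base + OFFSET_ARRAY_LENGTH)
    if not (0 <= relative < length):
        raise MachineHalt("array index out of bounds")
    return base + ARRAY_HEADER_SIZE + relative * element_size
```

For an array of length 3 with offset 5 allocated at address 0, the valid indices are 5, 6 and 7; \verb|element_address(memory, 0, 6)| yields address 12, and indices 4 and 8 raise \verb|MachineHalt|.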
See \cref{arrays-code} for a simple example of a nested array. Because the generated ILOC spans around 100 lines, it can be found as a separate file.
\begin{figure}
\caption{Generated code for arrays in Boppi.}
\label{arrays-code}
\begin{subfigure}{0.2\textwidth}
\caption{Boppi code}
\begin{minted}{boppi}
var int[][] matrix;
matrix := [
[1,2],
[3,4]
];
print(matrix[0][1]);
\end{minted}
\end{subfigure}
\hfill
\begin{subfigure}{0.7\textwidth}
\caption{Generated ILOC}
See \emph{doc/nestedArray.iloc.txt}. Lines 178-183 allocate the top-level array, whereas 184-189 and 196-200 allocate the second-level arrays. Lines 206-215 try to decrement the reference count of the array currently held by \verb|matrix| if it is assigned, which it is not. Lines 217-229 increment and decrement the reference count of the outer array because of the assignment expression and the statement ending. Lines 230-253 retrieve the matrix's element (0,1), which has the value 2.
\end{subfigure}
\end{figure}
\section{Reference types}
This is an addendum to function variables and arrays. Both of these are reference types, meaning that the values of the variables are merely pointers to a heap-allocated segment of data. In order to automatically garbage collect the objects, the compiler needs to change the reference count (\emph{RC}) whenever a reference type is used.\\
When a reference type is used in an expression, its RC is incremented. When the result of an expression is discarded, e.g. at the end of a statement or in an output expression with multiple arguments, a reference type will have its RC decremented. In an assignment with a reference type, first the expression is evaluated, then the RC of the old value of the variable is decremented and finally the RC of the new value is incremented.
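
The order of the RC updates can be traced with a small Python model. It is a sketch under assumptions rather than the compiler's implementation: the names \verb|HeapObject|, \verb|use_in_expression|, \verb|discard_result| and \verb|assign| are hypothetical, and the model assumes that an object is freed as soon as its RC reaches zero.

```python
class HeapObject:
    """Hypothetical heap object (array or function tuple) with a reference count."""

    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.freed = False


def increment(obj):
    if obj is not None:
        obj.rc += 1


def decrement(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:                 # assumption: free as soon as RC reaches zero
        obj.freed = True


def use_in_expression(obj):
    """Using a reference type in an expression increments its RC."""
    increment(obj)
    return obj


def discard_result(obj):
    """Discarding an expression result, e.g. at a statement end, decrements its RC."""
    decrement(obj)


def assign(variables, name, value):
    """The value is already evaluated: decrement the old value, then increment the new one."""
    decrement(variables.get(name))
    increment(value)
    variables[name] = value
    return value


def run_assignment_statement(variables, name, obj):
    """Model the statement `name := obj;` including the discarded result."""
    result = assign(variables, name, use_in_expression(obj))
    discard_result(result)
```

Because the right-hand side is evaluated, and thus counted, before the old value is released, a self-assignment such as \verb|x := x;| never drops the RC to zero in this model.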