\documentclass[../main.tex]{subfiles}
\begin{document}
\section{Idris 2 Compiler for Artyst}%
\label{sec:compiler}

To aid the development of the System~T reducer and encoding algorithms, we have implemented a type checker and compiler for \lang{}. The compiler is written in Idris~2~\cite{idris}, a dependently typed programming language. The compiler is split into three main components:
\begin{itemize}
  \item First, we have a parser based on \textcite{alg-parsing}. The parser interface guarantees that we process the input without backtracking. This produces an intrinsically well-scoped abstract syntax tree.
  \item Second, we perform type checking using a bidirectional algorithm~\cite{bidir}. Along with the usual type-checking result, the algorithm produces a runtime-erased typing derivation.
  \item Finally, we compile the term to Guile~Scheme~\cite{guile}. By requiring an erased typing derivation, the compiler can statically guarantee that it will only ever have to process well-typed terms.
\end{itemize}
We present a simplified form of the Idris types used in the compiler.

\subsection{Parser}

The \lang{} parser is derived from \textcite{alg-parsing}. At its core is a data type for parser combinators.\@ \textcite{alg-parsing} gave a type system for the combinators such that well-formed combinators can parse an input in linear time, without backtracking, and with only a single token of lookahead. They implemented both an interpreter and a staged compiler for expressions. Their interpreter, however, only reports ill-formed combinators at run time.

We adapted the combinator interpreter for Idris~2. First, we used dependent types to statically enforce that a combinator is well-formed: the ``parse'' interpretation function requires a certificate that a parser is well-formed. Second, our interpreter produces a certificate that the list of unprocessed tokens is a suffix of the input tokens. This certificate is required to satisfy the Idris termination checker when interpreting fixed-point combinators.
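As a sketch, the interpretation function might have a type like the following, where the erased (\verb|0|-quantity) argument carries the well-formedness certificate and the result pairs the parsed value with a proof that the remaining tokens are a suffix of the input. All names here (\verb|Parser|, \verb|WellFormed|, \verb|Suffix|, \verb|Token|, \verb|Error|) are illustrative; the compiler's actual definitions differ.
\begin{verbatim}
-- Combinators producing a value of type a.
data Parser : Type -> Type

-- Erased certificate that a combinator is well-formed: it
-- parses in linear time, without backtracking, and with a
-- single token of lookahead.
data WellFormed : Parser a -> Type

-- Suffix rest toks proves rest is a suffix of toks; this is
-- what satisfies the termination checker at fixed points.
data Suffix : List Token -> List Token -> Type

parse : (p : Parser a) -> (0 wf : WellFormed p) ->
        (toks : List Token) ->
        Either Error (a, (rest : List Token ** Suffix rest toks))
\end{verbatim}
Because \verb|wf| has quantity \verb|0|, the certificate is checked at compile time but incurs no run-time cost.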
Finally, we thread state throughout the combinators. It is common for parsers to rely on some additional state~\cite{states}. For example, \lang{} keeps an environment of bound names as state. As Idris has dependent types, we can make the result of parsing a combinator depend on the initial state. A parser combinator can interact with state when:
\begin{itemize}
  \item processing a sequence, where it can update the state between items; or
  \item performing a semantic action (modifying the parse result), where it can read the state.
\end{itemize}
For example, consider parsing the syntax \verb|\x => x|, which binds the variable \verb|x| before referencing it. After processing the first occurrence of \verb|x|, the \lang{} parser updates the state, adding \verb|x| to the environment of bound names. After parsing the second \verb|x| as an identifier, the parser can look up the name in the environment and convert it into a de~Bruijn index.

The state is not stored globally, but is local to the current parser combinator. We can imagine this as a stack of states. Before parsing the first item in a sequence, we push a copy of the current state onto the stack. After parsing that item, we pop the local state off the stack. We can then modify the original state using the first item's value. We then copy this modified state onto the stack before parsing the second item, and so on.

The parser returns an intrinsically well-scoped term~\cite{iwss}. Terms are a dependent data type with signature \verb|Context -> Type|. All variable names within a term are guaranteed to be bound either in the context parameter or by a term constructor such as \letkw{} or \foldkw{}. The term is \emph{intrinsically} well-scoped because the binding rules are enforced by the term constructors; an extrinsically well-scoped term would instead use a separate predicate to prove that raw syntax is well-scoped. Normally, parsers produce ``raw'' abstract syntax trees with concrete names instead of de~Bruijn indices.
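A minimal sketch of such an intrinsically well-scoped term type follows; the constructor and type names are illustrative (assuming, say, \verb|Name = String|), and the full \lang{} type has many more constructors.
\begin{verbatim}
-- The context lists the names currently in scope.
Context : Type
Context = List Name

-- A de Bruijn index: a position within the context.
data Var : Context -> Type where
  Here  : Var (x :: ctx)
  There : Var ctx -> Var (x :: ctx)

data Term : Context -> Type where
  TVar : Var ctx -> Term ctx
  -- let binds x, so its body is scoped in the extended context.
  TLet : Term ctx -> (x : Name) -> Term (x :: ctx) -> Term ctx
\end{verbatim}
A variable out of scope is unrepresentable: \verb|TVar| demands a proof that its index points into the context, and binders such as \verb|TLet| extend the context for exactly the subterms they scope over.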
We use the dependent state to create the well-scoped AST directly, because starting from a well-scoped AST simplifies the type checker. We also had the option of producing a raw AST together with an extrinsic certificate that it is well-scoped. Extrinsic certificates are best for certifying properties of preexisting structures. As our parser creates the well-scoped AST itself, the raw AST could not have been used in any prior computations, so we have no need to reference it.

\subsection{Bidirectional Type Checker}

\begin{figure}
\[
\begin{array}{cccc}
  \begin{prooftree}
    \hypo{x : A \in \Gamma}
    \infer1{\judgement{\Gamma}{x}[\Rightarrow]{A}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma, x : A}{t}[\Leftarrow]{B}}
    \infer1{\judgement{\Gamma}{\lambda x.t}[\Leftarrow]{A \to B}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{f}[\Rightarrow]{A \to B}}
    \hypo{\judgement{\Gamma}{t}[\Leftarrow]{A}}
    \infer2{\judgement{\Gamma}{f~t}[\Rightarrow]{B}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}[\Rightarrow]{A}}
    \hypo{\judgement{\Gamma, x : A}{u}[\Leftrightarrow]{B}}
    \infer2{\judgement{\Gamma}{\lettm{t}{x}{u}}[\Leftrightarrow]{B}}
  \end{prooftree}
  \\\\
  \begin{prooftree}
    \hypo{\rangeover{\judgement{\Gamma}{t_i}[\Leftrightarrow]{A_i}}{i}}
    \infer1{\judgement{\Gamma}{\tuple{\rangeover{t_i}{i}}}[\Leftrightarrow]{\prod_i A_i}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}[\Rightarrow]{\prod_i A_i}}
    \infer1{\judgement{\Gamma}{t.i}[\Rightarrow]{A_i}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}[\Leftarrow]{A_i}}
    \infer1{\judgement{\Gamma}{\tuple{i, t}}[\Leftarrow]{\sum_i A_i}}
  \end{prooftree}
  &
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}[\Rightarrow]{\sum_i A_i}}
    \hypo{\rangeover{\judgement{\Gamma, x_i : A_i}{t_i}[\Leftarrow]{B}}{i}}
    \infer2{\judgement{\Gamma}{\casetm{t}{\tuple{i,x_i}}{t_i}{i}}[\Leftarrow]{B}}
  \end{prooftree}
  \\\\
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}[\Rightarrow]{A}}
    \hypo{A = B}
\infer2{\judgement{\Gamma}{t}[\Leftarrow]{B}} \end{prooftree} & \begin{prooftree} \hypo{\judgement{\Gamma}{t}[\Leftarrow]{A}} \infer1{\judgement{\Gamma}{t : A}[\Rightarrow]{A}} \end{prooftree} & \begin{prooftree} \hypo{\judgement{\Gamma}{t}[\Leftarrow]{\sub{A}{X/\mu X. A}}} \infer1{\judgement{\Gamma}{\roll~t}[\Leftarrow]{\mu X. A}} \end{prooftree} & \begin{prooftree} \hypo{\judgement{\Gamma}{t}[\Rightarrow]{\mu X. A}} \hypo{\judgement{\Gamma, x : \sub{A}{X/B}}{u}[\Leftarrow]{B}} \infer2{\judgement{\Gamma}{\foldtm{t}{x}{u}}[\Leftarrow]{B}} \end{prooftree} \end{array} \] \caption{Bidirectional typing rules for \lang{}.\@ We say \(t\) synthesises type \(A\) when \(\judgement{\Gamma}{t}[\Rightarrow]{A}\) and \(t\) checks at type \(A\) when \(\judgement{\Gamma}{t}[\Leftarrow]{A}\). We use \(\Leftrightarrow\) when a term can both synthesise and check at a type. All arrows must resolve in the same direction within the rule.}\label{fig:bidir} \end{figure} \TODO{ \begin{itemize} \item describe how type checking produces a derivation \item explain the derivation is runtime erased \item justify extrinsic derivation \item justify the derivation having no runtime cost \item describe why knowing there is a derivation is necessary \item explain why these extra features do not guarantee compiler correctness \end{itemize} } \subsection{Scheme Backend} \TODO{ \begin{itemize} \item justify how dynamic types make Scheme a good compilation target \item describe how products are compiled to assoc-lists \item describe how sums are compiled to tagged values \item describe how dynamic typing removes the need for roll \item explain why fold needs to know the term has a valid type \item explain why fold must be compiled recursively with map \end{itemize} } \end{document}