\documentclass[../main.tex]{subfiles}
\begin{document}
\section{Idris 2 Compiler for Artyst}%
\label{sec:compiler}

To aid the development of the System~T reducer and encoding algorithms, we have implemented a type checker and compiler for \lang{}. The compiler is written in Idris~2~\cite{idris}, a dependently typed programming language. The compiler is split into three main components:
\begin{itemize}
  \item First, we have a parser based on \textcite{alg-parsing}. The parser interface guarantees that we process the input without backtracking. This produces an intrinsically well-scoped abstract syntax tree.
  \item Second, we perform type checking using a bidirectional algorithm~\cite{bidir}. Along with the usual type checking result, the algorithm produces a runtime-erased typing derivation.
  \item Finally, we compile the term to Guile~Scheme~\cite{guile}. By requiring an erased typing derivation, the compiler can statically guarantee it will only ever have to process well-typed terms.
\end{itemize}
We present a simplified form of the Idris types used in the compiler.

\subsection{Parser}

The \lang{} parser is derived from \textcite{alg-parsing}. At its core is a data type for parser combinators.\@ \textcite{alg-parsing} gave a type system for the combinators such that well-formed combinators parse an input in linear time, without backtracking, and with only a single token of lookahead. They implemented both an interpreter and a staged compiler for expressions. The interpreter only reports errors about the combinators at run time.

We adapted the combinator interpreter for Idris 2. First, we used dependent types to statically enforce that a combinator is well-formed. The ``parse'' interpretation function requires a certificate that a parser is well-formed. Second, our interpreter produces a certificate that the list of unprocessed tokens is a suffix of the input tokens. This certificate is required to satisfy the Idris termination checker when interpreting fixed point combinators.
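The suffix certificate can be sketched in Idris as follows (a sketch only; the names \verb|Parser|, \verb|WellFormed|, \verb|Tok| and \verb|Error| are illustrative, not our actual API):
\begin{verbatim}
-- Proofs that one token list is a suffix of another. Interpreting
-- a fixed point combinator recurses on a suffix of the input,
-- which is what satisfies the termination checker.
data Suffix : List Tok -> List Tok -> Type where
  Here  : Suffix ts ts
  There : Suffix ts us -> Suffix ts (u :: us)

-- The interpreter demands a runtime-erased well-formedness
-- certificate, and returns the leftover tokens together with a
-- proof that they are a suffix of the input.
parse : (p : Parser a) -> (0 wf : WellFormed p) ->
        (ts : List Tok) ->
        Either Error (a, (rest : List Tok ** Suffix rest ts))
\end{verbatim}
Marking the well-formedness argument with multiplicity \verb|0| means the certificate is erased at run time: it constrains which parsers can be interpreted without ever being inspected during parsing.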
Finally, we thread state throughout the combinators. It is common for parsers to rely on some additional state~\cite{states}. For example, \lang{} keeps an environment of bound names as state. As Idris has dependent types, we can make the result of parsing a combinator depend on the initial state. A parser combinator can interact with state when:
\begin{itemize}
  \item processing a sequence, where it can update the state between items; or
  \item performing a semantic action (modifying the parse result), where it can read the state.
\end{itemize}
For example, consider parsing the syntax \verb|\x => x|, which binds the variable \verb|x| before referencing it. After processing the first occurrence of \verb|x|, the \lang{} parser updates the state, adding \verb|x| to the environment of bound names. After parsing the second \verb|x| as an identifier, the parser can look up the name in the environment and convert it into a de Bruijn index.

The state is not stored globally, but is local to the current parser combinator. We can imagine this as a stack of states. Before parsing the first item in a sequence, we push a copy of the current state on top of the stack. After parsing that item, we pop the local state off the stack. We can then modify the original state using the first item's value. We then copy this modified state onto the stack before parsing the second item, and so on.

The parser returns an intrinsically well-scoped term~\cite{iwss}. Terms are a dependent data type with signature \verb|Context -> Type|. All variable names within a term are guaranteed to be bound, either within the context parameter or by a binding construct such as \letkw{} or \foldkw{}. The term is \emph{intrinsically} well-scoped because the binding rules are enforced by the term constructors; an extrinsically well-scoped term would use a separate predicate to prove raw syntax is well-scoped. Normally, parsers produce ``raw'' abstract syntax trees with concrete names instead of de Bruijn indices.
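A fragment of such a well-scoped syntax might look as follows in Idris (a sketch with illustrative names; the actual \lang{} type has many more constructors):
\begin{verbatim}
-- Contexts are snoc-lists of bound names. A variable is a de
-- Bruijn index proved to point into the context, so ill-scoped
-- terms are unrepresentable by construction.
data Var : SnocList Name -> Type where
  VZ : Var (ctx :< n)
  VS : Var ctx -> Var (ctx :< n)

data Term : SnocList Name -> Type where
  TVar : Var ctx -> Term ctx
  -- a let binds one extra variable in its body
  TLet : Term ctx -> (n : Name) -> Term (ctx :< n) -> Term ctx
\end{verbatim}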
We use dependent state to directly create the well-scoped AST because starting from a well-scoped AST simplifies the type checker. We also had the option of producing a raw AST together with an extrinsic certificate that it is well-scoped. Extrinsic certificates are best for certifying properties of preexisting structures. As our parser is creating the well-scoped AST, the raw AST cannot have been used in any prior computations, so we never need to reference it.

\subsection{Bidirectional Type Checker}

\begin{figure}
\[
\begin{array}{cccc}
\begin{prooftree}
\hypo{x : A \in \Gamma}
\infer1{\judgement{\Gamma}{x}[\Rightarrow]{A}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma, x : A}{t}[\Leftarrow]{B}}
\infer1{\judgement{\Gamma}{\lambda x.t}[\Leftarrow]{A \to B}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{f}[\Rightarrow]{A \to B}}
\hypo{\judgement{\Gamma}{t}[\Leftarrow]{A}}
\infer2{\judgement{\Gamma}{f~t}[\Rightarrow]{B}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Rightarrow]{A}}
\hypo{\judgement{\Gamma, x : A}{u}[\Leftrightarrow]{B}}
\infer2{\judgement{\Gamma}{\lettm{t}{x}{u}}[\Leftrightarrow]{B}}
\end{prooftree}
\\\\
\begin{prooftree}
\hypo{\rangeover{\judgement{\Gamma}{t_i}[\Leftrightarrow]{A_i}}{i}}
\infer1{\judgement{\Gamma}{\tuple{\rangeover{t_i}{i}}}[\Leftrightarrow]{\prod_i A_i}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Rightarrow]{\prod_i A_i}}
\infer1{\judgement{\Gamma}{t.i}[\Rightarrow]{A_i}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Leftarrow]{A_i}}
\infer1{\judgement{\Gamma}{\tuple{i, t}}[\Leftarrow]{\sum_i A_i}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Rightarrow]{\sum_i A_i}}
\hypo{\rangeover{\judgement{\Gamma, x_i : A_i}{t_i}[\Leftarrow]{B}}{i}}
\infer2{\judgement{\Gamma}{\casetm{t}{\tuple{i,x_i}}{t_i}{i}}[\Leftarrow]{B}}
\end{prooftree}
\\\\
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Rightarrow]{A}}
\hypo{A = B}
\infer2{\judgement{\Gamma}{t}[\Leftarrow]{B}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Leftarrow]{A}}
\infer1{\judgement{\Gamma}{t : A}[\Rightarrow]{A}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Leftarrow]{\sub{A}{X/\mu X. A}}}
\infer1{\judgement{\Gamma}{\roll~t}[\Leftarrow]{\mu X. A}}
\end{prooftree}
&
\begin{prooftree}
\hypo{\judgement{\Gamma}{t}[\Rightarrow]{\mu X. A}}
\hypo{\judgement{\Gamma, x : \sub{A}{X/B}}{u}[\Leftarrow]{B}}
\infer2{\judgement{\Gamma}{\foldtm{t}{x}{u}}[\Leftarrow]{B}}
\end{prooftree}
\end{array}
\]
\caption{Bidirectional typing rules for \lang{}.\@ We say \(t\) synthesises type \(A\) when \(\judgement{\Gamma}{t}[\Rightarrow]{A}\) and \(t\) checks at type \(A\) when \(\judgement{\Gamma}{t}[\Leftarrow]{A}\). We use \(\Leftrightarrow\) when a term can both synthesise and check at a type. All arrows must resolve in the same direction within the rule.}\label{fig:bidir}
\end{figure}

\Cref{fig:bidir} shows how we have modified the \lang{} typing rules for bidirectional type checking. The changes are standard~\cite{bidir}. Products can both check and synthesise types. The \roll{} operator can only be checked, as there is no unique correct guess for the shape type. The body of a \foldkw{} must be checked: the induction hypothesis is added to the context, so its type must be known before processing the body. As the induction hypothesis is filled by the motive, the only mode-correct way to obtain the motive type is for \foldkw{} to be checked.

The Idris~2 type for certifying the bidirectional judgement is almost a direct translation of the judgement in \cref{fig:bidir}. The differences are down to products and sums allowing permutations of their labels. Whilst on paper we can represent these \(n\)-ary types as sets, in Idris they are realised by lists.
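For instance, the synthesis and checking judgements could be realised as mutually defined indexed families along the following lines (a sketch; constructor and helper names such as \verb|LookupTy| and \verb|Reorders| are hypothetical):
\begin{verbatim}
mutual
  data SynthsTy : TyEnv ctx -> Term ctx -> Ty -> Type where
    SynthVar : LookupTy env i a -> SynthsTy env (TVar i) a
    SynthApp : SynthsTy env f (TArrow a b) -> ChecksTy env t a ->
               SynthsTy env (TApp f t) b

  data ChecksTy : TyEnv ctx -> Term ctx -> Ty -> Type where
    -- switching from synthesis to checking compares the two types
    -- up to reordering of labels, because the n-ary products and
    -- sums are realised by lists rather than sets
    CheckEmbed : SynthsTy env t a -> Reorders a b ->
                 ChecksTy env t b
\end{verbatim}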
This discrepancy manifests in \lang{}'s type checker in three ways:
\begin{itemize}
  \item when switching from synthesis to checking, the synthesised and expected types are compared up to reordering, instead of up to equality;
  \item for case splits, we check there is a permutation between variants and branches; and
  \item for checking tuples, we check that every label in the type has a unique corresponding component in the tuple.
\end{itemize}

\TODO{describe how type checking produces a derivation}

For error reporting, we introduced Idris types for negative certifications. Values of these types indicate how the bidirectional algorithm failed. Consider synthesising the type of an application. A negative certificate has three possibilities:
\begin{itemize}
  \item the function does not synthesise a type;
  \item the function's type is not a function type; or
  \item the argument does not check at the domain type.
\end{itemize}
Negative certificates give a precise reason why the algorithm failed. We also designed negative certificates to report as many errors as possible. For example, when a tuple fails to check at a type, we can record all of the fields that fail to check, not only the first. We could enforce that negative certificates record all errors by requiring a certificate (positive or negative) for every field. Because this guarantee is unnecessary for the rest of the compiler, we have not done so. Our more relaxed certificates are still sufficiently expressive to prove that a term cannot have both a positive and a negative certificate.

\TODO{explain the derivation is runtime erased}

As with the well-scoped AST, we had the choice of having type checking produce an intrinsically well-typed AST\@. We opted against this because it is harder to express the relation between the input and output terms.
Consider the hypothetical signature of the type synthesis function for an intrinsically well-typed AST:
\begin{verbatim}
synths : Term context -> (env : TyEnv context) ->
         Maybe (a : Ty ** WfTerm env a)
\end{verbatim}
The result type does not depend on the given input term, so we can give a trivial implementation, such as a function that constantly returns some well-typed closed term. Compare this to a (simplified) signature for synthesis with an extrinsic certificate:
\begin{verbatim}
synths : (tm : Term context) -> (env : TyEnv context) ->
         Maybe (a : Ty ** SynthsTy env tm a)
\end{verbatim}
The result type explicitly depends on the input term, invalidating the trivial implementation.

\TODO{describe why knowing there is a derivation is necessary}

We want to emphasise that producing a certificate during type checking does not prove the type checker is correct. The certificate only proves that the type checker matches the specification, which may itself be wrong. As a concrete example, the initial Idris~2 specification for \lang{} certificates had a bug: when checking an abstraction with multiple arguments, the types were assigned to variables in reverse order. For instance, when checking \verb|\x, y => x| at type \(A \to B \to A\), the specification assigned \verb|x| type \(B\) and \verb|y| type \(A\). Thus, the certified type checker followed the specification and rejected this term, and even produced a certificate of why it was right to do so.

\subsection{Scheme Backend}

\TODO{
\begin{itemize}
  \item justify how dynamic types make Scheme a good compilation target
  \item describe how products are compiled to assoc-lists
  \item describe how sums are compiled to tagged values
  \item describe how dynamic typing removes the need for roll
  \item explain why fold needs to know the term has a valid type
  \item explain why fold must be compiled recursively with map
\end{itemize}
}

\end{document}