\documentclass[../main.tex]{subfiles}

\begin{document}

\section{Encoding Artist}%
\label{sec:embedding}

\TODO{
  \begin{itemize}
    \item remind the reader why encoding into System~T is useful
  \end{itemize}
}

There are seven phases in the encoding process. In general, each phase removes a
specific type constructor until only naturals and function types remain.
Sometimes removing types requires introducing others; we will introduce lists of
naturals and C-style unions, which we will later need to remove. The full list
of seven phases is:
\begin{enumerate}
  \item changing the type of the \roll{} operator so that all recursive
    arguments are collected together in a list.
  \item using a list-indexed heap encoding to represent inductive types.
  \item using an eliminator encoding to represent lists.
  \item introducing unions to represent sums as tagged unions.
  \item encoding products as indexed unions.
  \item exploiting the argument form of types to represent unions.
  \item removing the syntactic sugar we introduced, such as the \arb{} operator
    that represents an arbitrary value of a given type.
\end{enumerate}

We will give two running examples throughout, both concerning the binary tree
type \(\mu X. (\nat \to \nat) + X \times X\), whose leaves are labelled by
functions from naturals to naturals.
In our first example we construct a balanced binary tree of depth \(n + 1\),
with leaves filled by \systemtinline{f}:
\begin{listing}[H]
\begin{systemt}
  let balanced n f = primrec n with
    Zero => roll (Leaf f)
  | Suc tree => roll (Branch (tree, tree))
\end{systemt}
  \vspace{-\baselineskip}
\end{listing}
Our other example composes the leaves of the tree into a single function,
applying the right-most leaf to the input value first:
\begin{listing}[H]
\begin{systemt}
  let compose tree = foldmatch tree with
    Leaf f => f
  | Branch (f, g) => fun x => f (g x)
\end{systemt}
\end{listing}

\subsection{Phase 1: Simplifying Roll}%
\label{subsec:simplify-roll}

Recall the typing judgement for \roll{} in \cref{fig:lang-ty}. The premise has
type \(\sub{A}{X/\mu X. A}\). One consequence of this substitution is that
inductive values can appear scattered throughout a term of this type. Take the
inductive type
\(\mu X. (1 + \nat \times X + \mu Y. 1 + X \times Y) \times (1 + X)\).
A term of this type can contain any number of inductive values, located in
distant parts of the term.

Collecting all the inductive values into one location will make future encoding
steps much easier. We enforce this by replacing the \roll{} operator with the
\roll*{} operator, which has the following typing rule:
\[
  \begin{prooftree}
    \hypo{\judgement{\Gamma}{t}{\mathsf{List}~(\mu X.A)}}
    \hypo{\judgement{\Gamma}{u}{\sub{A}{X/\nat}}}
    \infer2{\judgement{\Gamma}{\roll*~t~u}{\mu X. A}}
  \end{prooftree}
\]
Rather than being included within the term to roll, the inductive values are
gathered into an external list. The places that contained inductive values in
the rolled term now contain indices into the list. The new operator satisfies
the following equation:
\[
  \dofold{\roll*~t~u}{x}{v} \coloneq
  \sub{v}{x/\mapkw{}~(\lambda i.
  \dofold{\mathsf{index}~t~i}{x}{v})~u}
\]

\TODO{justify why I add lists as a built-in type former}

To encode \roll{} into \roll*{} we require a function that traverses a term of
type \(\sub{A}{X/\mu X. A}\) and collects all inductive values into a single
list. We can extend a list with a single value and return the index of that
value with the writer monad~\cite{writer}:
\(\mathsf{extend} : A \to \mathsf{List}~A \to \mathsf{List}~A \times \nat\).
By using the \mapkw{} operator we can replace all inductive values in a term of
type \(\sub{A}{X/\mu X. A}\) with accumulator functions, giving a term of type
\(\sub{A}{X/\mathsf{List}~(\mu X. A) \to \mathsf{List}~(\mu X. A) \times \nat}\).
The non-trivial step is ``distributing'' the writer monad over the substitution
to obtain a value of type
\(\mathsf{List}~(\mu X. A) \to \mathsf{List}~(\mu X. A) \times \sub{A}{X/\nat}\).
We can apply this function to the empty list to obtain the arguments for
\roll*{}.

Given a well-formedness derivation \(\jdgmnt{ty}{\Psi}{A}\), a type variable
\(X \in \Psi\), a type environment \(\alpha\) and a type \(S\), we have a term
\(\mathsf{distrib}\) defined in phase-one \lang{} of type
\[
  \submult{A}{\sub{\alpha}{X/S \to S \times \alpha(X)}}
  \to S \to S \times \submult{A}{\alpha}
\]
that calls each accumulator within \(A\) in sequence. The definition is by
induction on the well-formedness derivation.

\subsection{Phase 2: Encoding Inductive Types}%
\label{subsec:inductive-types}

We use a modified heap encoding to encode regular types. The heap is indexed by
\(\mathsf{List}~\nat\), but the pointers within terms remain naturals. The idea
is that the heap index describes the path taken through the term to reach a
particular point, whilst the pointers describe the next step along the path.
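
As a rough illustration of this reading (a sketch only: the cell layout below is
an assumption made for exposition, not the encoding itself), consider the
running tree type after Phase~1, where a branch with subtrees \(t_0\) and
\(t_1\) is written \(\roll*~[t_0, t_1]~(\mathsf{Branch}~(0, 1))\). For the
two-leaf tree produced by \(\mathsf{balanced}~1~f\), one heap consistent with
the description above would be
\[
  \begin{array}{lcl}
    {[\,]} & \mapsto & \mathsf{Branch}~(0, 1) \\
    {[0]}  & \mapsto & \mathsf{Leaf}~f \\
    {[1]}  & \mapsto & \mathsf{Leaf}~f
  \end{array}
\]
The root cell sits at the empty path, and following the pointer \(i\) stored in
the cell at path \(p\) leads to the cell whose index is \(p\) extended by \(i\).
The pointers \(0\) and \(1\) are exactly the list indices introduced by
\roll*{}, which is why they can stay as naturals.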

\TODO{
  \begin{itemize}
    \item describe that storing higher-order data makes G\"odel encodings
      impractical
    \item describe that the need for local encodings rules out Church encodings
    \item describe that the need for fold invalidates using codata encodings
    \item explain that using nat-list indices reflects the structure of terms
    \item give the encoding of roll
    \item give the encoding of fold
    \item justify the max operator
    \item justify the head operator
    \item justify the snoc operator
    \item justify the arb operator
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode lists using their eliminators
    \item explain why a list is a pair of length and index function
    \item give the encoding of cons
    \item give the encoding of snoc
    \item give the encoding of max
    \item give the encoding of head
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode sums as tagged C-style unions
    \item explain the operations and equations of unions
    \item justify why sums are tagged unions
    \item justify adding union types
    \item justify the case operator on naturals
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode products as functions from naturals to unions
    \item explain that products are heterogeneous vectors
    \item justify implementing products as homogeneous functional vectors
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode unions as functions with possibly unused
      arguments
    \item define the union on types in argument form
    \item define the put operator
    \item define the get operator
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we desugar other operators last
    \item define desugaring of arb
    \item define desugaring of case
    \item define desugaring of map
    \item define desugaring of let
  \end{itemize}
}

\end{document}