From 4353f331b5ab4af157f576f54b1cc79dd08abb12 Mon Sep 17 00:00:00 2001 From: Chloe Brown Date: Wed, 23 Apr 2025 15:59:22 +0100 Subject: Current state of affairs. --- sec/compiler.ltx | 25 +++- sec/embedding.ltx | 65 --------- sec/encoding.ltx | 132 +++++++++++++++++++ sec/intro.ltx | 96 +------------- sec/lang.ltx | 388 ++++++++++++++++++++++++++---------------------------- sec/reducer.ltx | 35 ++++- sec/related.ltx | 1 + sec/systemt.ltx | 330 ++++++++++++++++++++++++++++++++++------------ 8 files changed, 617 insertions(+), 455 deletions(-) delete mode 100644 sec/embedding.ltx create mode 100644 sec/encoding.ltx (limited to 'sec') diff --git a/sec/compiler.ltx b/sec/compiler.ltx index dde2856..2cfde8e 100644 --- a/sec/compiler.ltx +++ b/sec/compiler.ltx @@ -6,12 +6,25 @@ \TODO{ \begin{itemize} - \item QTT basics - \item intrinsically well-scoped AST - \item bidirectional judgement - \item well-typed data types - \item type checking function - \item irrelevant pattern matching + \item describe that I have a type checker and compiler implemented in Idris + \item explain what intentionally well-scoped syntax is + \item justify why I used intentionally well-scoped syntax + \item describe how type checking produces a derivation + \item explain the derivation is runtime erased + \item justify the derivation having no runtime cost + \item describe why knowing there is a derivation is necessary + \item explain why these extra features do not guarantee compiler correctness + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item justify how dynamic types make Scheme a good compilation target + \item describe how products are compiled to assoc-lists + \item describe how sums are compiled to tagged values + \item describe how dynamic typing removes the need for roll + \item explain why fold needs to know the term has a valid type + \item explain why fold must be compiled recursively with map \end{itemize} } diff --git a/sec/embedding.ltx b/sec/embedding.ltx deleted file mode 100644 index 722eff1..0000000 --- a/sec/embedding.ltx +++ /dev/null @@ -1,65 +0,0 @@ -\documentclass[../main.tex]{subfiles} - -\begin{document} -\section{Embedding in System T}% -\label{sec:embedding} - -A major motivation for \lang{} is the ability to write terms with regular types -and embed them in System~T. We explain how the embedding works, which encodes -products, sums and recursive types as System~T types. - -Every pair of System~T types has a union (in the sense of C). The union has -injections into it and projections out of it. The specification of the union -type, \(A \sqcup B\) for types is: -\[ - \begin{array}{cc} - \mathsf{inl} : A \to A \sqcup B & - \mathsf{inr} : B \to A \sqcup B \\ - \mathsf{prl} : A \sqcup B \to A & - \mathsf{prr} : A \sqcup B \to B \\ - \mathsf{prl}~(\mathsf{inl}~t) = t & - \mathsf{prr}~(\mathsf{inr}~t) = t - \end{array} -\] -\Cref{fig:union-impl} shows \posscite{Squid:online/Kiselyov22} implementation of -unions. We can extend this to n-ary unions \(\bigsqcup_i^n A_i\) in the usual -way. We pick \(\nat\) to represent the empty union arbitrarily. - -With unions, it becomes easy to encode products. -\textcite{Squid:online/Kiselyov22} uses the functional encoding \(\prod_i^n A_i -\coloneq \nat \to \bigsqcup_i^n A_i\), representing the product as a -heterogeneous vector. The \(k\)th component of the product is accessed by -applying the product to \(k\). Tupling does a case analysis of the argument, -returning the corresponding value. 
- -\textcite{Squid:online/Kiselyov22} also describes how to encode sums using -products and unions. A sum is encoded as a tagged union \(\sum_i^n A_i \coloneq -\nat \times \bigsqcup_i^n A_i\), where the tag indicates which case of the sum -is being stored. Case analysis first inspects the tag to decide which branch to -take, downcasts the value to the correct type, and then performs the branch -action. Note this encoding is not consistent: the empty type is necessarily -inhabited, as all System~T types are. This inconsistency is necessary for the -encoding of recursive types. - -Recursive types are the most difficult to encode. -\textcite{DBLP:books/sp/LongleyN15} describe one strategy: represent values as a -heap. We have the encoding \(\mu X. A \coloneq \nat \times \nat \times (\nat \to -\sub{A}{X/\nat})\). The first component is the ``depth'' of the heap: an upper -bound on the number of pointers to follow on any arbitrary path. The second -component represents the root index of the heap---the address where the data -structure starts. The final component is the heap itself. Each location stores -data associated with its value. Any recursion is replaced with a pointer to a -different location in the heap. - -\TODO{example} - -Folding is easier to compute than rolling. The precise encoding is given below. -We compute the fold across the heap in parallel, iteratively increasing the -depth of paths we can safely use. The initial result is arbitrary: we don't have -any data to operate on. - -\begin{multline*} -\TODO{fold} -\end{multline*} - -\end{document} diff --git a/sec/encoding.ltx b/sec/encoding.ltx new file mode 100644 index 0000000..84ec4a5 --- /dev/null +++ b/sec/encoding.ltx @@ -0,0 +1,132 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Encoding Artist}% +\label{sec:embedding} + +\TODO{ + \begin{itemize} + \item remind the reader why encoding into System~T is useful + \end{itemize} +} + +There are seven phases in the encoding process. In general, each phase removes a +specific type constructor until only naturals and function types remain. +Sometimes removing types requires introducing others; we will introduce lists of +naturals and C-style unions, which we will later need to remove. The full list +of seven phases are: +\begin{enumerate} +\item changing the type of the \roll{} operator so that all recursive arguments + are collected together in a list. +\item using a list-indexed heap encoding to represent inductive types. +\item using an eliminator encoding to represent lists. +\item introducing unions to represent sums as a tagged union. +\item encoding products as an indexed union. +\item exploiting argument form of types to represent unions. +\item removing syntactic sugar we introduced, such as the \arb{} operator that + represents an arbitrary value of a given type. +\end{enumerate} + +We will give two running examples throughout, both with regards to the binary +tree type \(\mu X. (\nat \to \nat) + X \times X\), with leaves labelled by +functions natural to natural. 
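+In the listings below we write \systemtinline{Leaf} and \systemtinline{Branch}
+for the left and right summands of the unrolled type, and build trees with the
+\roll{} operator. As a minimal illustrative sketch of this notation (the
+expression itself is not used later), the one-leaf tree holding the identity
+function is:
+\begin{listing}[H]
+\begin{systemt}
+roll (Leaf (fun x => x))
+\end{systemt}
+\vspace{-\baselineskip}
+\end{listing}
+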
In our first example we construct a balanced +binary tree of depth \(n + 1\), with leaves filled by \systemtinline{f}: +\begin{listing}[H] +\begin{systemt} +let balanced n f = primrec n with + Zero => roll (Leaf f) +| Suc tree => roll (Branch (tree, tree)) +\end{systemt} +\vspace{-\baselineskip} +\end{listing} + +Our other example composes the leaves of the tree into a single function, +starting by applying the right-most leaf to the input value: +\begin{listing}[H] +\begin{systemt} +let compose tree = foldmatch tree with + Leaf f => f +| Branch (f, g) => fun x => f (g x) +\end{systemt} +\end{listing} + +\TODO{ + \begin{itemize} + \item state that the initial type of roll makes it hard to encode + \item explain that there could be an arbitrary number of recursive arguments + scattered in a term + \item describe that collecting them into one location makes future encoding + steps easier + \item justify why I add lists as a built-in type former + \item describe the strength function, the key step in this phase + \item state it is defined by induction on the outer functor + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we encode inductive types as nat-list-indexed heaps + \item describe that storing higher-order data makes G\"odel encodings + impractical + \item describe that the need for local encodings rules out Church encodings + \item describe that the need for fold invalidates using codata encodings + \item explain that using nat-list indices reflects the structure of terms + \item give the encoding of roll + \item give the encoding of fold + \item justify the max operator + \item justify the head operator + \item justify the snoc operator + \item justify the arb operator + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we encode lists using their eliminators + \item explain why a list is a pair of length and index function + \item give the encoding of cons + \item give the encoding of snoc + \item give the encoding of max + \item give the encoding of head + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we encode sums as tagged C-style unions + \item explain the operations and equations of unions + \item justify why sums are tagged unions + \item justify adding union types + \item justify the case operator on naturals + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we encode products as functions from naturals to unions + \item explain that products are heterogenous vectors + \item justify implementing products as homogenous functional vectors + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we encode unions as functions with possibly unused + arguments + \item define the union on types in argument form + \item define the put operator + \item define the get operator + \end{itemize} +} + +\TODO{ + \begin{itemize} + \item state that we desugar other operators last + \item define desugaring of arb + \item define desugaring of case + \item define desugaring of map + \item define desugaring of let + \end{itemize} +} + +\end{document} diff --git a/sec/intro.ltx b/sec/intro.ltx index 00a6d5b..0cd7b1d 100644 --- a/sec/intro.ltx +++ b/sec/intro.ltx @@ -4,100 +4,6 @@ \section{Introduction}% \label{sec:intro} -System~T is an extension of the simply-typed lambda calculus adding natural -numbers and primitive recursion. This minimality makes it an excellent calculus -to model, and yet the calculus is expressive enough to perform a wide array of -useful calculations. 
This includes most arithmetic, \TODO{middle thing} and even -rewriting of lambda terms. - -The restricted type system of System~T makes expressing complex data difficult. -Often structured data has to be encoded as naturals or first-order functions. -Encoding data within a natural number often makes structural recursion -exceedingly complex. Whilst using first-order functions can make recursion -easier, it is often the case that passing different arguments can return -different shapes of data. - -Consider the following System~T pseudocode: -\begin{listing}[H] - \begin{systemt} - let sum (tree : nat -> nat -> nat -> nat -> nat) : nat = - let depth = tree 0 0 0 0 in - let root = tree 1 0 0 0 in - let heap = tree 2 in - let go : nat -> nat = primrec depth with - Z => fun i => i - | S(acc) => fun i => - let tag = heap i 0 0 in - let v = heap i 1 in - if tag == 0 then - v 0 - else - acc (v 0) + acc (v 1) - \end{systemt} - \vspace{-\baselineskip} -\end{listing} -From the name and type, one can deduce this performs a sum over some tree -structure. With some more thought, you may be able to work out that -\systemtinline{tree} is a binary tree. This information is not at all available -in its type. It is also unclear why \systemtinline{tree 0 0 0 0} should return -the depth of the tree and \systemtinline{tree 1 0 0 0} the root. - -In other functional programming languages, the same sum function would look -something like the following OCaml code: -\begin{listing}[H] - \begin{minted}{ocaml} - let sum tree = match tree with - Leaf(v) -> v - | Branch(left, right) -> sum left + sum right - \end{minted} - \vspace{-\baselineskip} -\end{listing} -In this example it is immediately obvious what the function does. The code is -first order reducing its complexity and the richer type system makes it easier -to identify the role of different variables. These additional features of OCaml -and other such languages improve the closeness of mapping and -role-expressiveness of the code whilst reducing the error-proneness (\TODO{check - this}). This is great for human programmers but this comes at a modelling -cost. Unlike System~T, most functional languages allow non-termination. For -example, OCaml would have no issue with us writing -\mintinline{ocaml}{sum tree + sum right} by mistake. - -In this work we present \lang{}, a new intermediate language that compiles into -System~T. \lang{} extends the simply-typed lambda calculus with the regular -types~\cite{Squid:unpublished/McBride01}: finite products, finite sums and some -inductive types.\@ \lang{} has most of the expressive types we are used to from -functional languages---we can write code similar to the second example---whilst -still being simple enough to represent within System~T. - -The type encodings used to compile \lang{} into System~T are -well-known~\cites{Squid:online/Kiselyov22}{DBLP:books/sp/LongleyN15}. Encoding -products and sums relies on C-style union types, which exist due to System~T -having only naturals and function types. Recursive types are represented using a -heap with an explicit depth, informally described by -\textcite{DBLP:books/sp/LongleyN15}. We have written this compilation algorithm -within \lang{} itself. - -To aid in the development of this program, we also constructed a type checker -for \lang{} in Idris~2~\cite{idris}. The typechecker uses intrisically -well-scoped syntax. The syntax is represented by an indexed data type -\texttt{Term}, where the index describes the variables that are in scope. 
By -using quantities~\cite{quants} and auto-implicit search, the compiler code is -written in a named style, yet compiles down to using deBruijn indices. Using -names helps to reduce errors in the code, and the compilation to deBruijn -indices means we don't pay any runtime cost. - -%% \subsubsection*{Contributions}% -%% \label{sec:contributions} -%% \begin{itemize} -%% \item Design of new intermediate language \lang{}, featuring iso-recursive -%% regular types -%% \item Algorithm for embedding \lang{} into to System~T, implementable within -%% \lang{} -%% \item Case study implementation of a fuelled self-reducer for System~T, -%% demonstrating improvements in several cognitive dimensions~\cite{cog-dim} for -%% \lang{} compared to working in System~T directly -%% \item Compiler implemented in Idris 2 with intrinsically well-scoped syntax and -%% runtime-erased typechecking proofs. -%% \end{itemize} +\TODO{} \end{document} diff --git a/sec/lang.ltx b/sec/lang.ltx index c7851e2..3b0134a 100644 --- a/sec/lang.ltx +++ b/sec/lang.ltx @@ -4,223 +4,207 @@ \section{Core Language}% \label{sec:lang} -In this section, we describe \lang{} a new calculus with higher-order functions -and regular types~\cite{Squid:unpublished/McBride01}. We start by describing the -types within the language: products, sums, recursive types and functions. We -then describe the syntax which is standard for n-ary products and sums. We also -define a \mapkw{} operation over arbitrary (covariant) functors.\@ \mapkw{} is -necessary to define the equational theory of \foldkw{}, the eliminator for -recursive types. The equational theory has \(\beta\) equalities, but no \(\eta\) -equalities, a consequence of the embedding into System~T. - -Types for \lang{}, in \cref{fig:lang-wf}, are given relative to a context of -type variables. The judgement \(\jdgmnt{ty}{\Theta}{A}\) states that \(A\) is a -well-formed type in with variables from context \(\Theta\). Products and sums -are well-formed when each of the n-ary components are well-formed. The recursive -type \(\mu X. A\) is formed from a type \(A\) in a context extended by \(X\). -The other type formation rules ensure that \(X\) is used strictly positively, -and thus that recursive types are well-formed. In particular the rule forming -function types forbids using any type variables on the left of the arrow. This -forbids the use of type variables in negative positions. The function type -formation rule also forbids variables on the right. This is to forbid types -such as \(\mu X. 1 + \nat \to X\) of countable trees, as such a type cannot be -represented by System~T~\cite{proof}. Well-formed types also respect -substitution: -\begin{proposition}[Type Substitution] -Given \(\jdgmnt{ty}{\Theta}{A}\) and \(\jdgmnt{ty}{\Psi}{B_i}\) where \(i\) -ranges from \(0\) to \(\lvert \Theta \rvert\), we have -\(\jdgmnt{ty}{\Psi}{\FIXME{\sub{A}{X_i/B_i}}}\). 
-\end{proposition} +\TODO{ + \begin{itemize} + \item introduce core language as STLC with regular types + \end{itemize} +} \begin{figure} -\[ -\begin{array}{ccccc} - \begin{prooftree} - \hypo{ - X \in \Theta - \vphantom{\jdgmnt{ty}{\Theta}{A}} %% Spooky formatting phantom - } - \infer1{\jdgmnt{ty}{\Theta}{X}} - \end{prooftree} - & - \begin{prooftree} - \hypo{\rangeover{\jdgmnt{ty}{\Theta}{A_i}}{i fun t => match t with + Leaf n => k == n + | Branch _ _ => false + | Branch eql eqr => fun t => match t with + Leaf _ => false + | Branch l r => eql l && eqr r + in + go tree2 + \end{systemt} + \caption{Pseudocode for determining equality of two binary trees.}\label{lst:tree-eq} +\end{listing} + +In the remainder of the section we outline four different strategies for +encoding inductive types within System~T. We judge each encoding on whether it +is a local, simply-typed transformation as well as how easy it is to hand write +the three fundamental operators. As you can see from the summary in +\cref{tbl:encodings}, each strategy is lacking in some regard. + +\begin{table} + \caption{Comparison of encoding strategies for inductive types within + System~T. The first column assesses if the strategy is a local, simply-typed + transformation. The remainder indicate whether it is ``easy'' to hand write + the constructors, fold and pattern matching + respectively.}\label{tbl:encodings} + \begin{tabular}{lcccc} + \toprule + Strategy & Local & Constructors & Fold & Pattern Match \\ + \midrule + G\"odel & \ding{51} & \ding{51} & \ding{55} & \ding{51} \\ + Church & \ding{55} & \ding{51} & \ding{51} & \ding{55} \\ + Eliminator & \ding{51} & \ding{51} & \ding{55} & \ding{51} \\ + Heap & \ding{51} & \ding{55} & \ding{51} & \ding{51} \\ + \bottomrule + \end{tabular} +\end{table} + +\subsubsection*{G\"odel Encodings} + +G\"odel encodings represent inductive data types as natural numbers. It is +well-known that there are pairing functions, bijections from \(\mathbb{N} \times +\mathbb{N}\) to \(\mathbb{N}\), that are primitive recursive. We can use these +functions to encode inductive data types as pairs of tags and values. Writing +\(\tuple{\cdot, \cdot}\) for the pairing function, we can encode the binary +tree constructors as +\begin{align*} + \mathsf{leaf}(n) &\coloneq \tuple{0, n} \\ + \mathsf{branch}(l, r) &\coloneq \tuple{1, \tuple{l, r}}. +\end{align*} +Pattern matching on G\"odel encodings is achieved by using the unpairing part of +the bijection to determine the specific constructor and its payload. + +One major limitation of G\"odel encodings is that they cannot represent +higher-order data such as functions. The defining trait of G\"odel encodings is +that they represent data by natural numbers, and as function types cannot be +encoded as naturals, we cannot encode functions. We also cannot use a syntactic +representation of functions; \textcite{Squid:unpublished/Bauer17} proves there +is no System~T term that can recover a function from its syntax. + +Another difficulty with G\"odel encodings is implementing the fold operation. +We require our choice of pairing function to be monotonic so that an encoded +value is ``large enough'' to iterate over. The fold with motive type \(A\) then +proceeds with a primitive recursion over the successor of the natural +representing the target with motive type \(\nat \to A\). The intent is that the +\((n+1)\)-th stage of recursion will correctly fold a target with encoding +\(n\). 
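+For illustration only, one standard primitive-recursive pairing with the
+required monotonicity (not necessarily the choice used elsewhere in this
+paper) is the Cantor pairing
+\[
+\tuple{m, n} \coloneq \frac{(m + n)(m + n + 1)}{2} + n,
+\]
+which satisfies \(\tuple{m, n} \geq m\) and \(\tuple{m, n} \geq n\). In
+particular, the encoding \(\tuple{1, \tuple{l, r}}\) of a branch is strictly
+larger than the encodings \(l\) and \(r\) of its subtrees, so the subtrees are
+handled correctly by an earlier stage of the recursion.
+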
There are no expectations for the base case for the recursion, so we can +use an arbitrary value. For the successor case, we use pattern matching to +deconstruct the input encoded value into its constructor and payload. We can use +the inductive argument to fold over recursive values in the payload, and as the +pairing function is monotonic, these values must be smaller than the current +case, so the resulting value is correct. We then apply the appropriate case for +the fold. Below is the fold over binary trees. \[ -\lambda k, m.~\primrec{\suc}{(\lambda f.~\primrec{(\lambda n.~n)}{(\lambda g, n.~f~(g~n))})}{k}~m. + \dofoldmatch*[t]{t}{ + \mathsf{leaf}(n). f(n); + \mathsf{branch}(x, y). g(x, y) + } \coloneq + \dolet{ + go = \doprimrec*{t}{\arb}{h}{ + \lambda i. \domatch*{i}{ + \mathsf{leaf}(n). f(n); + \mathsf{branch}(l, r). g(h(l), h(r))}} + }*{go~t} \] -Any countable set can be encoded by natural numbers. For inductively defined -sets, such as lists or abstract syntax trees for programs, one can use a G\"odel -encoding to represent values as natural numbers. This depends on a pairing -function \(\pairingkw : \nat \to \nat \to \nat\) that is a (curried) -bijection \(\nat \times \nat \to -\nat\)~\cites{DiscoverEd:book/oc/Cantor52}{szudzik}. As some examples, we can -encode lists of natural numbers as +\subsubsection*{Church Encodings} + +Typically used in untyped settings, Church encodings represent data by their +fold operation. For example binary trees are Church encoded by the type \(\forall A. +(\nat \to A) \to (A \to A \to A) \to A\). The first argument describes how to deconstruct +leaves, and the second applies the inductive step on branches. This encoding +makes implementing fold trivial. Constructors for the inductive types are also +simple to encode, as demonstrated by the following example: +\begin{align*} + \mathsf{leaf}(n) &\coloneq \lambda l, b.~l~n + & + \mathsf{branch}(x, y) &\coloneq \lambda l, b.~b~(x~l~b)~(y~l~b) +\end{align*} + +Whilst Church encodings are great for representing folds and constructors, +pattern matching becomes more challenging. The most general strategy is to use +the fold to recover a sum over the possible constructors, and then eliminate +this using the clauses of the pattern match. For example, we would fold a binary +tree into the sum \(\nat + T \times T\), representing leaves and branches +respectively, and then case split on this sum. We detail this in \cref{fig:church-pm} + +\begin{figure} + \[ + \domatch*[t]{t}{ + \mathsf{leaf}(n). f(n); + \mathsf{branch}(x,y). g(x, y) + } \coloneq + \dolet[t]{ + \roll = \lambda x. \domatch*[t]{x}{ + \mathsf{Left}(n). \mathsf{leaf}(n); + \mathsf{Right}(x, y). \mathsf{branch}(x, y) + } + }{ + x = \dofoldmatch*{t}{ + \mathsf{leaf}(n). \mathsf{Left}(n); + \mathsf{branch}(x, y). \mathsf{Right}(\roll~x, \roll~y) + } + }*{ + \domatch*{x}{ + \mathsf{Left}(n). f(n); + \mathsf{Right}(x, y). g(x, y) + } + } + \] + \caption{How to perform pattern matching on binary trees for Church encodings. + We assume we have an encoding for sums with pattern matching.}\label{fig:church-pm} +\end{figure} + +You may have noticed that this encoding included a type quantification---in +general, one must use polymorphism to fully represent Church encodings. To avoid +polymorphism requires a global analysis, replacing the polymorphic bound with a +(hopefully) finite product of possible consumers. + +\subsubsection*{Eliminator Encodings} + +This encoding strategy generalises Church encodings. 
Rather than encoding
+inductive types by their folds, eliminator encodings represent inductive types
+as a product of all their (contravariant) consumers. For example, suppose the
+only operations we will perform on a binary tree are finding the sum of all
+leaves and the depth of the tree. We can then encode the binary tree as a pair
+\(\nat \times \nat\), where the first value is the sum and the second is the
+depth.
+
+In this encoding, the constructors eagerly compute the eliminated values. With
+our sum and depth example, we encode the constructors by
+\begin{align*}
+  \mathsf{leaf}(n) &\coloneq \tuple{n, 1}
+  &
+  \mathsf{branch}(x, y) &\coloneq \tuple{x.0 + y.0, \suc~(x.1 \sqcup y.1)}
+\end{align*}
+using \(k \sqcup n\) to compute the maximum of two naturals. We can easily
+encode the specified eliminators too: simply take the corresponding projection
+from the product.
+
+Another benefit of eliminator encodings is that the constructors are
+extensible. As no information about exactly which constructor was used remains
+in the encoded type, we are free to add further constructors. For example, we
+can add the constructor \(\mathsf{double}\) that doubles the value of each
+leaf:
+\[
+\mathsf{double}(x) \coloneq \tuple{2 \cdot x.0, x.1}.
+\]
+
+Eliminator encodings inherit the weaknesses of Church encodings. Defining an
+arbitrary fold requires polymorphism. Pattern matching is also impossible in
+the general case. With our sum and depth example, we have no way of knowing
+whether the value representing a tree came from a leaf or a branch, as we
+deliberately forget this information.
+
+\subsubsection*{Heap Encodings}
+
+This encoding strategy is a crude approximation of a heap of memory. Given an
+indexing type \(I\), a value of an inductive type is encoded as a triple of a
+\emph{recursive depth}, a \emph{root index}, and an \(I\)-indexed \emph{heap}
+of values. Instead of storing a recursive value ``in-line'', one stores an
+index to its place in the heap.
+
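+As an illustration of the shape this produces (a sketch only, taking \(\nat\)
+as the indexing type, whereas \cref{sec:embedding} indexes the heap by lists
+of naturals), our binary tree type could be represented by
+\[
+\nat \times \nat \times \bigl(\nat \to (\nat \to \nat) + \nat \times \nat\bigr),
+\]
+where the three components are the recursive depth, the root index and the
+heap, and the remaining sum and product types are removed by later encoding
+steps. A \(\mathsf{branch}\) cell in the heap stores the indices of its two
+children rather than the subtrees themselves.
+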
+Consider the binary tree \(\mathsf{branch}(\mathsf{branch}(\mathsf{leaf}(1),
+\mathsf{leaf}(2)), \mathsf{branch}(\mathsf{leaf}(3),
+\mathsf{branch}(\mathsf{leaf}(4), \mathsf{leaf}(5))))\), depicted
+diagrammatically in \cref{fig:box-pointer:tree}. The recursive depth of the
+tree is three, as any leaf is nested inside at most three \(\mathsf{branch}\)
+constructors (e.g.\ the leaf labelled four).
+
+%% This encoding strategy is a crude approximation of a heap of memory.
+%% Implementing this strategy requires choosing a (first-order) indexing type. Due
+%% to G\"odel encodings, \nat{} is sufficient. In \cref{subsec:encode-inductive} we
+%% use lists of naturals to better reflect the tree-like structure of inductive
+%% values.
+
+\TODO{
+  \begin{itemize}
+  \item describe with a box-and-pointer diagram
+  \item explain briefly how to fold and roll
+  \end{itemize}
+}
 \end{document}
-- 
cgit v1.2.3