\documentclass[../main.tex]{subfiles}

\begin{document}
\section{System T}%
\label{sec:systemt}

System~T is a simply-typed lambda calculus. The types are naturals
\nat{} and functions \(A \to B\). On top of the function abstraction and
application operators, we have \zero{} and \suc{} for zero and
successor, as well as \primreckw{}, the primitive
recursor. \Cref{fig:syst-typing} shows the typing judgements, where
\(\judgement{\Gamma}[T]{t}{A}\) means that \(t\) is System~T term of type
\(A\) in context \(\Gamma\), and \cref{fig:syst-eq} the equational theory.

\begin{figure}
  \[
  \begin{array}{ccc}
    \begin{prooftree}
      \hypo{x : A \in \Gamma}
      \infer1{\judgement{\Gamma}[T]{x}{A}}
    \end{prooftree}
    &
    \begin{prooftree}
      \hypo{\judgement{\Gamma, x : A}[T]{t}{B}}
      \infer1{\judgement{\Gamma}[T]{\lambda x. t}{A \to B}}
    \end{prooftree}
    &
    \begin{prooftree}
      \hypo{\judgement{\Gamma}[T]{f}{A \to B}}
      \hypo{\judgement{\Gamma}[T]{t}{A}}
      \infer2{\judgement{\Gamma}[T]{f~t}{B}}
    \end{prooftree}
    \\\\
    \begin{prooftree}
      \infer0{\judgement{\Gamma}[T]{\zero}{\nat}}
    \end{prooftree}
    &
    \begin{prooftree}
      \hypo{\judgement{\Gamma}[T]{t}{\nat}}
      \infer1{\judgement{\Gamma}[T]{\suc~t}{\nat}}
    \end{prooftree}
    &
    \begin{prooftree}
      \hypo{\judgement{\Gamma}[T]{t}{\nat}}
      \hypo{\judgement{\Gamma}[T]{u}{A}}
      \hypo{\judgement{\Gamma, x : A}[T]{v}{A}}
      \infer3{\judgement{\Gamma}[T]{\primrec{t}{u}{(x.v)}}{A}}
    \end{prooftree}
  \end{array}
  \]
  \caption{Typing judgements for System~T}\label{fig:syst-typing}
\end{figure}

\begin{figure}
  \begin{align*}
    (\lambda x. t)~u &= \sub{t}{x/u} \\
    \primrec{\zero}{u}{(x.v)} &= u \\
    \primrec{(\suc~t)}{u}{(x.v)} &= \sub{v}{x/\primrec{t}{u}{(x.v)}}
  \end{align*}
  \caption{Equational theory for System~T}\label{fig:syst-eq}
\end{figure}

Types in System~T are usually presented as binary trees, with branches
representing arrows and leaves being \(\nat\). An alternative
presentation is \emph{argument
form}~\cite{DBLP:books/sp/LongleyN15}. Notice that all types can be
written in the form \(A_1 \to \cdots \to A_n \to \nat\) for some unique list of
argument types \(A_i\). We can therefore represent a type by its list
of arguments. For example, the type \(\nat\) has argument form \(\epsilon\)
(the empty list), and the type \(\nat \to \nat \to (\nat \to \nat) \to \nat\)
has argument form \([\epsilon, \epsilon, [\epsilon]]\).

All System~T types are inhabited.\@ \nat{} is inhabited by zero, and
any function type is inhabited when its codomain is inhabited by
taking the constant function.

\subsection{Strategies for Encoding Inductive Types}%
\label{subsec:encoding-strategies}

Inductive types are a useful programming abstraction that is missing from
System~T. If we are to effectively encode an inductive type, there are three
operations we would like to use:
\begin{enumerate}
  \item constructors to form values of an inductive type;
  \item folding to fully consume an inductive value;
  \item pattern matching to support iterating on multiple values simultaneously
\end{enumerate}
As a concrete example, consider the pseudocode in \cref{lst:tree-eq}
for comparing whether two binary trees are equal. We use a fold to
create an equality predicate out of \systemtinline{tree1}. When
\systemtinline{tree1} is a leaf, we pattern match on the other tree to
see if it is also a leaf. If \systemtinline{tree1} is instead a
branch, we obtain equality predicates for its two subtrees by the
induction principal. We again pattern match on the other tree and test
equality recursively when it is also a branch.

\begin{listing}
  \begin{systemt}
  let equal tree1 tree2 =
    let go = fold tree1 with
      Leaf k => fun t => match t with
        Leaf n     => k == n
      | Branch _ _ => false
    | Branch eql eqr => fun t => match t with
        Leaf _     => false
      | Branch l r => eql l && eqr r
    in
    go tree2
  \end{systemt}
  \caption{Pseudocode for determining equality of two binary trees.}\label{lst:tree-eq}
\end{listing}

In the remainder of the section we outline four different strategies
for encoding inductive types within System~T. We judge each encoding
on: if it can encode inductive types with higher-order data such as
functions; if it is a local, simply-typed transformation that does not
require polymorphism or a global program analysis; how easy it is to
hand write the three fundamental operators. As you can see from the
summary in \cref{tbl:encodings}, each strategy is lacking in some
regard.

\begin{table}
  \caption{Comparison of encoding strategies for inductive types within
    System~T. The first two columns assesses whether a strategy works for
    higher-order types and is a local, simply-typed transformation. The
    remainder indicate whether it is ``easy'' (\ding{51}) or possible
    (\ding{81}) to hand write the constructors, fold and pattern matching
    respectively.}\label{tbl:encodings}
  \begin{tabular}{lccccc}
    \toprule
    Strategy & Higher Order & Local & Constructors & Fold & Pattern Match \\
    \midrule
    G\"odel & \ding{55} & \ding{51} & \ding{51} & \ding{81} & \ding{51} \\
    Church & \ding{51} & \ding{55} & \ding{51} & \ding{51} & \ding{81} \\
    Eliminator & \ding{51} & \ding{51} & \ding{51} & \ding{55} & \ding{51} \\
    Heap & \ding{51} & \ding{51} & \ding{81} & \ding{81} & \ding{51} \\
    \bottomrule
  \end{tabular}
\end{table}

\subsubsection*{G\"odel Encodings}

G\"odel encodings represent inductive data types as natural
numbers. It is well-known that there are pairing functions, bijections
from \(\mathbb{N} \times \mathbb{N}\) to \(\mathbb{N}\), that are primitive recursive. We can use
these functions to encode inductive types as pairs of tags and
payloads, such that every constructor has a unique tag. Writing
\(\tuple{\cdot, \cdot}\) for a chosen pairing function, we can encode the
binary tree constructors as
\begin{align*}
  \mathsf{leaf}(n) &\coloneq \tuple{0, n} \\
  \mathsf{branch}(l, r) &\coloneq \tuple{1, \tuple{l, r}}.
\end{align*}
One can pattern match on G\"odel encodings by using the unpairing part of
the bijection to split a value back into the tag and payload.

One major limitation of G\"odel encodings is that they cannot
represent higher-order data such as functions. The defining trait of
G\"odel encodings is that they represent data by natural numbers. As
function types cannot be encoded as naturals, we cannot encode
functions. We also cannot use a syntactic representation of functions;
\textcite{Squid:unpublished/Bauer17} proves there is no System~T term
that can recover a function from its syntax.

Another difficulty with G\"odel encodings is implementing the fold
operation.  We require our chosen pairing function to be monotonic so
that an inductive value is ``large enough'' to iterate over.  To fold
a target into the motive type \(A\), we use primitive recursion to
construct a function from encodings into \(A\).  The intent is that
the \((n+1)\)-th stage of recursion will correctly fold a target with
encoding \(n\). There are no expectations for the base case for the
recursion, so we can use an arbitrary value. For the successor case,
we use pattern matching to deconstruct the function input into its tag
and payload. We can use the induction hypothesis to fold over
inductive components in the payload. Using the induction hypothesis
gives a correct value, as a monotonic pairing function ensures each
inductive component is smaller than the function input we are
currently working on. We then apply the appropriate branch for the
constructor we have. Below is the fold over binary trees.
\[
  \dofoldmatch*[t]{t}{
    \mathsf{leaf}(n). f(n);
    \mathsf{branch}(x, y). g(x, y)
  } \coloneq
  \dolet{go}*{
    \doprimrec*{t}{\arb}{h}{
      \lambda i. \domatch*{i}{
        \mathsf{leaf}(n). f(n);
        \mathsf{branch}(l, r). g(h(l), h(r))}}
  }*{go~t}
\]

\subsubsection*{Church Encodings}

Typically used in untyped settings, Church encodings represent data by their
fold operation. For example binary trees are Church encoded by the type \(\forall A.
(\nat \to A) \to (A \to A \to A) \to A\). The first argument describes how to deconstruct
leaves, and the second applies the inductive step on branches. This encoding
makes implementing fold trivial. Constructors for the inductive types are also
simple to encode, as demonstrated by the following example:
\begin{align*}
  \mathsf{leaf}(n) &\coloneq \lambda l, b.~l~n
  &
  \mathsf{branch}(x, y) &\coloneq \lambda l, b.~b~(x~l~b)~(y~l~b)
\end{align*}

Whilst Church encodings are great for representing folds and
constructors, pattern matching becomes more challenging. The most
general strategy is to use the fold to recover a sum over the possible
constructors, and then eliminate this using the branches of the
pattern match. For example, we would fold a binary tree into the sum
\(\nat + T \times T\), representing leaves and branches respectively, and
then pattern match on this sum. We detail this in \cref{fig:church-pm}

\begin{figure}
  \[
    \domatch*[t]{t}{
      \mathsf{leaf}(n).     f(n);
      \mathsf{branch}(x,y). g(x, y)
    } \coloneq
    \dolet[t]{\roll}*{
      \lambda x. \domatch*[t]{x}{
        \mathsf{Left}(n).     \mathsf{leaf}(n);
        \mathsf{Right}(x, y). \mathsf{branch}(x, y)}
    }{x}*{
      \dofoldmatch*{t}{
        \mathsf{leaf}(n).      \mathsf{Left}(n);
        \mathsf{branch}(x, y). \mathsf{Right}(\roll~x, \roll~y)}
    }*{
      \domatch*{x}{
        \mathsf{Left}(n).     f(n);
        \mathsf{Right}(x, y). g(x, y)}
    }
  \]
  \caption{How to perform pattern matching on binary trees for Church encodings.
    We assume we have an encoding for sums with pattern matching.}\label{fig:church-pm}
\end{figure}

You may have noticed that this encoding included a type quantification---in
general, one must use polymorphism to fully represent Church encodings. To avoid
polymorphism requires a global analysis, replacing the polymorphic bound with a
(hopefully) finite product of possible consumers.

\subsubsection*{Eliminator Encodings}

This encoding strategy generalises Church encodings. Rather than encoding
inductive types by their folds, eliminator encodings represent inductive types
as a product of all their (contravariant) consumers. For example, if we know the
only operations we will perform on a binary tree are finding the sum of all
leaves and the depth of the tree. We can encode the binary tree as a pair \(\nat
\times \nat\), where the first value is the sum and the second is the depth.

In this encoding, the constructors eagerly compute the results of
eliminators. With our sum and depth example, we encode the
constructors by
\begin{align*}
  \mathsf{leaf}(n) &\coloneq \tuple{n, 1}
  &
  \mathsf{branch}(x, y) &\coloneq \tuple{x.0 + y.0, \suc~(x.1 \sqcup y.1)}
\end{align*}
using \(k \sqcup n\) to compute the maximum of two naturals. We can easily encode the
specified eliminators too; simply take the corresponding projection from the
product.

Another benefit of eliminator encodings are that the constructors are
extensible. As no information about exactly which constructor was used remains
in the encoded type, we are free to add some interesting constructors. For
example, we can add the constructor \(\mathsf{double}\) that doubles the value
of each leaf:
\[
\mathsf{double}(x) \coloneq \tuple{2 \cdot x.0, x.1}
\]

Eliminator encodings inherit the weaknesses of Church encodings. Defining an
arbitrary fold requires polymorphism. Pattern matching is also impossible in the
general case. With our sum and depth example, we have no way of knowing whether
the value representing a tree came from a leaf or branch as we deliberately
forget this information.

\subsubsection*{Heap Encodings}

This encoding strategy is a crude approximation of a heap of memory. Given an
indexing type \(I\), an inductive type is a triple of a \emph{recursive depth},
a \emph{root index}, and an \(I\)-indexed \emph{heap} of values. Instead of
storing recursive values ``in-line'', one stores an index to its place in the heap.

Consider the binary tree
\(\mathsf{branch}(\mathsf{branch}(\mathsf{leaf}(1), \mathsf{leaf}(2)),
\mathsf{branch}(\mathsf{leaf}(3), \mathsf{branch}(\mathsf{leaf}(4),
\mathsf{leaf}(5))))\), depicted diagramatically in
\cref{fig:box-pointer:tree}. The recursive depth of the tree is four,
as there are at most four nested constructors (e.g.\ the path from
root to the value five). \Cref{fig:box-pointer:heap} gives a
representation of the heap used to store this tree. Each constructor
has its own place in the heap. Leaves store the value of the leaf;
branches store the indices of the two inductive components. Like
Church encodings, this encoding relies on having an encoding for sums
in order to store the data for different constructors within a single
type.

\TODO{
  \begin{itemize}
    \item create box-and-pointer diagram
  \end{itemize}
}

The recursive depth is essential for the fold operation. We
iteratively construct a heap of values; the recursive depth is an
upper bound on the number of steps it takes to converge on the
``true'' result of the fold. For recursive depth zero, we return an
arbitrary value at each index. As no inductive value is made from
applying zero constructors, we have no soundness obligations. For the
successor case, at any given index we look up the constructor and
payload in the heap. We use the induction hypothesis to replace
inductive components with their final values, and then apply the
appropriate fold branch.

\[
\dofoldmatch*{t}{
  \mathsf{leaf}(n).      f(n);
  \mathsf{branch}(x, y). g(x, y)
} \coloneq
\dolet
  {\tuple{depth, root, heap}}{t}
  {go}*{
    \doprimrec*{depth}
      {\arb{}}
      {h}{\lambda i.
        \domatch*{heap~i}{
          \mathsf{Left}(n).     f(n);
          \mathsf{Right}(j, k). g(h~i, h~j)}}}
  *{go~root}
\]

Roll is harder to encode as it involves merging several heaps into
one. The basic idea is that the indices for the new heap have to
contain both which inductive component to recurse in to and the index
within that component's heap. Pointers in payloads in the component's
heap are then ``fixed up'' as appropriate. One also needs a special
index to refer to the payload of the new root.

\end{document}