\documentclass[../main.tex]{subfiles}

\begin{document}
\section{Encoding Artist}%
\label{sec:embedding}

\TODO{
  \begin{itemize}
    \item remind the reader why encoding into System~T is useful
  \end{itemize}
}

There are seven phases in the encoding process. In general, each phase removes a
specific type constructor until only naturals and function types remain.
Sometimes removing a type requires introducing others: we introduce lists of
naturals and C-style unions, which later phases must remove in turn. The seven
phases are:
\begin{enumerate}
\item changing the type of the \roll{} operator so that all recursive arguments
  are collected together in a list.
\item using a list-indexed heap encoding to represent inductive types.
\item using an eliminator encoding to represent lists.
\item introducing unions to represent sums as tagged unions.
\item encoding products as indexed unions.
\item exploiting the argument form of types to represent unions.
\item removing syntactic sugar we introduced, such as the \arb{} operator that
  represents an arbitrary value of a given type.
\end{enumerate}

We give two running examples throughout, both using the binary
tree type \(\mu X. (\nat \to \nat) + X \times X\), whose leaves are labelled by
functions from naturals to naturals. In our first example we construct a balanced
binary tree of depth \(n + 1\), with every leaf filled by \systemtinline{f}:
\begin{listing}[H]
\begin{systemt}
let balanced n f = primrec n with
  Zero     => roll (Leaf f)
| Suc tree => roll (Branch (tree, tree))
\end{systemt}
\vspace{-\baselineskip}
\end{listing}

Our other example composes the leaves of the tree into a single function,
starting by applying the right-most leaf to the input value:
\begin{listing}[H]
\begin{systemt}
let compose tree = foldmatch tree with
  Leaf f        => f
| Branch (f, g) => fun x => f (g x)
\end{systemt}
\end{listing}
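
As a quick check of how the two examples interact, unfolding the definitions
above for \(n = 1\) gives the following, writing \(\mathsf{Leaf}\) and
\(\mathsf{Branch}\) for the two injections of the sum as in the listings:
\begin{align*}
  \mathsf{balanced}~1~f &= \roll~(\mathsf{Branch}~(\roll~(\mathsf{Leaf}~f), \roll~(\mathsf{Leaf}~f))) \\
  \mathsf{compose}~(\mathsf{balanced}~1~f) &= \lambda x. f~(f~x)
\end{align*}
More generally, \(\mathsf{balanced}~n~f\) has \(2^n\) leaves, so composing them
yields the \(2^n\)-fold iterate of \(f\).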

\subsection{Phase 1: Simplifying Roll}%
\label{subsec:simplify-roll}

Recall the typing judgement for \roll{} in \cref{fig:lang-ty}. The premise has
type \(\sub{A}{X/\mu X. A}\). One consequence of the use of substitution is that
inductive values can appear scattered throughout a term of this type. Take the
inductive type \(\mu X. (1 + \nat \times X + \mu Y. 1 + X \times Y) \times (1 + X)\). A term of
this type can have any number of inductive values, located in distant parts of
the term.

Collecting all the inductive values into one location will make future encoding
steps much easier. We enforce this by removing the \roll{} operator and adding
the \roll*{} operator, which has the following type derivation:
\[
\begin{prooftree}
  \hypo{\judgement{\Gamma}{t}{\mathsf{List}~(\mu X.A)}}
  \hypo{\judgement{\Gamma}{u}{\sub{A}{X/\nat}}}
  \infer2{\judgement{\Gamma}{\roll*~t~u}{\mu X. A}}
\end{prooftree}
\]
Rather than include the inductive values within the term to roll, they are
instead gathered into an external list. The places that contained inductive
values in the rolled term now contain indices into the list. The new operator
satisfies the following equation:
\[
\dofold{\roll*~t~u}{x}{v} \coloneq \sub{v}{x/\mapkw{}~(\lambda i. \dofold{\mathsf{index}~t~i}{x}{v})~u}
\]
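
As an illustration, instantiate this at the running tree type, where
\(A = (\nat \to \nat) + X \times X\) and so
\(\sub{A}{X/\nat} = (\nat \to \nat) + \nat \times \nat\). Assuming
\(\mathsf{index}\) is zero-based and \mapkw{} acts on exactly the positions of
\(X\) (readings that are consistent with the listings in this section rather
than stated definitions), a branch with children \(t_0\) and \(t_1\) folds as
\begin{align*}
  \dofold{\roll*~[t_0, t_1]~(\mathsf{Branch}~(0, 1))}{x}{v}
    &= \sub{v}{x/\mapkw{}~(\lambda i. \dofold{\mathsf{index}~[t_0, t_1]~i}{x}{v})~(\mathsf{Branch}~(0, 1))} \\
    &= \sub{v}{x/\mathsf{Branch}~(\dofold{t_0}{x}{v}, \dofold{t_1}{x}{v})}
\end{align*}
so each index is replaced by the fold of the subtree it points to.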

\TODO{justify why I add lists as a built-in type former}

To encode \roll{} into \roll*{} we require a function that traverses a term of
type \(\sub{A}{X/\mu X. A}\) and collects all inductive values into a single list.
We can extend a list with a single value and return the index of that value with
the writer monad~\cite{writer}: \(\mathsf{extend} : A \to \mathsf{List}~A \to
\mathsf{List}~A \times \nat\). By using the \mapkw{} operator we can replace all
inductive values in a term of type \(\sub{A}{X/\mu X. A}\) with accumulator
functions, giving a term of type
\(\sub{A}{X/\mathsf{List}~(\mu X. A) \to \mathsf{List}~(\mu X. A) \times \nat}\). The
non-trivial step is ``distributing'' the writer monad over the substitution to
obtain a value of type \(\mathsf{List}~(\mu X. A) \to \mathsf{List}~(\mu X. A) \times
\sub{A}{X/\nat}\). We can apply this function to the empty list to obtain the
arguments for \roll*{}.
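
For instance, suppose \(\mathsf{extend}\) appends its argument to the end of
the list and returns the position at which it was stored (an assumption; the
text above fixes only its type). Threading the empty list through the two
children \(t_0\) and \(t_1\) of a branch then gives
\begin{align*}
  \mathsf{extend}~t_0~[\,] &= ([t_0], 0) \\
  \mathsf{extend}~t_1~[t_0] &= ([t_0, t_1], 1)
\end{align*}
so the arguments passed to \roll*{} are the list \([t_0, t_1]\) and the term
\(\mathsf{Branch}~(0, 1)\).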

Given a well-formedness derivation \(\jdgmnt{ty}{\Psi}{A}\), a type variable \(X \in
\Psi\), a type environment \(\alpha\) and a type \(S\), we have a term
\(\mathsf{distrib}\) defined in phase-one \lang{} of type
\[
\submult{A}{\sub{\alpha}{X/S \to S \times \alpha(X)}} \to S \to S \times \submult{A}{\alpha}
\]
that calls each accumulator within \(A\) in sequence. The definition is by
induction on the well-formedness derivation. At the end of this phase, the
\systemtinline{compose} example is unchanged. The \systemtinline{balanced}
example reduces to:
\begin{listing}[H]
\begin{systemt}
let balanced n f = primrec n with
  Zero     => roll2 []           (Leaf f)
| Suc tree => roll2 [tree, tree] (Branch (0, 1))
\end{systemt}
\vspace{-\baselineskip}
\end{listing}
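
To make the type of \(\mathsf{distrib}\) concrete, take
\(A = (\nat \to \nat) + X \times X\) and \(\alpha(X) = \nat\), and write \(S\)
for \(\mathsf{List}~(\mu X. A)\). The general type then specialises to
\[
\bigl((\nat \to \nat) + (S \to S \times \nat) \times (S \to S \times \nat)\bigr)
  \to S \to S \times \bigl((\nat \to \nat) + \nat \times \nat\bigr)
\]
A sketch of the two clauses this instance might have, assuming accumulators are
called from left to right (which agrees with the indices \(0\) and \(1\) in the
listing above), is
\begin{align*}
  \mathsf{distrib}~(\mathsf{Leaf}~f)~s &= (s, \mathsf{Leaf}~f) \\
  \mathsf{distrib}~(\mathsf{Branch}~(a_0, a_1))~s &= (s_2, \mathsf{Branch}~(i_0, i_1))
    \quad\text{where } (s_1, i_0) = a_0~s \text{ and } (s_2, i_1) = a_1~s_1
\end{align*}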

\subsection{Phase 2: Encoding Inductive Types}%
\label{subsec:inductive-types}

We encode regular types with a modified heap encoding. The heap is indexed by
\(\mathsf{List}~\nat\), but the pointers within terms remain
naturals. The idea is that the heap index describes the path taken through the
term to reach a particular point, whilst the pointers describe the next step
along the path.

We choose a heap encoding over other encoding strategies for the
following reasons. Firstly, inductive types in \lang{} can contain higher-order
data, such as our tree of functions, which prevents us from using G\"odel
encodings. Secondly, we want a local translation because it makes the encoding
easier to write, and as System~T does not have polymorphism, we cannot use
Church encodings. Finally, we need to be able to write the fold operation, so we
cannot use eliminator encodings. This leaves a heap encoding as the only
suitable strategy.

Unlike the description of the heap encoding in
\cref{M-subsec:heap-encoding}, we do not use the same type for indices
and pointers. We use \(\mathsf{List}~\nat\) as the index type,
representing a path through the term. The empty list indicates the
root of the inductive value. Otherwise, the head of the list selects
which child to recurse into and the tail gives the path taken with
that child as the root. Instead of eagerly computing paths within the
heap, we compute new paths lazily, so the only value a pointer needs
to store is the index of the given child.
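
For example, consider the three-leaf tree whose root has a branch with leaves
\(f_0\) and \(f_1\) as its left child and the leaf \(f_2\) as its right child.
Under this scheme, with children numbered from zero as an illustrative
convention, its heap \(h\) would contain
\begin{align*}
  h~[\,] &= \mathsf{Branch}~(0, 1) & h~[0] &= \mathsf{Branch}~(0, 1) & h~[1] &= \mathsf{Leaf}~f_2 \\
  h~[0, 0] &= \mathsf{Leaf}~f_0 & h~[0, 1] &= \mathsf{Leaf}~f_1
\end{align*}
Both branches store the same pointers \(0\) and \(1\); it is the index at which
a branch is found that determines which subtree each pointer refers to.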

\begin{figure}
  \begin{align*}
    \roll*~ts~x &\coloneq \tuple*{
      1 + \mathsf{max}~(\mapkw~(\lambda t. t.0)~ts),
      \lambda i. \domatch*{i}{
        \mathsf{nil}. x;
        \mathsf{cons}(i, j). {(\mathsf{index}~ts~i).1~j}}}
    \\
    \dofold{t}{x}{u} &\coloneq \dolet
      {go}*{\doprimrec*{t.0}
        {\arb}
        {r}{\lambda i. \sub{u}{x/\mapkw~(\lambda n. r~(\mathsf{snoc}~i~n))~(t.1~i)}}
      }*{go~\mathsf{nil}}
  \end{align*}
  \caption{Phase 2 encoding of the \roll*{} and \foldkw{} operators.}\label{fig:phase-2-encode}
\end{figure}

More formally, we encode the type \(\mu X. A\) as \(\nat \times
(\mathsf{List}~\nat \to \sub{A}{X/\nat})\), recursively encoding \(A\).
We present the encoding of \roll*{} and \foldkw{} in
\cref{fig:phase-2-encode}. We add three new operators for working with
lists:
\begin{description}
  \item[\(\mathsf{max}\)] for calculating the maximum from a list of
    naturals;
  \item[\(\mathsf{snoc}\)] for appending a single item to the end of a
    list;
  \item[\(\mathsf{match}\)] for pattern matching on a list.
\end{description}
Computing the maximum value from a list is necessary to correctly
determine the recursive depth to use when folding over an inductive
value. It is also the primary reason why infinite inductive types are
forbidden. Take for example the inductive type \(\mu X. 1 + (\nat \to
X)\) of countable trees. To compute the recursive depth, we need to
compute the maximum of a countable sequence, which is impossible in
general. Thus we cannot encode such infinite types.
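
For a finitely-branching example, the depth component can be computed
directly. Taking \(\mathsf{max}\) of the empty list to be \(0\) (an assumption,
though it agrees with the depth \(1\) assigned to a leaf in the encoded
\systemtinline{balanced} example), a branch whose children have depths \(2\)
and \(1\) is assigned
\[
1 + \mathsf{max}~[2, 1] = 3
\]
whereas a leaf, having no children, is assigned
\(1 + \mathsf{max}~[\,] = 1\).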

Adding the \(\mathsf{snoc}\) operator may at first seem
counterproductive; we want to encode away inductive types and
recursion, yet \(\mathsf{snoc}\) is naively a recursion over an
inductive type. Fortunately there exist encodings for lists such that
not only does \(\mathsf{snoc}\) avoid recursion, but it is also as
performant as cons.
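
One encoding with this property, sketched here as an assumption rather than as
the definition used in later phases, represents a list as a pair \((n, f)\) of
its length \(n\) and an index function \(f\). Both ways of extending a list then
only wrap the index function:
\begin{align*}
  \mathsf{cons}~a~(n, f) &= (n + 1, \lambda i. \text{if } i = 0 \text{ then } a \text{ else } f~(i - 1)) \\
  \mathsf{snoc}~(n, f)~a &= (n + 1, \lambda i. \text{if } i = n \text{ then } a \text{ else } f~i)
\end{align*}
Neither definition recurses over the list, and each adds a single test on the
index.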

\TODO{
  \begin{itemize}
    \item justify the head operator
    \item justify the arb operator
  \end{itemize}
}

We now return to our examples. After some beta reduction we recover
the following value for \systemtinline{balanced}:
\begin{listing}[H]
\begin{systemt}
let balanced n f = primrec n with
  Zero              => (1, fun xs =>
    match xs with
      []      => Leaf f
    | x :: xs => match x with
        _ => arb)
| Suc (depth, heap) => (1 + max [depth, depth], fun xs =>
    match xs with
      []      => Branch (0, 1)
    | x :: xs => match x with
        0 => heap xs
      | 1 => heap xs
      | _ => arb)
\end{systemt}
\vspace{-\baselineskip}
\end{listing}

And here is the updated value of \systemtinline{compose}:
\begin{listing}[H]
\begin{systemt}
let compose (depth, heap) =
  let go = primrec depth with
    Zero   => arb
  | Suc ih => fun index =>
    let x = map (fun x => ih (snoc index x)) (heap index) in
    match x with
      Leaf f        => f
    | Branch (f, g) => fun x => f (g x)
  in go []
\end{systemt}
\vspace{-\baselineskip}
\end{listing}
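
To see the recursion at work, take the encoded tree \((2, h)\) with
\(h~[\,] = \mathsf{Branch}~(0, 1)\) and \(h~[0] = h~[1] = \mathsf{Leaf}~f\),
which is what \systemtinline{balanced} produces for \(n = 1\). Writing \(go_k\)
for the \(k\)-th iterate of the \systemtinline{primrec}, and assuming \mapkw{}
leaves a leaf unchanged because it contains no pointers, the call unfolds as
\begin{align*}
  go_1~[0] &= f \qquad go_1~[1] = f \\
  go_2~[\,] &= \lambda x. go_1~[0]~(go_1~[1]~x) = \lambda x. f~(f~x)
\end{align*}
The placeholder \(go_0 = \arb\) is never called here, because the depth \(2\)
bounds the longest path in the heap.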

\TODO{
  \begin{itemize}
    \item state that we encode lists using their eliminators
    \item explain why a list is a pair of length and index function
    \item give the encoding of cons
    \item give the encoding of snoc
    \item give the encoding of max
    \item give the encoding of head
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode sums as tagged C-style unions
    \item explain the operations and equations of unions
    \item justify why sums are tagged unions
    \item justify adding union types
    \item justify the case operator on naturals
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode products as functions from naturals to unions
    \item explain that products are heterogeneous vectors
    \item justify implementing products as homogeneous functional vectors
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we encode unions as functions with possibly unused
      arguments
    \item define the union on types in argument form
    \item define the put operator
    \item define the get operator
  \end{itemize}
}

\TODO{
  \begin{itemize}
    \item state that we desugar other operators last
    \item define desugaring of arb
    \item define desugaring of case
    \item define desugaring of map
    \item define desugaring of let
  \end{itemize}
}

\end{document}