sec/embedding.ltx


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65

\documentclass[../main.tex]{subfiles}

\begin{document}
\section{Embedding in System T}%
\label{sec:embedding}

A major motivation for \lang{} is the ability to write terms with regular types
and embed them in System~T. We explain how the embedding works, which encodes
products, sums and recursive types as System~T types.

Every pair of System~T types has a union (in the sense of C). The union has
injections into it and projections out of it. The specification of the union
type, \(A \sqcup B\) for types is:
\[
  \begin{array}{cc}
    \mathsf{inl} : A \to A \sqcup B &
    \mathsf{inr} : B \to A \sqcup B \\
    \mathsf{prl} : A \sqcup B \to A &
    \mathsf{prr} : A \sqcup B \to B \\
    \mathsf{prl}~(\mathsf{inl}~t) = t &
    \mathsf{prr}~(\mathsf{inr}~t) = t
  \end{array}
\]
\Cref{fig:union-impl} shows \posscite{Squid:online/Kiselyov22} implementation of
unions. We can extend this to n-ary unions \(\bigsqcup_i^n A_i\) in the usual
way. We pick \(\nat\) to represent the empty union arbitrarily.

With unions, it becomes easy to encode products.
\textcite{Squid:online/Kiselyov22} uses the functional encoding \(\prod_i^n A_i
\coloneq \nat \to \bigsqcup_i^n A_i\), representing the product as a
heterogeneous vector. The \(k\)th component of the product is accessed by
applying the product to \(k\). Tupling does a case analysis of the argument,
returning the corresponding value.

\textcite{Squid:online/Kiselyov22} also describes how to encode sums using
products and unions. A sum is encoded as a tagged union \(\sum_i^n A_i \coloneq
\nat \times \bigsqcup_i^n A_i\), where the tag indicates which case of the sum
is being stored. Case analysis first inspects the tag to decide which branch to
take, downcasts the value to the correct type, and then performs the branch
action. Note this encoding is not consistent: the empty type is necessarily
inhabited, as all System~T types are. This inconsistency is necessary for the
encoding of recursive types.

Recursive types are the most difficult to encode.
\textcite{DBLP:books/sp/LongleyN15} describe one strategy: represent values as a
heap. We have the encoding \(\mu X. A \coloneq \nat \times \nat \times (\nat \to
\sub{A}{X/\nat})\). The first component is the ``depth'' of the heap: an upper
bound on the number of pointers to follow on any arbitrary path. The second
component represents the root index of the heap---the address where the data
structure starts. The final component is the heap itself. Each location stores
data associated with its value. Any recursion is replaced with a pointer to a
different location in the heap.

\TODO{example}

Folding is easier to compute than rolling. The precise encoding is given below.
We compute the fold across the heap in parallel, iteratively increasing the
depth of paths we can safely use. The initial result is arbitrary: we don't have
any data to operate on.

\begin{multline*}
\TODO{fold}
\end{multline*}

\end{document}