sec/intro.ltx


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103

\documentclass[../main.tex]{subfiles}

\begin{document}
\section{Introduction}%
\label{sec:intro}

System~T is an extension of the simply-typed lambda calculus adding natural
numbers and primitive recursion. This minimality makes it an excellent calculus
to model, and yet the calculus is expressive enough to perform a wide array of
useful calculations. This includes most arithmetic, \TODO{middle thing} and even
rewriting of lambda terms.

The restricted type system of System~T makes expressing complex data difficult.
Often structured data has to be encoded as naturals or first-order functions.
Encoding data within a natural number often makes structural recursion
exceedingly complex. Whilst using first-order functions can make recursion
easier, it is often the case that passing different arguments can return
different shapes of data.

Consider the following System~T pseudocode:
\begin{listing}[H]
  \begin{systemt}
  let sum (tree : nat -> nat -> nat -> nat -> nat) : nat =
    let depth = tree 0 0 0 0 in
    let root  = tree 1 0 0 0 in
    let heap  = tree 2 in
    let go : nat -> nat = primrec depth with
      Z      => fun i => i
    | S(acc) => fun i =>
      let tag = heap i 0 0 in
      let v   = heap i 1 in
      if tag == 0 then
        v 0
      else
        acc (v 0) + acc (v 1)
  \end{systemt}
  \vspace{-\baselineskip}
\end{listing}
From the name and type, one can deduce this performs a sum over some tree
structure. With some more thought, you may be able to work out that
\systemtinline{tree} is a binary tree. This information is not at all available
in its type. It is also unclear why \systemtinline{tree 0 0 0 0} should return
the depth of the tree and \systemtinline{tree 1 0 0 0} the root.

In other functional programming languages, the same sum function would look
something like the following OCaml code:
\begin{listing}[H]
  \begin{minted}{ocaml}
  let sum tree = match tree with
    Leaf(v)             -> v
  | Branch(left, right) -> sum left + sum right
  \end{minted}
  \vspace{-\baselineskip}
\end{listing}
In this example it is immediately obvious what the function does. The code is
first order reducing its complexity and the richer type system makes it easier
to identify the role of different variables. These additional features of OCaml
and other such languages improve the closeness of mapping and
role-expressiveness of the code whilst reducing the error-proneness (\TODO{check
  this}). This is great for human programmers but this comes at a modelling
cost. Unlike System~T, most functional languages allow non-termination. For
example, OCaml would have no issue with us writing
\mintinline{ocaml}{sum tree + sum right} by mistake.

In this work we present \lang{}, a new intermediate language that compiles into
System~T. \lang{} extends the simply-typed lambda calculus with the regular
types~\cite{Squid:unpublished/McBride01}: finite products, finite sums and some
inductive types.\@ \lang{} has most of the expressive types we are used to from
functional languages---we can write code similar to the second example---whilst
still being simple enough to represent within System~T.

The type encodings used to compile \lang{} into System~T are
well-known~\cites{Squid:online/Kiselyov22}{DBLP:books/sp/LongleyN15}. Encoding
products and sums relies on C-style union types, which exist due to System~T
having only naturals and function types. Recursive types are represented using a
heap with an explicit depth, informally described by
\textcite{DBLP:books/sp/LongleyN15}. We have written this compilation algorithm
within \lang{} itself.

To aid in the development of this program, we also constructed a type checker
for \lang{} in Idris~2~\cite{idris}. The typechecker uses intrisically
well-scoped syntax. The syntax is represented by an indexed data type
\texttt{Term}, where the index describes the variables that are in scope. By
using quantities~\cite{quants} and auto-implicit search, the compiler code is
written in a named style, yet compiles down to using deBruijn indices. Using
names helps to reduce errors in the code, and the compilation to deBruijn
indices means we don't pay any runtime cost.

%% \subsubsection*{Contributions}%
%% \label{sec:contributions}
%% \begin{itemize}
%% \item Design of new intermediate language \lang{}, featuring iso-recursive
%%   regular types
%% \item Algorithm for embedding \lang{} into to System~T, implementable within
%%   \lang{}
%% \item Case study implementation of a fuelled self-reducer for System~T,
%%   demonstrating improvements in several cognitive dimensions~\cite{cog-dim} for
%%   \lang{} compared to working in System~T directly
%% \item Compiler implemented in Idris 2 with intrinsically well-scoped syntax and
%%   runtime-erased typechecking proofs.
%% \end{itemize}

\end{document}