diff options
author | Chloe Brown <chloe.brown.00@outlook.com> | 2025-03-25 16:52:43 +0000 |
---|---|---|
committer | Chloe Brown <chloe.brown.00@outlook.com> | 2025-03-25 16:52:43 +0000 |
commit | a2afd4b08dc2b7eada2f95ee95457457a3331344 (patch) | |
tree | 671b8530d7e8934efad4d91f7575ae01833c6bfe /sec/intro.ltx |
Before the big rewrite
Diffstat (limited to 'sec/intro.ltx')
-rw-r--r-- | sec/intro.ltx | 103 |
1 files changed, 103 insertions, 0 deletions
diff --git a/sec/intro.ltx b/sec/intro.ltx new file mode 100644 index 0000000..00a6d5b --- /dev/null +++ b/sec/intro.ltx @@ -0,0 +1,103 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Introduction}% +\label{sec:intro} + +System~T is an extension of the simply-typed lambda calculus adding natural +numbers and primitive recursion. This minimality makes it an excellent calculus +to model, and yet the calculus is expressive enough to perform a wide array of +useful calculations. This includes most arithmetic, \TODO{middle thing} and even +rewriting of lambda terms. + +The restricted type system of System~T makes expressing complex data difficult. +Often structured data has to be encoded as naturals or first-order functions. +Encoding data within a natural number often makes structural recursion +exceedingly complex. Whilst using first-order functions can make recursion +easier, it is often the case that passing different arguments can return +different shapes of data. + +Consider the following System~T pseudocode: +\begin{listing}[H] + \begin{systemt} + let sum (tree : nat -> nat -> nat -> nat -> nat) : nat = + let depth = tree 0 0 0 0 in + let root = tree 1 0 0 0 in + let heap = tree 2 in + let go : nat -> nat = primrec depth with + Z => fun i => i + | S(acc) => fun i => + let tag = heap i 0 0 in + let v = heap i 1 in + if tag == 0 then + v 0 + else + acc (v 0) + acc (v 1) + \end{systemt} + \vspace{-\baselineskip} +\end{listing} +From the name and type, one can deduce this performs a sum over some tree +structure. With some more thought, you may be able to work out that +\systemtinline{tree} is a binary tree. This information is not at all available +in its type. It is also unclear why \systemtinline{tree 0 0 0 0} should return +the depth of the tree and \systemtinline{tree 1 0 0 0} the root. + +In other functional programming languages, the same sum function would look +something like the following OCaml code: +\begin{listing}[H] + \begin{minted}{ocaml} + let sum tree = match tree with + Leaf(v) -> v + | Branch(left, right) -> sum left + sum right + \end{minted} + \vspace{-\baselineskip} +\end{listing} +In this example it is immediately obvious what the function does. The code is +first order reducing its complexity and the richer type system makes it easier +to identify the role of different variables. These additional features of OCaml +and other such languages improve the closeness of mapping and +role-expressiveness of the code whilst reducing the error-proneness (\TODO{check + this}). This is great for human programmers but this comes at a modelling +cost. Unlike System~T, most functional languages allow non-termination. For +example, OCaml would have no issue with us writing +\mintinline{ocaml}{sum tree + sum right} by mistake. + +In this work we present \lang{}, a new intermediate language that compiles into +System~T. \lang{} extends the simply-typed lambda calculus with the regular +types~\cite{Squid:unpublished/McBride01}: finite products, finite sums and some +inductive types.\@ \lang{} has most of the expressive types we are used to from +functional languages---we can write code similar to the second example---whilst +still being simple enough to represent within System~T. + +The type encodings used to compile \lang{} into System~T are +well-known~\cites{Squid:online/Kiselyov22}{DBLP:books/sp/LongleyN15}. Encoding +products and sums relies on C-style union types, which exist due to System~T +having only naturals and function types. Recursive types are represented using a +heap with an explicit depth, informally described by +\textcite{DBLP:books/sp/LongleyN15}. We have written this compilation algorithm +within \lang{} itself. + +To aid in the development of this program, we also constructed a type checker +for \lang{} in Idris~2~\cite{idris}. The typechecker uses intrisically +well-scoped syntax. The syntax is represented by an indexed data type +\texttt{Term}, where the index describes the variables that are in scope. By +using quantities~\cite{quants} and auto-implicit search, the compiler code is +written in a named style, yet compiles down to using deBruijn indices. Using +names helps to reduce errors in the code, and the compilation to deBruijn +indices means we don't pay any runtime cost. + +%% \subsubsection*{Contributions}% +%% \label{sec:contributions} +%% \begin{itemize} +%% \item Design of new intermediate language \lang{}, featuring iso-recursive +%% regular types +%% \item Algorithm for embedding \lang{} into to System~T, implementable within +%% \lang{} +%% \item Case study implementation of a fuelled self-reducer for System~T, +%% demonstrating improvements in several cognitive dimensions~\cite{cog-dim} for +%% \lang{} compared to working in System~T directly +%% \item Compiler implemented in Idris 2 with intrinsically well-scoped syntax and +%% runtime-erased typechecking proofs. +%% \end{itemize} + +\end{document} |