diff options
Diffstat (limited to 'sec')
-rw-r--r-- | sec/compiler.ltx | 18 | ||||
-rw-r--r-- | sec/embedding.ltx | 65 | ||||
-rw-r--r-- | sec/intro.ltx | 103 | ||||
-rw-r--r-- | sec/lang.ltx | 226 | ||||
l--------- | sec/lexer.py | 1 | ||||
-rw-r--r-- | sec/reducer.ltx | 17 | ||||
-rw-r--r-- | sec/related.ltx | 12 | ||||
-rw-r--r-- | sec/systemt.ltx | 130 |
8 files changed, 572 insertions, 0 deletions
diff --git a/sec/compiler.ltx b/sec/compiler.ltx new file mode 100644 index 0000000..dde2856 --- /dev/null +++ b/sec/compiler.ltx @@ -0,0 +1,18 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Idris 2 Compiler}% +\label{sec:compiler} + +\TODO{ + \begin{itemize} + \item QTT basics + \item intrinsically well-scoped AST + \item bidirectional judgement + \item well-typed data types + \item type checking function + \item irrelevant pattern matching + \end{itemize} +} + +\end{document} diff --git a/sec/embedding.ltx b/sec/embedding.ltx new file mode 100644 index 0000000..722eff1 --- /dev/null +++ b/sec/embedding.ltx @@ -0,0 +1,65 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Embedding in System T}% +\label{sec:embedding} + +A major motivation for \lang{} is the ability to write terms with regular types +and embed them in System~T. We explain how the embedding works, which encodes +products, sums and recursive types as System~T types. + +Every pair of System~T types has a union (in the sense of C). The union has +injections into it and projections out of it. The specification of the union +type, \(A \sqcup B\) for types is: +\[ + \begin{array}{cc} + \mathsf{inl} : A \to A \sqcup B & + \mathsf{inr} : B \to A \sqcup B \\ + \mathsf{prl} : A \sqcup B \to A & + \mathsf{prr} : A \sqcup B \to B \\ + \mathsf{prl}~(\mathsf{inl}~t) = t & + \mathsf{prr}~(\mathsf{inr}~t) = t + \end{array} +\] +\Cref{fig:union-impl} shows \posscite{Squid:online/Kiselyov22} implementation of +unions. We can extend this to n-ary unions \(\bigsqcup_i^n A_i\) in the usual +way. We pick \(\nat\) to represent the empty union arbitrarily. + +With unions, it becomes easy to encode products. +\textcite{Squid:online/Kiselyov22} uses the functional encoding \(\prod_i^n A_i +\coloneq \nat \to \bigsqcup_i^n A_i\), representing the product as a +heterogeneous vector. The \(k\)th component of the product is accessed by +applying the product to \(k\). Tupling does a case analysis of the argument, +returning the corresponding value. + +\textcite{Squid:online/Kiselyov22} also describes how to encode sums using +products and unions. A sum is encoded as a tagged union \(\sum_i^n A_i \coloneq +\nat \times \bigsqcup_i^n A_i\), where the tag indicates which case of the sum +is being stored. Case analysis first inspects the tag to decide which branch to +take, downcasts the value to the correct type, and then performs the branch +action. Note this encoding is not consistent: the empty type is necessarily +inhabited, as all System~T types are. This inconsistency is necessary for the +encoding of recursive types. + +Recursive types are the most difficult to encode. +\textcite{DBLP:books/sp/LongleyN15} describe one strategy: represent values as a +heap. We have the encoding \(\mu X. A \coloneq \nat \times \nat \times (\nat \to +\sub{A}{X/\nat})\). The first component is the ``depth'' of the heap: an upper +bound on the number of pointers to follow on any arbitrary path. The second +component represents the root index of the heap---the address where the data +structure starts. The final component is the heap itself. Each location stores +data associated with its value. Any recursion is replaced with a pointer to a +different location in the heap. + +\TODO{example} + +Folding is easier to compute than rolling. The precise encoding is given below. +We compute the fold across the heap in parallel, iteratively increasing the +depth of paths we can safely use. The initial result is arbitrary: we don't have +any data to operate on. + +\begin{multline*} +\TODO{fold} +\end{multline*} + +\end{document} diff --git a/sec/intro.ltx b/sec/intro.ltx new file mode 100644 index 0000000..00a6d5b --- /dev/null +++ b/sec/intro.ltx @@ -0,0 +1,103 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Introduction}% +\label{sec:intro} + +System~T is an extension of the simply-typed lambda calculus adding natural +numbers and primitive recursion. This minimality makes it an excellent calculus +to model, and yet the calculus is expressive enough to perform a wide array of +useful calculations. This includes most arithmetic, \TODO{middle thing} and even +rewriting of lambda terms. + +The restricted type system of System~T makes expressing complex data difficult. +Often structured data has to be encoded as naturals or first-order functions. +Encoding data within a natural number often makes structural recursion +exceedingly complex. Whilst using first-order functions can make recursion +easier, it is often the case that passing different arguments can return +different shapes of data. + +Consider the following System~T pseudocode: +\begin{listing}[H] + \begin{systemt} + let sum (tree : nat -> nat -> nat -> nat -> nat) : nat = + let depth = tree 0 0 0 0 in + let root = tree 1 0 0 0 in + let heap = tree 2 in + let go : nat -> nat = primrec depth with + Z => fun i => i + | S(acc) => fun i => + let tag = heap i 0 0 in + let v = heap i 1 in + if tag == 0 then + v 0 + else + acc (v 0) + acc (v 1) + \end{systemt} + \vspace{-\baselineskip} +\end{listing} +From the name and type, one can deduce this performs a sum over some tree +structure. With some more thought, you may be able to work out that +\systemtinline{tree} is a binary tree. This information is not at all available +in its type. It is also unclear why \systemtinline{tree 0 0 0 0} should return +the depth of the tree and \systemtinline{tree 1 0 0 0} the root. + +In other functional programming languages, the same sum function would look +something like the following OCaml code: +\begin{listing}[H] + \begin{minted}{ocaml} + let sum tree = match tree with + Leaf(v) -> v + | Branch(left, right) -> sum left + sum right + \end{minted} + \vspace{-\baselineskip} +\end{listing} +In this example it is immediately obvious what the function does. The code is +first order reducing its complexity and the richer type system makes it easier +to identify the role of different variables. These additional features of OCaml +and other such languages improve the closeness of mapping and +role-expressiveness of the code whilst reducing the error-proneness (\TODO{check + this}). This is great for human programmers but this comes at a modelling +cost. Unlike System~T, most functional languages allow non-termination. For +example, OCaml would have no issue with us writing +\mintinline{ocaml}{sum tree + sum right} by mistake. + +In this work we present \lang{}, a new intermediate language that compiles into +System~T. \lang{} extends the simply-typed lambda calculus with the regular +types~\cite{Squid:unpublished/McBride01}: finite products, finite sums and some +inductive types.\@ \lang{} has most of the expressive types we are used to from +functional languages---we can write code similar to the second example---whilst +still being simple enough to represent within System~T. + +The type encodings used to compile \lang{} into System~T are +well-known~\cites{Squid:online/Kiselyov22}{DBLP:books/sp/LongleyN15}. Encoding +products and sums relies on C-style union types, which exist due to System~T +having only naturals and function types. Recursive types are represented using a +heap with an explicit depth, informally described by +\textcite{DBLP:books/sp/LongleyN15}. We have written this compilation algorithm +within \lang{} itself. + +To aid in the development of this program, we also constructed a type checker +for \lang{} in Idris~2~\cite{idris}. The typechecker uses intrisically +well-scoped syntax. The syntax is represented by an indexed data type +\texttt{Term}, where the index describes the variables that are in scope. By +using quantities~\cite{quants} and auto-implicit search, the compiler code is +written in a named style, yet compiles down to using deBruijn indices. Using +names helps to reduce errors in the code, and the compilation to deBruijn +indices means we don't pay any runtime cost. + +%% \subsubsection*{Contributions}% +%% \label{sec:contributions} +%% \begin{itemize} +%% \item Design of new intermediate language \lang{}, featuring iso-recursive +%% regular types +%% \item Algorithm for embedding \lang{} into to System~T, implementable within +%% \lang{} +%% \item Case study implementation of a fuelled self-reducer for System~T, +%% demonstrating improvements in several cognitive dimensions~\cite{cog-dim} for +%% \lang{} compared to working in System~T directly +%% \item Compiler implemented in Idris 2 with intrinsically well-scoped syntax and +%% runtime-erased typechecking proofs. +%% \end{itemize} + +\end{document} diff --git a/sec/lang.ltx b/sec/lang.ltx new file mode 100644 index 0000000..c7851e2 --- /dev/null +++ b/sec/lang.ltx @@ -0,0 +1,226 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Core Language}% +\label{sec:lang} + +In this section, we describe \lang{} a new calculus with higher-order functions +and regular types~\cite{Squid:unpublished/McBride01}. We start by describing the +types within the language: products, sums, recursive types and functions. We +then describe the syntax which is standard for n-ary products and sums. We also +define a \mapkw{} operation over arbitrary (covariant) functors.\@ \mapkw{} is +necessary to define the equational theory of \foldkw{}, the eliminator for +recursive types. The equational theory has \(\beta\) equalities, but no \(\eta\) +equalities, a consequence of the embedding into System~T. + +Types for \lang{}, in \cref{fig:lang-wf}, are given relative to a context of +type variables. The judgement \(\jdgmnt{ty}{\Theta}{A}\) states that \(A\) is a +well-formed type in with variables from context \(\Theta\). Products and sums +are well-formed when each of the n-ary components are well-formed. The recursive +type \(\mu X. A\) is formed from a type \(A\) in a context extended by \(X\). +The other type formation rules ensure that \(X\) is used strictly positively, +and thus that recursive types are well-formed. In particular the rule forming +function types forbids using any type variables on the left of the arrow. This +forbids the use of type variables in negative positions. The function type +formation rule also forbids variables on the right. This is to forbid types +such as \(\mu X. 1 + \nat \to X\) of countable trees, as such a type cannot be +represented by System~T~\cite{proof}. Well-formed types also respect +substitution: +\begin{proposition}[Type Substitution] +Given \(\jdgmnt{ty}{\Theta}{A}\) and \(\jdgmnt{ty}{\Psi}{B_i}\) where \(i\) +ranges from \(0\) to \(\lvert \Theta \rvert\), we have +\(\jdgmnt{ty}{\Psi}{\FIXME{\sub{A}{X_i/B_i}}}\). +\end{proposition} + +\begin{figure} +\[ +\begin{array}{ccccc} + \begin{prooftree} + \hypo{ + X \in \Theta + \vphantom{\jdgmnt{ty}{\Theta}{A}} %% Spooky formatting phantom + } + \infer1{\jdgmnt{ty}{\Theta}{X}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\rangeover{\jdgmnt{ty}{\Theta}{A_i}}{i<n}} + \infer1{\jdgmnt{ty}{\Theta}{\prod_i^n A_i}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\rangeover{\jdgmnt{ty}{\Theta}{A_i}}{i<n}} + \infer1{\jdgmnt{ty}{\Theta}{\sum_i^n A_i}} + \end{prooftree} + & + \begin{prooftree} + \hypo{ + \jdgmnt{ty}{\Theta, X}{A} + \vphantom{\rangeover{\jdgmnt{ty}{\Theta}{A}}{i<n}} %% Spooky formatting phantom + } + \infer1{ + \jdgmnt{ty}{\Theta}{\mu X.A} + \vphantom{\jdgmnt{ty}{\Theta}{\sum_i^n A_i}} %% Spooky formatting phantom + } + \end{prooftree} + & + \begin{prooftree} + \hypo{\jdgmnt{ty}{}{A}} + \hypo{\jdgmnt{ty}{}{B}} + \infer2{\jdgmnt{ty}{\Theta}{A \to B}} + \end{prooftree} +\end{array} +\] +\caption{Well-formedness of \lang{} types}% +\label{fig:lang-wf} +\end{figure} + +We present terms in \cref{fig:lang-type}. All type variables must be bound +within recursive types, so there is no type context. The typing rules for +variables, functions, products and sums are standard. Values of a recursive type +\(\mu X. A\) are constructed using \emph{rolling}. A value of type +\(\sub{A}{X/\mu X. A}\) is rolled into one of type \(\mu X. A\). Recursive types +are eliminated by \emph{folding}. To fold a target of type \(\mu X. A\) into +type \(B\), it is necessary to describe how to transform \(\sub{A}{X/B}\) into +\(B\). By only allowing eliminations of this form, we ensure that all recursion +is well-founded. + +\begin{figure} +\[ +\begin{array}{cc} + \begin{prooftree} + \hypo{x : A \in \Gamma} + \infer1{\judgement{\Gamma}{x}{A}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement{\Gamma}{x}{A}} + \infer1{\judgement{\Gamma}{(x : A)}{A}} + \end{prooftree} + \\\\ + \begin{prooftree} + \hypo{\judgement{\Gamma, x : A}{t}{B}} + \infer1{\judgement{\Gamma}{\lambda x.~t}{A \to B}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement{\Gamma}{f}{A \to B}} + \hypo{\judgement{\Gamma}{t}{A}} + \infer2{\judgement{\Gamma}{f~t}{B}} + \end{prooftree} + \\\\ + \begin{prooftree} + \hypo{\rangeover{\judgement{\Gamma}{t_i}{A_i}}{i<n}} + \infer1{\judgement{\Gamma}{\tuple*{\rangeover{t_i}{i<n}}}{\prod_i^n A_i}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement{\Gamma}{e}{\prod_i^n A_i}} + \infer1{\judgement{\Gamma}{e.k}{A_k}} + \end{prooftree} + \\\\ + \begin{prooftree} + \hypo{\judgement{\Gamma}{t}{A_k}} + \infer1{\judgement{\Gamma}{\tuple{k, t}}{\sum_i^n A_i}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement{\Gamma}{e}{\sum_i^n A_i}} + \hypo{\rangeover{\judgement{\Gamma, x_i : A_i}{t_i}{B}}{i<n}} + \infer2{\judgement{\Gamma}{\casetm{e}{x_i}{t_i}{i}{n}}{B}} + \end{prooftree} + \\\\ + \begin{prooftree} + \hypo{\judgement{\Gamma}{t}{\sub{A}{X/\mu X.A}}} + \infer1{\judgement{\Gamma}{\mathsf{roll}~t}{A}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement{\Gamma}{e}{\mu X.A}} + \hypo{\judgement{\Gamma, x : \sub{A}{X/B}}{t}{B}} + \infer2{\judgement{\Gamma}{\foldtm{e}{x}{t}}{B}} + \end{prooftree} +\end{array} +\] +\caption{Typing judgements for \lang{}}% +\label{fig:lang-type} +\end{figure} + +Using these typing rules, we can derive a map operation for any \emph{functor}, +a type with a single free type variable: +\[ +\begin{prooftree} + \hypo{\jdgmnt{ty}{X}{F}} + \hypo{\jdgmnt{ty}{}{B}} + \hypo{\jdgmnt{ty}{}{C}} + \infer3{ + \judgement + {\Gamma} + {\maptm{X}{F}^{A, B}} + {(A \to B) \to \sub{F}{X/A} \to \sub{F}{X/B}}} +\end{prooftree} +\] +The definition of \mapkw, \cref{fig:lang-map}, proceeds by \TODO{induction}. We +will elide the superscripts. The hardest case to understand is recursive types +\(\mu Y. A\). Given a value of type \(\mu Y. \sub{G}{X/B}\), we need to +construct a value of type \(\mu Y. \sub{G}{X/C}\). By folding over the given +value, we need to map \(\sub{G}{X/B, Y/\mu Y. \sub{G}{X/C}}\) into \(\mu Y. +\sub{G}{X/C}\). We can construct such a value by rolling, thus we need to +produce a value of type \(\sub{G}{X/C, Y/\mu Y.\sub{G}{X/C}}\). Note that our +given value from the fold and the result we want to roll both agree on the type +of \(Y\) within \(G\), and only disagree on the type assigned to \(X\). Thus we +use map recursively. + +\begin{figure} +\begin{align*} + \maptm{X}{X}^{A,B}~f~t &= f~t \\ + \maptm{X}{Y}^{A,B}~f~t &= t \qquad\qquad (Y \neq X) \\ + \maptm{X}{A \to B}^{C,D}~f~t &= t \\ + \maptm{X}{\prod_i^n A_i}^{B,C}~f~t &= + \tuple*{\rangeover{\maptm{X}{A_i}^{B,C}~f~t.i}{i<n}} \\ + \maptm{X}{\sum_i^n A_i}^{B,C}~f~t &= + \casetm{t}{x_i}{\tuple{i, \maptm{X}{A_i}^{B,C}~f~x_i}}{i}{n} \\ + \maptm{X}{\mu Y.A}^{B,C}~f~t &= + \foldtm{t}{x}{\mathsf{roll}~(\maptm{X}{\sub{A}{Y/\mu Y.\sub{A}{X/C}}}^{B,C}~f~x)} +\end{align*} +\caption{Definition of \(\mathsf{map}\)}% +\label{fig:lang-map} +\end{figure} + +\Cref{fig:lang-eq} shows the equational theory for \lang{}. We have the standard +\(\beta\) equalities for functions, products and sums. We do not have \(\eta\) +equalities for products or sums as these are no satisfied by the System~T +embedding. We also don't have \(\eta\) equalities for functions as our +equational theory for System~T does not have them either. The \(\beta\) law for +recursive types depends on the map operation. + +\begin{figure} +\begin{align*} + (t : A) &= t \\ + (\lambda x.~t)~u &= \sub{t}{x/u} \\ + \tuple*{\rangeover{t_i}{i<n}}.k &= t_k \\ + \casetm{\tuple{k, t}}{x_i}{u_i}{i}{n} &= \sub{u_k}{x_k/t} \\ + \foldtm{\mathsf{roll}~t}{x}{u} &= + \sub{u}{x/\maptm{X}{A}~(\lambda y.~\foldtm{y}{x}{u})~t} +\end{align*} +\caption{Equational theory for \lang{}}% +\label{fig:lang-eq} +\end{figure} + +\begin{example} + We can use folding and \(\mapkw\) do derive an \emph{unrolling} operation + \begin{gather*} + \mathsf{unroll}_{X. A}~t \coloneq \foldtm{t}{x}{\maptm{X}{A}~\mathsf{roll}~x} \\ + \shortintertext{with typing rule} + \begin{prooftree} + \hypo{\judgement{\Gamma}{t}{\mu X. A}} + \infer1{\judgement{\Gamma}{\mathsf{unroll}~t}{\sub{A}{X/\mu X. A}}} + \end{prooftree} \\ + \shortintertext{From the equational theory, we find} + \mathsf{unroll}~(\mathsf{roll}~t) = + \maptm{X}{A}~\mathsf{roll}~(\maptm{X}{A}~\mathsf{unroll}~t) + \end{gather*} + which in general doesn't simplify further. +\end{example} + +\end{document} diff --git a/sec/lexer.py b/sec/lexer.py new file mode 120000 index 0000000..aef549f --- /dev/null +++ b/sec/lexer.py @@ -0,0 +1 @@ +../lexer.py
\ No newline at end of file diff --git a/sec/reducer.ltx b/sec/reducer.ltx new file mode 100644 index 0000000..8cd19f0 --- /dev/null +++ b/sec/reducer.ltx @@ -0,0 +1,17 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Fuelled Self-Reducer for System T}% +\label{sec:reducer} + +\TODO{ + \begin{itemize} + \item fuelled self-reducer + \item high-level strategy + \item example: case distinction + \item qualitative comparison + \item quantitative comparison (dLoC) + \end{itemize} +} + +\end{document} diff --git a/sec/related.ltx b/sec/related.ltx new file mode 100644 index 0000000..caeed41 --- /dev/null +++ b/sec/related.ltx @@ -0,0 +1,12 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{Related Work}% +\label{sec:related} + +\TODO{ + \begin{itemize} + \item Hagino 1987 + \end{itemize} +} +\end{document} diff --git a/sec/systemt.ltx b/sec/systemt.ltx new file mode 100644 index 0000000..12b1ed2 --- /dev/null +++ b/sec/systemt.ltx @@ -0,0 +1,130 @@ +\documentclass[../main.tex]{subfiles} + +\begin{document} +\section{System T}% +\label{sec:systemt} + +G\"odel's System~T extends the simply-typed lambda calculus by adding natural +numbers and the primitiver recursor. The type formers are \(\nat\) for natural +numbers and arrows \(A \to B\) for functions. + +\Cref{fig:systemt-type} shows the typing rules for System~T terms. On top of the +usual abstraction and application rules, we can construct unary natural numbers +using the operators \(\zero\) and \(\suc\). Natural numbers are eliminated using +the primitive recursor \primreckw. Given a base case, update function and target +\(n\), the primitive recursor applies the update function to the target \(n\) +times. We will abuse notation and treat the \(\suc\) and \primreckw operators as +functions. + +\begin{figure} + \[ + \begin{array}{ccc} + \begin{prooftree} + \hypo{ + x : A \in \Gamma + \vphantom{\judgement[T]{}{}{}} %% Spooky formatting phantom + } + \infer1{\judgement[T]{\Gamma}{x}{A}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement[T]{\Gamma, x \colon A}{t}{B}} + \infer1{\judgement[T]{\Gamma}{\lambda x.~t}{A \to B}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement[T]{\Gamma}{f}{A \to B}} + \hypo{\judgement[T]{\Gamma}{t}{A}} + \infer2{\judgement[T]{\Gamma}{f~t}{B}} + \end{prooftree} + \\ + \\ + \begin{prooftree} + \hypo{\vphantom{\judgement[T]{}{}{}}} %% Spooky formatting phantom + \infer1{\judgement[T]{\Gamma}{\zero}{\nat}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement[T]{\Gamma}{t}{\nat}} + \infer1{\judgement[T]{\Gamma}{\suc~t}{\nat}} + \end{prooftree} + & + \begin{prooftree} + \hypo{\judgement[T]{\Gamma}{z}{A}} + \hypo{\judgement[T]{\Gamma}{s}{A \to A}} + \hypo{\judgement[T]{\Gamma}{t}{\nat}} + \infer3{\judgement[T]{\Gamma}{\primrec{z}{s}{t}}{A}} + \end{prooftree} + \end{array} + \] + \caption{Typing judgements for System~T}% + \label{fig:systemt-type} +\end{figure} + +The equational theory is defined in \cref{fig:systemt-eq}. We have the standard +\(\beta\)-reduction for functions. When we apply the primitve recursor to target +zero, we get the base case. When the recursor is applied to target \(\suc~t\), +we apply the update function to the result of applying \primreckw{} recursively. +An alternative theory would instead apply the update function directly to the +base case instead of to the overall result of the recursion. Whilst this has +little impact for System~T (and operationally corresponds to tail recursion) the +alternative rule does not generalise to the more complex recursive types found +in \lang{}. + +\begin{figure} + \begin{align*} + (\lambda x.~t)~u &= \sub{t}{x/u} & + \primrec{z}{s}{\zero} &= z & + \primrec{z}{s}{(\suc~t)} &= s~(\primrec{z}{x}{s}{t}) + \end{align*} + \caption{Equational theory for System~T. \TODO{add eta for unions}}% + \label{fig:systemt-eq} +\end{figure} + +The primitive recursor is named as such because it generalises the first-order +primitive recursive functions. For example, we can define addition as +\[ +k + n = \primrec{n}{\suc}{k}. +\] +Despite its name, the primitive recursor can compute more than the primitive +recursive functions. By exploiting the higher-order nature of System~T, one can +define the Ackermann-P\'eter function~\cite{semantic-domain} as +\[ +\lambda k, m.~\primrec{\suc}{(\lambda f.~\primrec{(\lambda n.~n)}{(\lambda g, n.~f~(g~n))})}{k}~m. +\] + +Any countable set can be encoded by natural numbers. For inductively defined +sets, such as lists or abstract syntax trees for programs, one can use a G\"odel +encoding to represent values as natural numbers. This depends on a pairing +function \(\pairingkw : \nat \to \nat \to \nat\) that is a (curried) +bijection \(\nat \times \nat \to +\nat\)~\cites{DiscoverEd:book/oc/Cantor52}{szudzik}. As some examples, we can +encode lists of natural numbers as +\begin{align*} + \encode{\epsilon} &\coloneq \pairing{0}{0} & + \encode{n, l} &\coloneq \pairing{1}{(\pairing{n}{\encode{l}})} \\ + \shortintertext{and System~T terms (using deBruijn indices) as} + \encode{x_i} &\coloneq \pairing{0}{i} & + \encode{\zero} &\coloneq \pairing{3}{0} \\ + \encode{\lambda~t} &\coloneq \pairing{1}{\encode{t}} & + \encode{\suc~t} &\coloneq \pairing{4}{\encode{t}} \\ + \encode{t~u} &\coloneq \pairing{2}{(\pairing{\encode{t}}{\encode{u}})} & + \encode{\primrec{z}{s}{t}} &\coloneq + \pairing{5}{(\pairing{\encode{z}}{(\pairing{\encode{s}}{\encode{t}})})}. +\end{align*} +In general, each constructor has a unique tag and this is paired with an +encoding for each subterm. We can analyse an encoded term by reverting the +pairing function and inspecting the tag. For an invalid tag (for our encoding of +lists this is anything greater than 2) we can return an arbitrary value. +Performing recursion on G\"odel encodings is difficult, particularly for values +with more than one recursive argument. + +An alternative encoding strategy uses functions to encode data. One example we +have already given is our encoding of binary trees, from the introduction. +Another functional encoding is to represent lists as functions \(\nat \to +\nat\). Giving the function \zero{} will return the length of the list, and the +values are 1-indexed. This strategy doesn't require a pairing function, but +values no longer have a unique representation. For example, the functions +\(\lambda n.~n + n\) and \(\lambda n. \zero\) both represent the empty list. + +\end{document} |