Explain differences from Neel and Jeremy's parser.

author: Chloe Brown <chloe.brown.00@outlook.com> 2025-06-11 17:14:41 +0100
committer: Chloe Brown <chloe.brown.00@outlook.com> 2025-06-11 17:14:41 +0100
commit: 9300cdaf8a50d0dc55aa4676db401d0efc774007 (patch)
tree: 4af7d625de38789c55431346837bf45b66cde601 /sec/compiler.ltx
parent: c6c015b92a6172b082004cae4a052f1735d83422 (diff)
1 files changed, 46 insertions, 7 deletions
diff --git a/sec/compiler.ltx b/sec/compiler.ltx
index 9be1908..3f0044a 100644
--- a/sec/compiler.ltx
+++ b/sec/compiler.ltx
@@ -27,12 +27,50 @@ three main components:
 We present a simplified form of the Idris types used in the compiler.
 
 \subsection{Parser}
-\TODO{
-  \begin{itemize}
-    \item summarise \textcite{bidir}
-    \item explain dependent state
-  \end{itemize}
-}
+
+The \lang{} parser is derived from \textcite{bidir}. At its core is a
+data type for parser combinators.\@ \textcite{bidir} gave a type system
+for the combinators such that well-formed combinators can parse an
+input: in linear time; without backtracking; and with only a single
+token of lookahead. They implemented both an interpreter and a staged
+compiler for expressions. The interpreter only reports errors about
+the combinators at run time.
+
+We adapted the combinator interpreter for Idris 2. First, we used
+dependant types to statically enforce a combinator is well-formed. The
+``parse'' interpretation function requires a certificate that a parser
+is well-formed. Second, our interpreter produces a certificate that
+the list of unprocessed tokens is a suffix of the input tokens. This
+certificate is required to satisfy the Idris termination checker when
+interpreting fixed point combinators. Finally, we thread state
+throughout the combinators.
+
+It is common for parsers to rely on some additional
+state~\cite{states}. For example \lang{} keeps an environment of bound
+names as state. As Idris has dependant types, we can make the
+result of parsing a combinator depend on the initial state.
+
+A parser combinator can interact with state when:
+\begin{itemize}
+  \item processing a sequence, where it can update the state between
+    items; or
+  \item performing a semantic action (modifying the parse result),
+    where it can read the state.
+\end{itemize}
+For example, consider parsing the syntax \verb|\x => x|, binding the
+variable \verb|x| before referencing it. After processing the first
+occurrence of \verb|x|, the \lang{} parser updates the state adding
+\verb|x| to the environment of bound names. After parsing the second
+\verb|x| as an identifier, the parser can look up the name in the
+environment and convert it into a deBruijn index.
+
+The state is not stored globally, but is local to the current parser
+combinator. We can imagine this as a stack of states. Before parsing
+the first item in a sequence, we push a copy of the current state on
+top of the stack. After parsing that item, we pop the local state off
+the stack. We can then modify the original state using the first
+item's value. We then copy this modified state onto the stack before
+parsing the second item, and so on.
 
 The parser returns an intrinsically well-scoped
 term~\cite{iwss}. Terms are a dependent data type with signature
@@ -45,7 +83,8 @@ would use a separate predicate to prove raw syntax is well-scoped.
 
 \TODO{
   \begin{itemize}
-    \item justify why I used intentionally well-scoped syntax
+    \item why not raw syntax?
+    \item why not extrinsic syntax?
   \end{itemize}
 }
author	Chloe Brown <chloe.brown.00@outlook.com>	2025-06-11 17:14:41 +0100
committer	Chloe Brown <chloe.brown.00@outlook.com>	2025-06-11 17:14:41 +0100
commit	9300cdaf8a50d0dc55aa4676db401d0efc774007 (patch)
tree	4af7d625de38789c55431346837bf45b66cde601 /sec/compiler.ltx
parent	c6c015b92a6172b082004cae4a052f1735d83422 (diff)