From 9300cdaf8a50d0dc55aa4676db401d0efc774007 Mon Sep 17 00:00:00 2001 From: Chloe Brown Date: Wed, 11 Jun 2025 17:14:41 +0100 Subject: Explain differences from Neel and Jeremy's parser. --- sec/compiler.ltx | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 46 insertions(+), 7 deletions(-) (limited to 'sec/compiler.ltx') diff --git a/sec/compiler.ltx b/sec/compiler.ltx index 9be1908..3f0044a 100644 --- a/sec/compiler.ltx +++ b/sec/compiler.ltx @@ -27,12 +27,50 @@ three main components: We present a simplified form of the Idris types used in the compiler. \subsection{Parser} -\TODO{ - \begin{itemize} - \item summarise \textcite{bidir} - \item explain dependent state - \end{itemize} -} + +The \lang{} parser is derived from \textcite{bidir}. At its core is a +data type for parser combinators.\@ \textcite{bidir} gave a type system +for the combinators such that well-formed combinators can parse an +input: in linear time; without backtracking; and with only a single +token of lookahead. They implemented both an interpreter and a staged +compiler for expressions. The interpreter only reports errors about +the combinators at run time. + +We adapted the combinator interpreter for Idris 2. First, we used +dependant types to statically enforce a combinator is well-formed. The +``parse'' interpretation function requires a certificate that a parser +is well-formed. Second, our interpreter produces a certificate that +the list of unprocessed tokens is a suffix of the input tokens. This +certificate is required to satisfy the Idris termination checker when +interpreting fixed point combinators. Finally, we thread state +throughout the combinators. + +It is common for parsers to rely on some additional +state~\cite{states}. For example \lang{} keeps an environment of bound +names as state. As Idris has dependant types, we can make the +result of parsing a combinator depend on the initial state. + +A parser combinator can interact with state when: +\begin{itemize} + \item processing a sequence, where it can update the state between + items; or + \item performing a semantic action (modifying the parse result), + where it can read the state. +\end{itemize} +For example, consider parsing the syntax \verb|\x => x|, binding the +variable \verb|x| before referencing it. After processing the first +occurrence of \verb|x|, the \lang{} parser updates the state adding +\verb|x| to the environment of bound names. After parsing the second +\verb|x| as an identifier, the parser can look up the name in the +environment and convert it into a deBruijn index. + +The state is not stored globally, but is local to the current parser +combinator. We can imagine this as a stack of states. Before parsing +the first item in a sequence, we push a copy of the current state on +top of the stack. After parsing that item, we pop the local state off +the stack. We can then modify the original state using the first +item's value. We then copy this modified state onto the stack before +parsing the second item, and so on. The parser returns an intrinsically well-scoped term~\cite{iwss}. Terms are a dependent data type with signature @@ -45,7 +83,8 @@ would use a separate predicate to prove raw syntax is well-scoped. \TODO{ \begin{itemize} - \item justify why I used intentionally well-scoped syntax + \item why not raw syntax? + \item why not extrinsic syntax? \end{itemize} } -- cgit v1.2.3