From 06b77e9c299489bff5140f092eb985613bcce35b Mon Sep 17 00:00:00 2001 From: Greg Brown Date: Fri, 20 May 2022 12:40:19 +0100 Subject: Add implementation version 1.5. --- thesis.org | 899 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 895 insertions(+), 4 deletions(-) (limited to 'thesis.org') diff --git a/thesis.org b/thesis.org index 8efbda1..b328789 100644 --- a/thesis.org +++ b/thesis.org @@ -634,14 +634,901 @@ execution. This makes the semantics much simpler to reason about. ** ? * Design of AMPSL and its Semantics - -# This chapter may be called something else... but in general the -# idea is that you have one (or a few) ``meat'' chapters which describe -# the work you did in technical detail. +In this chapter I introduce AMPSL, a language mocking the Arm pseudocode. AMPSL +is defined within Agda, and makes judicious use of Agda's dependent-typing +features to eliminate assertions and ensure programs cannot fail. + +To construct proofs about how AMPSL behaves, it is necessary to describe its +semantics. This is done through providing a denotational semantics. Denotational +semantics interpret program expressions and statements as mathematical +functions, something which Agda is well-suited to do. + +One downside of denotational semantics is that control flow for looping +constructs is fully evaluated. This is inefficient for loops that undergo many +iterations. This can be resolved by a syntax-directed Hoare logic for AMPSL. +Hoare logic derivations assign a precondition and a postcondition assertion to +each statement. These are chained together though a number of simple logical +implications. ** AMPSL Syntax +AMPSL is a language similar to the Armv8-M pseudocode specification language +written entirely in Agda. Unfortunately, the pseudocode has a number of small +features that make it difficult to work with in Agda directly. AMPSL makes a +number of small changes to the pseudocode to better facilitate this embedding, +typically generalising existing features of the pseudocode. + +*** AMPSL Types + +#+name: AMPSL-types +#+caption: The Agda datatype representing the types present in AMPSL. Most have +#+caption: a direct analogue in the Armv8-M pseudocode specification language +#+attr_latex: :float t +#+begin_src agda2 +data Type : Set where + bool : Type + int : Type + fin : (n : ℕ) → Type + real : Type + tuple : Vec Type n → Type + array : Type → (n : ℕ) → Type +#+end_src + +[[AMPSL-types]] gives the Agda datatype representing the types of AMPSL. Most of +these have a direct analogue to the pseudocode types. For example, ~bool~ is a +Boolean, ~int~ mathematical integers, ~real~ is for mathematical real numbers +and ~array~ constructs array types. Instead of an enumeration construct, AMPSL +uses the ~fin n~ type, representing a finite set of ~n~ elements. Similarly, +structs are represented by ~tuple~ types. + +The most significant difference between the pseudocode and AMPSL is the +representation of bitstrings. Whilst the pseudocode has the ~bits~ datatype, +AMPSL instead treats bitstrings as an array of Booleans. This removes the +distinction between arrays and bitstrings, and allows a number of operations to +be generalised to work on both types. This makes AMPSL more expressive than the +pseudocode, in the sense that there are a greater number and more concise ways +to write expressions that are functionally equivalent. + +The pseudocode implicitly specifies three different properties of types: equality +comparisons, order comparisons and arithmetic operations. Whilst the types +satisfying these properties need to be listed explicitly in Agda, using instance +arguments allows for these proofs to be elided whenever they are required. + +AMPSL has only two differences in types that satisfy these properties compared +to the pseudocode. First, all array types have equality as long as the +enumerated type also has equality. This is a natural generalisation of the +equality between types, and allows for the AMPSL formulation of bitstrings as +arrays of Booleans to have equality. Secondly, finite sets also have ordering. +This change is primarily a convenience feature for comparing finite representing +a subset of integers. As the pseudocode has no ordering comparisons between +enumerations, this causes no problems for converting pseudocode programs into +AMPSL. + +The final interesting feature of the types in AMPSL is implicit coercion for +arithmetic. As pseudocode arithmetic is polymorphic for integers and reals, +AMPSL needs a function to decide the type of the result. By describing the +output type as a function on the input types, the same constructor can be used +for all combinations of numeric inputs. + +*** AMPSL Expressions + +#+name: AMPSL-literalType +#+caption: Mappings from AMPSL types into Agda types which can be used as +#+caption: literal values. ~literalTypes~ is a function that returns a product +#+caption: of the types given in the argument. +#+begin_src agda +literalType : Type → Set +literalType bool = Bool +literalType int = ℤ +literalType (fin n) = Fin n +literalType real = ℤ +literalType (tuple ts) = literalTypes ts +literalType (array t n) = Vec (literalType t) n +#+end_src + +Unlike the pseudocode, where only a few types have literal expressions, every +type in AMPSL has a literal form. This mapping is part of the ~literalType~ +function, given in [[AMPSL-literalType]]. Most AMPSL literals accept the +corresponding Agda type as a value. For instance, ~bool~ literals are Agda +Booleans, and ~array~ literals are fixed-length Agda vectors of the +corresponding underlying type. The only exception to this rule is for ~real~ +values. As Agda does not have a type representing mathematical reals, integers +are used instead. This is sufficient as any real value occurring in the +pseudocode in [cite:@arm/DDI0553B.s] is rational. + +# TODO: why is this sufficient? + +#+name: AMPSL-expr-prototypes +#+caption: Prototypes of the numerous AMPSL program elements. Each one takes two +#+caption: variable contexts: ~Σ~ for global variables and ~Γ~ for local variables. +#+attr_latex: :float t +#+begin_src agda +data Expression (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set +data Reference (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set +data LocalReference (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set +data Statement (Σ : Vec Type o) (Γ : Vec Type n) : Set +data LocalStatement (Σ : Vec Type o) (Γ : Vec Type n) : Set +data Function (Σ : Vec Type o) (Γ : Vec Type n) (ret : Type) : Set +data Procedure (Σ : Vec Type o) (Γ : Vec Type n) : Set +#+end_src + +[[AMPSL-expr-prototypes]] lists the prototypes for the various AMPSL program +elements, with the full definitions being given in [[*AMPSL Syntax Definition]]. +Each of the AMPSL program element types are parameterised by two variable +contexts: Σ for global variables and Γ for local variables. The two variable +contexts are split to simplify the types for function calls and procedure +invocations. As the set of global variables does not change across a program, +functions and procedures keep the same value of parameter Σ in their types. As +functions and procedures have different local variables than the calling +context, having the local variable context as a separate parameter makes the +change simple. + +An ~Expression~ in AMPSL corresponds with expressions in the pseudocode. Many +operators are identical to those in the pseudocode (like ~+~, ~*~, ~-~), and +others are simple renamings (like ~≟~ instead of ~==~ for equality comparisons). +Unlike the pseudocode, where literals can appear unqualified, AMPSL literals +are introduced by the ~lit~ constructor. + +The most immediate change for programming in AMPSL versus the pseudocode is how +variables are handled. Because the ~Expression~ type carries fixed-length +vectors listing the AMPSL types of variables, a variable is referred to by its +index into the context. For example, a variable context \(\{x \mapsto +\mathrm{int}, y \mapsto \mathrm{real}\}\) is represented in AMPSL as the context +~int ∷ real ∷ []~. The variable \(x\) is then represented by ~var 0F~ in AMPSL. +Because the global and local variable contexts are disjoint for the ~Expression~ +type, variables are constructed using ~state~ or ~var~ respectively. + +Whilst this decision introduces much complexity to programming using AMPSL, it +greatly simplifies the language for use in constructing proofs. It is also a +technique used in the internal representation of many compilers (FIXME: cite). + +AMPSL expressions also add a number of useful constructs to the pseudocode type. +One such pair is ~[_]~ and ~unbox~, which construct and destruct an array of +length one respectively. Others are ~fin~, which allows for arbitrary +computations on elements of finite sets, and ~asInt~, which converts a finite +value into an integer. + +The final three AMPSL operators of note are ~merge~, ~slice~ and ~cut~. These +all perform operations on arrays, by either merging two together, taking out a +slice, or cutting out a slice. Unlike the pseudocode where bitstring slicing +requires a range, these three operators use Agda's dependent types and type +inference so that only a base offset is necessary. + +~slice xs i~, like bitstring slicing, extracts a contiguous subset of values +from an array ~xs~, such that the first element in ~slice xs i~ is in ~xs~ at +position ~i~. ~cut xs i~ returns the remainder of ~slice xs i~; the two ends of +~xs~ not in the slice, concatenated. Finally, ~merge xs ys i~ joins ~xs~ and +~ys~ to form a product-projection triple. + +The ~Reference~ type is the name AMPSL gives to assignable expressions from the +pseudocode. The ~LocalReference~ type is identical to ~Reference~, except it +does not include global variables. Due to complications to the semantics of +multiple assignments to one location, "product" operations like ~merge~ and +~cons~ are excluded from being references, despite concatenated bitstrings and +tuples being assignable expressions in the pseudocode. Whilst (FIXME: textcite) +requires that no position in a bitstring is referenced twice, enforcing this in +AMPSL for ~merge~ and ~cons~ would make their use unergonomic in practice for +writing code or proofs. + +(FIXME: necessary?) In an earlier form of AMPSL, instead of separate types for +assignable expressions which can and cannot assign to state, there were two +predicates. However, this required carrying around a proof that the predicate +holds with each assignment. Whilst the impacts on performance were unmeasured, +it made proving statements with assignable expressions significantly more +difficult. Thankfully, Agda is able to resolve overloaded data type constructors +without much difficulty, meaning the use of ~Reference~ and ~LocalReference~ in +AMPSL programs is transparent. + +**** Example AMPSL Expressions +One arithmetic operator used in the pseudocode is left shift. (FIXME: textcite) +explains how this can be encoded using other arithmetic operators in AMPSL, as +shown below: + +#+begin_src agda2 +_<<_ : Expression Σ Γ int → (n : ℕ) → Expression Σ Γ int +e << n = e * lit (ℤ.+ (2 ℕ.^ n)) +#+end_src + +This simple-looking expression has a lot of hidden complexity. First, consider +the type of the literal statement. The unary plus operation tells us that the +literal is an Agda integer. However, there are two AMPSL types with Agda +integers for literal values: ~int~ and ~real~. How does Agda correctly infer the +type? Recall that multiplication is polymorphic in AMPSL, with the result type +determined by implicit coercion. Agda knows that the multiplication must return +an ~int~, and that the first argument is also an ~int~, so it can infer that the +second multiplicand is an integer literal. + +Another pseudocode operation not yet described in AMPSL is integer slicing. Here +is an expression that slices a single bit from an integer, following the +procedure by [cite/t:@arm/DDI0553B.s §E1.3.3]: + +#+begin_src agda2 +getBit : ℕ → Expression Σ Γ int → Expression Σ Γ bit +getBit i x = + inv (x - ((x >> suc i) << suc i)