#+options: ':t *:t -:t ::t <:t H:4 \n:nil ^:t arch:headline author:t
#+options: broken-links:nil c:nil creator:nil d:(not "LOGBOOK") date:t e:t
#+options: email:nil f:t inline:t num:t p:nil pri:nil prop:nil stat:t tags:t
#+options: tasks:t tex:t timestamp:t title:t toc:t todo:t |:t
#+title: Semantics of an embedded vector architecture for formal verification of software
#+date: \today
#+author: Greg Brown
#+email: greg.brown@cl.cam.ac.uk
#+language: en-GB
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 27.2 (Org mode 9.6)
#+cite_export: biblatex
#+bibliography: ./thesis.bib
#+latex_class: thesis
#+latex_class_options: [twoside,a4paper,11pt]
#+latex_header: \usepackage[hyperref=true,url=true,backend=biber,natbib=true]{biblatex}
#+latex_header: \usepackage[moderate]{savetrees}
#+latex_header: \usepackage[a4paper]{geometry}
#+latex_header: \usepackage{minted}
#+latex_header: \usepackage{newunicodechar}
#+latex_header: \usepackage{mathtools}
#+latex_header: \usepackage{stmaryrd}
#+latex_header: \usepackage[english=british]{csquotes}
#+latex_compiler: pdflatex
#+latex_header: \newunicodechar{Γ}{\ensuremath{\Gamma}}
#+latex_header: \newunicodechar{Δ}{\ensuremath{\Delta}}
#+latex_header: \newunicodechar{Κ}{\ensuremath{K}}
#+latex_header: \newunicodechar{Σ}{\ensuremath{\Sigma}}
#+latex_header: \newunicodechar{γ}{\ensuremath{\gamma}}
#+latex_header: \newunicodechar{δ}{\ensuremath{\delta}}
#+latex_header: \newunicodechar{λ}{\ensuremath{\lambda}}
#+latex_header: \newunicodechar{σ}{\ensuremath{\sigma}}
#+latex_header: \newunicodechar{ᵗ}{\ensuremath{^\texttt{t}}}
#+latex_header: \newunicodechar{′}{\ensuremath{'}}
#+latex_header: \newunicodechar{₁}{\ensuremath{_1}}
#+latex_header: \newunicodechar{₂}{\ensuremath{_2}}
#+latex_header: \newunicodechar{ₛ}{\ensuremath{_\texttt{s}}}
#+latex_header: \newunicodechar{ₜ}{\ensuremath{_\texttt{t}}}
#+latex_header: \newunicodechar{ℓ}{l}
#+latex_header: \newunicodechar{ℕ}{\ensuremath{\mathbb{N}}}
#+latex_header: \newunicodechar{ℚ}{\ensuremath{\mathbb{Q}}}
#+latex_header: \newunicodechar{ℝ}{\ensuremath{\mathbb{R}}}
#+latex_header: \newunicodechar{ℤ}{\ensuremath{\mathbb{Z}}}
#+latex_header: \newunicodechar{⇒}{\ensuremath{\Rightarrow}}
#+latex_header: \newunicodechar{∀}{\ensuremath{\forall}}
#+latex_header: \newunicodechar{∃}{\ensuremath{\exists}}
#+latex_header: \newunicodechar{∘}{\ensuremath{\circ}}
#+latex_header: \newunicodechar{∙}{\ensuremath{\cdot}}
#+latex_header: \newunicodechar{∧}{\ensuremath{\wedge}}
#+latex_header: \newunicodechar{∨}{\ensuremath{\vee}}
#+latex_header: \newunicodechar{∷}{\texttt{::}}
#+latex_header: \newunicodechar{≈}{\ensuremath{\approx}}
#+latex_header: \newunicodechar{≉}{\ensuremath{\not\approx}}
#+latex_header: \newunicodechar{≔}{\ensuremath{\coloneqq}}
#+latex_header: \newunicodechar{≟}{\ensuremath{\buildrel ?\over =}}
#+latex_header: \newunicodechar{≡}{\ensuremath{\equiv}}
#+latex_header: \newunicodechar{⊆}{\ensuremath{\subseteq}}
#+latex_header: \newunicodechar{⊎}{\ensuremath{\uplus}}
#+latex_header: \newunicodechar{⊤}{\ensuremath{\top}}
#+latex_header: \newunicodechar{⊥}{\ensuremath{\bot}}
#+latex_header: \newunicodechar{⌊}{\ensuremath{\lfloor}}
#+latex_header: \newunicodechar{⌋}{\ensuremath{\rfloor}}
#+latex_header: \newunicodechar{⟦}{\ensuremath{\llbracket}}
#+latex_header: \newunicodechar{⟧}{\ensuremath{\rrbracket}}
#+latex_header: \newunicodechar{⟶}{\ensuremath{\rightarrow}}
#+latex_header: \newunicodechar{⦃}{\{\{}
#+latex_header: \newunicodechar{⦄}{\}\}}
#+latex_header: \newunicodechar{𝕀}{\ensuremath{\mathbb{I}}}
#+latex_header: %TC:envir minted 1 other

#+begin_abstract
All good implementations of any algorithm should be correct and fast. To maximise performance, some algorithms are written in hand-tuned assembly. This can introduce subtle bugs that invalidate correctness or other safety properties.
Whilst tools exist to help formally verify these algorithms, none target the recent M-profile Vector Extension of the Armv8.1-M architecture. This work describes a denotational semantics and a Hoare logic semantics for the language used to specify the instructions, and attempts to use them to formally verify the correctness of hand-written assembly for cryptographic applications.
#+end_abstract

# Flip this all around. What am I doing? What is the stuff? Why is it hard? (Why
# am I smart?)

* Introduction

In nearly all cases, the best implementation of an algorithm for a particular purpose is both correct and fast: if the implementation is not correct, then it does not implement the algorithm, and if it is slow, then resources are wasted executing it that could perform other useful work. Ensuring implementations are both performant and correct is typically difficult, and one case that proves particularly tricky is writing assembly code.

Much modern handwritten assembly is written for cryptographic algorithms, where it is vital that compilers do not introduce timing attacks or other side channels. Hand-written assembly can also take advantage of microarchitectural differences to gain a slight performance benefit. Another domain that frequently deals with assembly code is the output of compilers: optimising compilers take the semantic operations described by high-level code and attempt to produce the optimal machine code to perform those actions.

In both of these cases, either humans or machines shuffle instructions around in an attempt to eke out the best performance from the hardware. The resulting code warps the original algorithm in ways that make it difficult to determine whether the actions performed actually have the same result. The correctness of the assembly code comes into question, and it is only possible to prove the assembly code correct if there is a model describing how it should behave.
Suitable mathematical models of the behaviour allow for the construction of formal proofs certifying that the assembly code behaves correctly. Due to the size and complexity of different instruction set architectures (ISAs), formal verification of software in this manner makes use of a number of software tools. Formal verification software is well-developed for the x86-64 and Arm A-profile ISAs. For instance, Jasmin [cite:@10.1145/3133956.3134078] is a programming language which verifies that compiling to x86-64 preserves a set of user-defined safety properties. Another similar tool is CompCert [cite:@hal/01238879], a formally verified C compiler, which ensures that program semantics are preserved during compilation. Unfortunately, little work has been done to formally verify software written for the Arm M-profile architecture. This architecture is designed for low-power and low-latency microcontrollers, which operate in resource-constrained environments.

The goal of this project is to formally verify an assembly function for the Arm M-profile architecture. The specific algorithm [cite:@10.46586/tches.v2022.i1.482-505] is a hand-written, highly-optimised assembly implementation of the number-theoretic transform (NTT), an important procedure in post-quantum cryptography.

This work has made progress on two fronts. Most work has been spent developing an embedding and semantic model of the Armv8.1-M pseudocode specification language within the Agda programming language [cite:@10.1007/978-3-642-03359-9_6]. All instructions in the Armv8.1-M architecture are given a "precise description" [cite:@arm/DDI0553B.s §\(\texttt{I}_\texttt{RTDZ}\)] using the pseudocode, and by developing an embedding within Agda, it is possible to use its powerful type system to construct formal proofs of correctness. Progress has also been made on the formal algebraic verification of Barrett reduction, a subroutine of the NTT.
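For orientation, Barrett reduction itself is a small algorithm: it replaces the division in computing \(x \bmod n\) with a multiplication by a precomputed approximate reciprocal. The following Python sketch is purely illustrative; the parameter names and the bound on ~x~ are choices made here, not those of the verified implementation:

```python
def barrett_reduce(x, n, k):
    """Compute x mod n without dividing by n at reduction time.

    m = floor(2^k / n) is precomputed once per modulus.  For
    0 <= x < 2^k the quotient estimate q underestimates x // n by
    at most one, so a single corrective subtraction suffices.
    """
    m = (1 << k) // n      # precomputed approximate reciprocal
    q = (x * m) >> k       # estimate of x // n (never too large)
    r = x - q * n          # congruent to x mod n, with 0 <= r < 2n
    return r - n if r >= n else r
```

A formal proof must establish exactly the two facts the comments assert: that the result is congruent to \(x\) modulo \(n\), and that it lies within the claimed bounds.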
Whilst a paper proof of the correctness of Barrett reduction exists, a proof of Barrett reduction within Agda is necessary to be able to prove the correctness of the given implementation of the NTT.

# Focus on contributions

# Structure of the paper:
# - [[Armv8.1-M Pseudocode Specification Language]] describes the Armv8.1-M pseudocode
#   specification language. This is an imperative programming language used by the
#   Armv8.1-M manual to describe the operation of the instructions.
# - A simple recap of Hoare logic is given in [[Hoare Logic]]. This is the backbone of
#   one of the formal verification techniques used in this work.
# - [[Formal Definition of MIPSL]] contains a description of MIPSL. This is a language
#   similar to the Armv8.1-M pseudocode embedded within the Agda programming
#   language.
# - The denotational semantics and a Hoare logic semantics of MIPSL are detailed in
#   [[Semantics of MIPSL]]. Due to Agda's nature of being a dependently-typed language,
#   these are both given using Agda code.
# - [[Application to Proofs]] describes the experience of using the Hoare logic and
#   denotational semantics of MIPSL to prove the correctness of some simple
#   routines, given in [[Proofs using Hoare Semantics]] and [[Proofs using Denotational
#   Semantics]] respectively.
# - Formal verification of Barrett reduction, an important subroutine in the NTT,
#   is given in [[Proofs of Barrett Reduction]]. In particular, it proves that Barrett
#   reduction performs a modulo reduction and gives bounds on the size of the
#   result.
# - Finally, [[Future Work]] describes the steps necessary to complete the formal
#   verification of the NTT, as well as listing some other directions this work
#   can be taken.

* Background

The Armv8.1-M pseudocode specification language is used to describe the operation of instructions in the Armv8-M instruction set [cite:@arm/DDI0553B.s §E1.1].
If the semantics of this pseudocode can be formalised, then it is possible to formulate proofs about the action of all Armv8-M instructions, and hence to construct proofs about the algorithms written with them. The language is rather simple, being pseudocode used as a descriptive aid, but it makes some interesting design choices atypical of regular imperative programming languages.

As the pseudocode is an imperative language, one useful proof system for it is Hoare logic. Hoare logic is a proof system driven by the syntax of a program, with most of the hard work of a proof lying in choosing suitable program invariants and solving simple logical implications [cite:@10.1145/363235.363259]. As the logic is syntax-driven, proofs using Hoare logic are less impacted than other proof systems by the large number of loops used in Armv8-M instruction descriptions. Further, solving simple logical implications is a task well-suited to Agda and other proof assistants, making proofs even simpler to construct.

** Armv8.1-M Pseudocode Specification Language

The Armv8.1-M pseudocode specification language is a strongly-typed imperative programming language [cite:@arm/DDI0553B.s §E1.2.1]. It has a first-order type system, a small set of operators and basic control flow, as found in most imperative languages. There are, however, some interesting design choices atypical of regular imperative languages, made to better fulfil its role as a descriptive aid rather than an executable language.

As in nearly all imperative languages, there is a primitive type for Booleans, and the other typical type constructors are present: tuples, structs, enumerations and fixed-length arrays. The first interesting type used by the pseudocode is mathematical integers as a primitive type. Most imperative languages use fixed-width integers for primitive types, with exact integers available only through some library.
This is because the performance benefits of using fixed-width integers in code far outweigh the risk of overflow. However, as the pseudocode is a descriptive aid with no intention of being executed, it can use exact mathematical integers and eliminate overflow errors without any performance cost [cite:@arm/DDI0553B.s §E1.3.4].

Another such type present in the pseudocode is mathematical real numbers. As most real numbers cannot be recorded in finite storage, any executable programming language must compromise on the precision of real numbers. Because the pseudocode does not concern itself with being executable, it is free to use real numbers and have exact precision in real-number arithmetic [cite:@arm/DDI0553B.s §E1.2.4].

The final primitive type used by the pseudocode is the bitstring: a fixed-length sequence of 0s and 1s. Some readers may wonder what the difference is between this type and an array of Booleans. The justification given by [cite/t:@arm/DDI0553B.s §E1.2.2] is more philosophical than practical: "bitstrings are the only concrete data type in pseudocode". In some places, bitstrings can be used instead of integers in arithmetic operations.

Most of the operators used by the pseudocode are unsurprising. For instance, Booleans have the standard set of short-circuiting operations; integers and reals have addition, subtraction and multiplication; reals have division; and integers have integer division (division rounding towards \(-\infty\)) and modulus (the remainder of that division). By far the two most interesting operations in the pseudocode are bitstring concatenation and slicing. Bitstring concatenation is much like appending two arrays together, or regular string concatenation. Bitstring slicing is a more nuanced process. Slicing a bitstring by a single index is no different from a regular array access. If instead a bitstring is sliced by a range of integers, the result is the concatenation of each single-bit access.
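The range rule can be stated operationally. The following Python sketch illustrates the rule only; it is not code from the manual, and the bit-order convention is left abstract by using plain list indices:

```python
def slice_one(bits, i):
    # slicing by a single index is just an array access,
    # returned as a one-bit string
    return [bits[i]]

def slice_range(bits, lo, hi):
    # slicing by the range lo..hi is the concatenation of the
    # single-bit accesses for each index in the range
    result = []
    for i in range(lo, hi + 1):
        result += slice_one(bits, i)
    return result
```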
Finally, when integers are sliced instead of bitstrings, the pseudocode "treats an integer as equivalent to a sufficiently long \textelp{} bitstring" [cite:@arm/DDI0553B.s §E1.3.3].

The final interesting difference between the pseudocode and most imperative languages is the variety of top-level items. The pseudocode has three forms of item: procedures, functions and array-like functions. Procedures and functions behave like the procedures and functions of other imperative languages: arguments are passed by value, and the only difference between the two is that procedures do not return values whilst functions do [cite:@arm/DDI0553B.s §E1.4.2]. Array-like functions act as getters and setters for machine state. Every array-like function has a reader form, and most also have a writer form. This distinction exists because "reading from and writing to an array element require different functions" [cite:@arm/DDI0553B.s §E1.4.2], likely because some machine registers are read-only rather than read-writeable. The writer form acts as one of the targets of assignment expressions, along with variables and the results of bitstring concatenation and slicing [cite:@arm/DDI0553B.s §E1.3.5].

** Hoare Logic

Hoare logic is a proof system for programs written in imperative programming languages. At its core, the logic describes how to build partial correctness triples, which describe how program statements affect assertions about machine state. The bulk of a Hoare logic derivation depends only on the syntax of the program the proof targets.

A partial correctness triple is a relation between a precondition \(P\), a program statement \(s\) and a postcondition \(Q\). If \((P, s, Q)\) is a partial correctness triple, then whenever \(P\) holds for some machine state and \(s\) is executed from that state, \(Q\) holds for the state after it terminates [cite:@10.1145/363235.363259].
Those last three words, "after it terminates", are what lead to the relation being a /partial/ correctness triple. If all statements terminate, as we will see later is the case here, then this relation is simply called a correctness triple.

Along with the syntactic rules for derivations, Hoare logic typically also features a number of adaptation rules. The most widely known of these is the rule of consequence, which can strengthen the precondition and weaken the postcondition. This requires an additional logic for assertions; typically, this is first-order or higher-order logic [cite:@10.1007/s00165-019-00501-3;@10.1007/s001650050057]. One vital feature of Hoare logic with regards to specification is auxiliary variables: variables that cannot be used by programs, and hence remain constant between the precondition and postcondition [cite:@10.1007/s001650050057].

* The MIPSL Language

The ultimate goal of this project is to formally verify an implementation of the number-theoretic transform using the M-profile Vector Extension of the Armv8.1-M architecture. Work has proceeded on two fronts. This chapter discusses the first: a formal definition of a language similar to the Armv8-M pseudocode specification language, which I will call MIPSL (MIPSL imitates pseudocode specification language). This definition exists within Agda, allowing for the use of Agda's dependent-typing features to construct proofs about programs written in MIPSL.

To construct proofs about how MIPSL programs behave, it is necessary to describe the language's semantics. This is done by providing a denotational semantics, which interprets program expressions and statements as mathematical functions, something Agda is well-suited to express. One downside of denotational semantics is that control flow for looping constructs is fully evaluated, which is inefficient for loops that undergo many iterations. This can be resolved by a syntax-directed Hoare logic for MIPSL.
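A correctness triple can be animated on small finite examples. This Python sketch is mine and purely illustrative: a statement is modelled as a state-transforming function, and a candidate triple \((P, s, Q)\) is checked by brute force over a supplied set of initial states:

```python
def satisfies(pre, stmt, post, states):
    """Check {pre} stmt {post} over the given initial states.

    Every statement modelled this way terminates, so on these
    states this is in fact a check of (total) correctness.
    """
    return all(post(stmt(dict(s))) for s in states if pre(s))

# the triple {x >= 0}  x := x + 1  {x >= 1}
incr = lambda s: {**s, "x": s["x"] + 1}
states = [{"x": v} for v in range(-3, 4)]
valid = satisfies(lambda s: s["x"] >= 0, incr, lambda s: s["x"] >= 1, states)
```

Weakening the precondition to ~True~ makes the triple fail on the negative initial states, which is exactly the kind of side condition a rule-of-consequence step must discharge.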
Hoare logic derivations assign a precondition and a postcondition assertion to each statement. These are chained together through a number of simple logical implications.

** Definition of MIPSL

MIPSL is a language similar to the Armv8-M pseudocode specification language, written entirely in Agda. Unfortunately, the pseudocode has a number of small features that make it difficult to work with in Agda directly. MIPSL makes a number of small changes to the pseudocode to better facilitate this embedding, typically by generalising existing features of the pseudocode.

*** MIPSL Types

#+name: MIPSL-types
#+caption: The Agda datatype representing the types present in MIPSL. Most have
#+caption: a direct analogue in the Armv8-M pseudocode specification language.
#+attr_latex: :float t
#+begin_src agda2
data Type : Set where
  bool  : Type
  int   : Type
  fin   : (n : ℕ) → Type
  real  : Type
  tuple : Vec Type n → Type
  array : Type → (n : ℕ) → Type
#+end_src

[[MIPSL-types]] gives the Agda datatype representing the types of MIPSL. Most of these have a direct analogue among the pseudocode types. For example, ~bool~ is the type of Booleans, ~int~ of mathematical integers, ~real~ of real numbers, and ~array~ constructs array types. Instead of an enumeration construct, MIPSL uses the ~fin n~ type, representing a finite set of ~n~ elements. Similarly, structs are represented by ~tuple~ types.

The most significant difference between the pseudocode and MIPSL is the representation of bitstrings. Whilst the pseudocode has the ~bits~ datatype, MIPSL instead treats bitstrings as arrays of Booleans, demonstrated by this code fragment:
#+begin_src agda2
bit : Type
bit = bool

bits : ℕ → Type
bits = array bit
#+end_src
This removes the distinction between arrays and bitstrings, and allows a number of operations to be generalised to work on both types. The result is that MIPSL is more expressive than the pseudocode, in the sense that there are more, and more concise, ways to write functionally equivalent programs.
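The generalisation can be illustrated outside Agda. In this Python sketch (an analogy only; MIPSL enforces all of this statically in types), a bitstring is literally a list of Booleans, so a single generic operation serves bitstrings and other arrays alike:

```python
def take_slice(xs, i, n):
    # a generic array operation: n consecutive elements starting at i
    return xs[i:i + n]

bits = [True, False, True, True]   # a "bitstring": an array of Booleans
nums = [10, 20, 30, 40]            # an ordinary array

low_bits = take_slice(bits, 0, 2)  # works on the bitstring ...
mid_nums = take_slice(nums, 1, 2)  # ... and on the other array, unchanged
```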
#+name: MIPSL-type-properties
#+caption: Prototypes for the three predicates on types.
#+attr_latex: :float t
#+begin_src agda2
data HasEquality : Type → Set
data Ordered     : Type → Set
data IsNumeric   : Type → Set
#+end_src

The pseudocode implicitly specifies three different properties of types: supporting equality comparisons, order comparisons and arithmetic operations. Whilst the types satisfying these properties need to be listed explicitly in Agda, using instance arguments allows these proofs to be elided wherever they are required. [[MIPSL-type-properties]] gives the type signatures of the three predicates.

MIPSL has only two differences from the pseudocode in the types that satisfy these properties. First, an array type has equality as long as its element type also has equality. This is a natural generalisation of equality between types, and allows the MIPSL formulation of bitstrings as arrays of Booleans to retain equality. Secondly, finite sets also have an ordering. This change is primarily a convenience feature for comparing finite sets that represent a subset of the integers. As the pseudocode has no ordering comparisons between enumerations, this causes no problems for converting pseudocode programs into MIPSL.

#+name: MIPSL-type-addition
#+caption: Definition of the "type addition" feature of MIPSL.
#+attr_latex: :float t
#+begin_src agda2
_+ᵗ_ : IsNumeric t₁ → IsNumeric t₂ → Type
int  +ᵗ int  = int
int  +ᵗ real = real
real +ᵗ t₂   = real
#+end_src

The final interesting feature of the types in MIPSL is "type addition". As pseudocode arithmetic is polymorphic over integers and reals, MIPSL needs a function to decide the type of the result of an arithmetic operation. [[MIPSL-type-addition]] gives the definition of this function.

*** MIPSL Expressions

#+name: MIPSL-literalType
#+caption: Mappings from MIPSL types into Agda types which can be used as
#+caption: literal values. ~literalTypes~ is a function that returns a product
#+caption: of the types given in the argument.
#+begin_src agda2
literalType : Type → Set
literalType bool        = Bool
literalType int         = ℤ
literalType (fin n)     = Fin n
literalType real        = ℤ
literalType (tuple ts)  = literalTypes ts
literalType (array t n) = Vec (literalType t) n
#+end_src

Unlike the pseudocode, where only a few types have literal expressions, every type in MIPSL has a literal form. The mapping is given by the ~literalType~ function in [[MIPSL-literalType]]. Most MIPSL literals accept the corresponding Agda type as a value. For instance, ~bool~ literals are Agda Booleans, and ~array~ literals are fixed-length Agda vectors of the corresponding underlying type. The only exception to this rule is ~real~ values: as Agda does not have a type representing mathematical reals, integers are used instead.
# TODO: why is this sufficient?

#+name: MIPSL-expr-prototypes
#+caption: Prototypes of the numerous MIPSL program elements. Each one takes two
#+caption: variable contexts: ~Σ~ for global variables and ~Γ~ for local variables.
#+attr_latex: :float t
#+begin_src agda2
data Expression     (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set
data Reference      (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set
data LocalReference (Σ : Vec Type o) (Γ : Vec Type n) : Type → Set
data Statement      (Σ : Vec Type o) (Γ : Vec Type n) : Set
data LocalStatement (Σ : Vec Type o) (Γ : Vec Type n) : Set
data Function       (Σ : Vec Type o) (Γ : Vec Type n) (ret : Type) : Set
data Procedure      (Σ : Vec Type o) (Γ : Vec Type n) : Set
#+end_src

One important benefit of using Agda to define MIPSL is that expressions and statements can carry the MIPSL types of their variables as part of their Agda type. This makes checking the type safety of variable usage trivial. For reasons which will be discussed later, MIPSL uses two variable contexts: ~Σ~ for global variables and ~Γ~ for local variables. [[MIPSL-expr-prototypes]] lists the prototypes for the various MIPSL program elements. The full definitions of these types are given in [[*MIPSL Syntax Definition]].
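The role of the contexts can be mimicked dynamically. This Python sketch is illustrative only (in MIPSL the check happens statically, inside Agda's type system, and the type names below are made up for the example): a context is a vector of types, and a variable is an index into it.

```python
def var_type(ctx, i):
    # the type of variable i is simply position i of the context
    return ctx[i]

sigma = ["bits32"]        # hypothetical global context: one machine register
gamma = ["bool", "int"]   # hypothetical local context: a flag and a counter

# a global variable indexes into sigma, a local variable into gamma
g = var_type(sigma, 0)
l = var_type(gamma, 1)
```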
An ~Expression~ in MIPSL corresponds to an expression in the pseudocode. Many operators are identical to those in the pseudocode (like ~+~, ~*~, ~-~), and others are simple renamings (like ~≟~ instead of ~==~ for equality comparisons). Unlike the pseudocode, where literals can appear unqualified, MIPSL literals are introduced by the ~lit~ constructor.

The most immediate change when programming in MIPSL rather than the pseudocode is how variables are handled. Because the ~Expression~ type carries fixed-length vectors listing the MIPSL types of variables, a variable is referred to by its index into the context. For example, a variable context \(\{x \mapsto \mathrm{int}, y \mapsto \mathrm{real}\}\) is represented in MIPSL as the context ~int ∷ real ∷ []~. The variable \(x\) is then represented by ~var 0F~ in MIPSL. Because the global and local variable contexts are disjoint in the ~Expression~ type, global and local variables are constructed using ~state~ and ~var~ respectively. Whilst this decision introduces much complexity to programming in MIPSL, it greatly simplifies the language for use in constructing proofs. For instance, by referring to variables by their index in the variable context, variable shadowing, where two variables share a name, is completely eliminated from MIPSL.

MIPSL expressions also add a number of useful constructs to the pseudocode. One such pair is ~[_]~ and ~unbox~, which construct and destruct an array of length one respectively. Another pair is ~fin~, which allows for arbitrary computations on elements of finite sets, and ~asInt~, which converts a finite value into an integer.

The three MIPSL operators most likely to cause confusion are ~merge~, ~slice~ and ~cut~. ~slice~ is analogous to bitstring slicing in the pseudocode, but has some important differences. Instead of operating on bitstrings and integers, ~slice~ acts on ~array~ types: integer slicing can be expressed using the other operators of MIPSL, and in MIPSL a bitstring is a form of array.
Another difference is that the number of bits to take from the array is not specified by a range, but is instead inferred from the output type. This is only possible due to the powerful type inference features of Agda. Finally, the user only needs to provide a single index into the array. By giving the index a ~fin~ datatype, and taking care with the lengths of arrays, MIPSL can guarantee that array indices are always in bounds for the entire range that is required.

The ~cut~ operation is the opposite of ~slice~: whilst ~slice~ takes a contiguous range of elements from the middle of an array, ~cut~ returns the two ends of the array outside of that range. Finally, ~merge~ is a generalisation of concatenation that allows inserting one array into another. These three operations satisfy an equivalence; for any arrays ~x~ and ~y~ and an appropriate index ~i~, we have the following equalities:
- ~merge (slice x i) (cut x i) i ≟ x~
- ~slice (merge x y i) i ≟ x~
- ~cut (merge x y i) i ≟ y~

The ~Reference~ type is the name MIPSL gives to the assignable expressions of the pseudocode. The ~LocalReference~ type is identical to ~Reference~, except that it excludes global variables. An earlier form of MIPSL instead used two predicates to separate the assignable expressions which can and cannot assign to state. However, this required carrying around a proof that the predicate holds with each assignment. Whilst the impact on performance was not measured, it made proving statements with assignable expressions significantly more difficult. Thankfully, Agda is able to resolve overloaded data constructors without much difficulty, meaning the use of ~Reference~ and ~LocalReference~ in MIPSL programs is transparent.

**** Example MIPSL Expressions

Readers of [[*MIPSL Syntax Definition]] may notice that whilst the pseudocode defines a left-shift operator, there is no such operator in MIPSL.
Left shift can be defined in the following way:
#+begin_src agda2
_<<_ : Expression Σ Γ int → (n : ℕ) → Expression Σ Γ int
e << n = e * lit (ℤ.+ (2 ℕ.^ n))
#+end_src
This simple-looking expression has a lot of hidden complexity. First, consider the type of the literal. The unary plus operation tells us that the literal is an Agda integer. However, there are two MIPSL types with Agda integers for literal values: ~int~ and ~real~. How does Agda correctly infer the type? Recall that multiplication is polymorphic in MIPSL, with the result type determined by "type addition" ([[MIPSL-type-addition]]). Agda knows that the multiplication must return an ~int~, and that the first argument is also an ~int~, so by following the definition of ~_+ᵗ_~, it can infer that the second multiplicand is an integer literal.

The previous section claimed that it is possible to define integer slicing entirely using the other operations in MIPSL. Here is an expression that slices a single bit from an integer, following the procedure by [cite/t:@arm/DDI0553B.s §E1.3.3]:
#+begin_src agda2
getBit : ℕ → Expression Σ Γ int → Expression Σ Γ bit
getBit i x = if x - x >> suc i << suc i <? lit (ℤ.+ (2 ℕ.^ i))
             then lit false
             else lit true
#+end_src
The shifts clear all but the low ~suc i~ bits of ~x~; bit ~i~ is set exactly when what remains is at least \(2^i\).

* MIPSL Syntax Definition

#+begin_src agda2
data Expression Σ Γ where
  lit : literalType t → Expression Σ Γ t
  state : ∀ i → Expression Σ Γ (lookup Σ i)
  var : ∀ i → Expression Σ Γ (lookup Γ i)
  _≟_ : ⦃ HasEquality t ⦄ → Expression Σ Γ t → Expression Σ Γ t → Expression Σ Γ bool
  _<?_ : ⦃ Ordered t ⦄ → Expression Σ Γ t → Expression Σ Γ t → Expression Σ Γ bool
  inv : Expression Σ Γ bool → Expression Σ Γ bool
  _&&_ : Expression Σ Γ bool → Expression Σ Γ bool → Expression Σ Γ bool
  _||_ : Expression Σ Γ bool → Expression Σ Γ bool → Expression Σ Γ bool
  not : Expression Σ Γ (bits n) → Expression Σ Γ (bits n)
  _and_ : Expression Σ Γ (bits n) → Expression Σ Γ (bits n) → Expression Σ Γ (bits n)
  _or_ : Expression Σ Γ (bits n) → Expression Σ Γ (bits n) → Expression Σ Γ (bits n)
  [_] : Expression Σ Γ t → Expression Σ Γ (array t 1)
  unbox : Expression Σ Γ (array t 1) → Expression Σ Γ t
  merge : Expression Σ Γ (array t m) → Expression Σ Γ (array t n) → Expression Σ Γ (fin (suc n)) → Expression Σ Γ (array t (n ℕ.+ m))
  slice : Expression Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → Expression Σ Γ (array t m)
  cut : Expression Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → Expression Σ Γ (array t n)
  cast : .(eq : m ≡ n) → Expression Σ Γ (array t m) → Expression Σ Γ (array t n)
  -_ : ⦃ IsNumeric t ⦄ → Expression Σ Γ t → Expression Σ Γ t
  _+_ : ⦃ i₁ : IsNumeric t₁ ⦄ ⦃ i₂ : IsNumeric t₂ ⦄ → Expression Σ Γ t₁ → Expression Σ Γ t₂ → Expression Σ Γ (i₁ +ᵗ i₂)
  _*_ : ⦃ i₁ : IsNumeric t₁ ⦄ ⦃ i₂ : IsNumeric t₂ ⦄ → Expression Σ Γ t₁ → Expression Σ Γ t₂ → Expression Σ Γ (i₁ +ᵗ i₂)
  _^_ : ⦃ IsNumeric t ⦄ → Expression Σ Γ t → (n : ℕ) → Expression Σ Γ t
  _>>_ : Expression Σ Γ int → (n : ℕ) → Expression Σ Γ int
  rnd : Expression Σ Γ real → Expression Σ Γ int
  fin : ∀ {ms} → (f : literalTypes (map fin ms) → Fin n) → Expression Σ Γ (tuple {n = k} (map fin ms)) → Expression Σ Γ (fin n)
  asInt : Expression Σ Γ (fin n) → Expression Σ Γ int
  nil : Expression Σ Γ (tuple [])
  cons : Expression Σ Γ t → Expression Σ Γ (tuple ts) → Expression Σ Γ (tuple (t ∷ ts))
  head : Expression Σ Γ (tuple (t ∷ ts)) → Expression Σ Γ t
  tail : Expression Σ Γ (tuple (t ∷ ts)) → Expression Σ Γ (tuple ts)
  call : (f : Function Σ Δ t) → All (Expression Σ Γ) Δ → Expression Σ Γ t
  if_then_else_ : Expression Σ Γ bool → Expression Σ Γ t → Expression Σ Γ t → Expression Σ Γ t

data Reference Σ Γ where
  state : ∀ i → Reference Σ Γ (lookup Σ i)
  var : ∀ i → Reference Σ Γ (lookup Γ i)
  [_] : Reference Σ Γ t →
    Reference Σ Γ (array t 1)
  unbox : Reference Σ Γ (array t 1) → Reference Σ Γ t
  merge : Reference Σ Γ (array t m) → Reference Σ Γ (array t n) → Expression Σ Γ (fin (suc n)) → Reference Σ Γ (array t (n ℕ.+ m))
  slice : Reference Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → Reference Σ Γ (array t m)
  cut : Reference Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → Reference Σ Γ (array t n)
  cast : .(eq : m ≡ n) → Reference Σ Γ (array t m) → Reference Σ Γ (array t n)
  nil : Reference Σ Γ (tuple [])
  cons : Reference Σ Γ t → Reference Σ Γ (tuple ts) → Reference Σ Γ (tuple (t ∷ ts))
  head : Reference Σ Γ (tuple (t ∷ ts)) → Reference Σ Γ t
  tail : Reference Σ Γ (tuple (t ∷ ts)) → Reference Σ Γ (tuple ts)

data LocalReference Σ Γ where
  var : ∀ i → LocalReference Σ Γ (lookup Γ i)
  [_] : LocalReference Σ Γ t → LocalReference Σ Γ (array t 1)
  unbox : LocalReference Σ Γ (array t 1) → LocalReference Σ Γ t
  merge : LocalReference Σ Γ (array t m) → LocalReference Σ Γ (array t n) → Expression Σ Γ (fin (suc n)) → LocalReference Σ Γ (array t (n ℕ.+ m))
  slice : LocalReference Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → LocalReference Σ Γ (array t m)
  cut : LocalReference Σ Γ (array t (n ℕ.+ m)) → Expression Σ Γ (fin (suc n)) → LocalReference Σ Γ (array t n)
  cast : .(eq : m ≡ n) → LocalReference Σ Γ (array t m) → LocalReference Σ Γ (array t n)
  nil : LocalReference Σ Γ (tuple [])
  cons : LocalReference Σ Γ t → LocalReference Σ Γ (tuple ts) → LocalReference Σ Γ (tuple (t ∷ ts))
  head : LocalReference Σ Γ (tuple (t ∷ ts)) → LocalReference Σ Γ t
  tail : LocalReference Σ Γ (tuple (t ∷ ts)) → LocalReference Σ Γ (tuple ts)

data Statement Σ Γ where
  _∙_ : Statement Σ Γ → Statement Σ Γ → Statement Σ Γ
  skip : Statement Σ Γ
  _≔_ : Reference Σ Γ t → Expression Σ Γ t → Statement Σ Γ
  declare : Expression Σ Γ t → Statement Σ (t ∷ Γ) → Statement Σ Γ
  invoke : (f : Procedure Σ Δ) → All (Expression Σ Γ) Δ → Statement Σ Γ
  for : ∀ n → Statement Σ (fin n ∷ Γ) → Statement Σ Γ
  if_then_ :
    Expression Σ Γ bool → Statement Σ Γ → Statement Σ Γ
  if_then_else_ : Expression Σ Γ bool → Statement Σ Γ → Statement Σ Γ → Statement Σ Γ

data LocalStatement Σ Γ where
  _∙_ : LocalStatement Σ Γ → LocalStatement Σ Γ → LocalStatement Σ Γ
  skip : LocalStatement Σ Γ
  _≔_ : LocalReference Σ Γ t → Expression Σ Γ t → LocalStatement Σ Γ
  declare : Expression Σ Γ t → LocalStatement Σ (t ∷ Γ) → LocalStatement Σ Γ
  for : ∀ n → LocalStatement Σ (fin n ∷ Γ) → LocalStatement Σ Γ
  if_then_ : Expression Σ Γ bool → LocalStatement Σ Γ → LocalStatement Σ Γ
  if_then_else_ : Expression Σ Γ bool → LocalStatement Σ Γ → LocalStatement Σ Γ → LocalStatement Σ Γ

data Function Σ Γ ret where
  init_∙_end : Expression Σ Γ ret → LocalStatement Σ (ret ∷ Γ) → Function Σ Γ ret

data Procedure Σ Γ where
  _∙end : Statement Σ Γ → Procedure Σ Γ
#+end_src

* MIPSL Denotational Semantics

#+begin_src agda2
expr : Expression Σ Γ t → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ t ⟧ₜ
exprs : All (Expression Σ Γ) ts → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ ts ⟧ₜₛ
ref : Reference Σ Γ t → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ t ⟧ₜ
locRef : LocalReference Σ Γ t → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ t ⟧ₜ
stmt : Statement Σ Γ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ
locStmt : LocalStatement Σ Γ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ
fun : Function Σ Γ t → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ t ⟧ₜ
proc : Procedure Σ Γ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Σ ⟧ₜₛ
assign : Reference Σ Γ t → ⟦ t ⟧ₜ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ
locAssign : LocalReference Σ Γ t → ⟦ t ⟧ₜ → ⟦ Σ ⟧ₜₛ × ⟦ Γ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ

expr (lit {t = t} x) = const (Κ[ t ] x)
expr {Σ = Σ} (state i) = fetch i Σ ∘ proj₁
expr {Γ = Γ} (var i) = fetch i Γ ∘ proj₂
expr (e ≟ e₁) = lift ∘ does ∘ uncurry ≈-dec ∘ < expr e , expr e₁ >
expr (e <? e₁) = lift ∘ does ∘ uncurry <-dec ∘ < expr e , expr e₁ >
expr (inv e) = lift ∘ Bool.not ∘ lower ∘ expr e
expr (e && e₁) = lift ∘ uncurry (Bool._∧_ on lower) ∘ < expr e , expr e₁ >
expr (e || e₁) = lift ∘ uncurry (Bool._∨_ on lower) ∘ < expr e , expr e₁ >
expr (not e) = map (lift ∘ Bool.not ∘ lower) ∘ expr e
expr (e and e₁) = uncurry (zipWith (lift ∘₂ Bool._∧_ on
      lower)) ∘ < expr e , expr e₁ >
  expr (e or e₁) = uncurry (zipWith (lift ∘₂ Bool._∨_ on lower)) ∘ < expr e , expr e₁ >
  expr [ e ] = (_∷ []) ∘ expr e
  expr (unbox e) = Vec.head ∘ expr e
  expr (merge e e₁ e₂) =
    uncurry (uncurry mergeVec) ∘ < < expr e , expr e₁ > , lower ∘ expr e₂ >
  expr (slice e e₁) = uncurry sliceVec ∘ < expr e , lower ∘ expr e₁ >
  expr (cut e e₁) = uncurry cutVec ∘ < expr e , lower ∘ expr e₁ >
  expr (cast eq e) = castVec eq ∘ expr e
  expr (- e) = neg ∘ expr e
  expr (e + e₁) = uncurry add ∘ < expr e , expr e₁ >
  expr (e * e₁) = uncurry mul ∘ < expr e , expr e₁ >
  expr (e ^ x) = flip pow x ∘ expr e
  expr (e >> n) = lift ∘ flip shift n ∘ lower ∘ expr e
  expr (rnd e) = lift ∘ ⌊_⌋ ∘ lower ∘ expr e
  expr (fin {ms = ms} f e) = lift ∘ f ∘ lowerFin ms ∘ expr e
  expr (asInt e) = lift ∘ 𝕀⇒ℤ ∘ 𝕀.+_ ∘ Fin.toℕ ∘ lower ∘ expr e
  expr nil = const _
  expr (cons {ts = ts} e e₁) = uncurry (cons′ ts) ∘ < expr e , expr e₁ >
  expr (head {ts = ts} e) = head′ ts ∘ expr e
  expr (tail {ts = ts} e) = tail′ ts ∘ expr e
  expr (call f es) = fun f ∘ < proj₁ , exprs es >
  expr (if e then e₁ else e₂) =
    uncurry (uncurry Bool.if_then_else_) ∘ < < lower ∘ expr e , expr e₁ > , expr e₂ >

  exprs [] = const _
  exprs (e ∷ []) = expr e
  exprs (e ∷ e₁ ∷ es) = < expr e , exprs (e₁ ∷ es) >

  ref {Σ = Σ} (state i) = fetch i Σ ∘ proj₁
  ref {Γ = Γ} (var i) = fetch i Γ ∘ proj₂
  ref [ r ] = (_∷ []) ∘ ref r
  ref (unbox r) = Vec.head ∘ ref r
  ref (merge r r₁ e) = uncurry (uncurry mergeVec) ∘ < < ref r , ref r₁ > , lower ∘ expr e >
  ref (slice r e) = uncurry sliceVec ∘ < ref r , lower ∘ expr e >
  ref (cut r e) = uncurry cutVec ∘ < ref r , lower ∘ expr e >
  ref (cast eq r) = castVec eq ∘ ref r
  ref nil = const _
  ref (cons {ts = ts} r r₁) = uncurry (cons′ ts) ∘ < ref r , ref r₁ >
  ref (head {ts = ts} r) = head′ ts ∘ ref r
  ref (tail {ts = ts} r) = tail′ ts ∘ ref r

  locRef {Γ = Γ} (var i) = fetch i Γ ∘ proj₂
  locRef [ r ] = (_∷ []) ∘ locRef r
  locRef (unbox r) = Vec.head ∘ locRef r
  locRef (merge r r₁ e) = uncurry (uncurry mergeVec) ∘ < <
      locRef r , locRef r₁ > , lower ∘ expr e >
  locRef (slice r e) = uncurry sliceVec ∘ < locRef r , lower ∘ expr e >
  locRef (cut r e) = uncurry cutVec ∘ < locRef r , lower ∘ expr e >
  locRef (cast eq r) = castVec eq ∘ locRef r
  locRef nil = const _
  locRef (cons {ts = ts} r r₁) = uncurry (cons′ ts) ∘ < locRef r , locRef r₁ >
  locRef (head {ts = ts} r) = head′ ts ∘ locRef r
  locRef (tail {ts = ts} r) = tail′ ts ∘ locRef r

  assign {Σ = Σ} (state i) val σ,γ = < updateAt i Σ val ∘ proj₁ , proj₂ >
  assign {Γ = Γ} (var i) val σ,γ = < proj₁ , updateAt i Γ val ∘ proj₂ >
  assign [ r ] val σ,γ = assign r (Vec.head val) σ,γ
  assign (unbox r) val σ,γ = assign r (val ∷ []) σ,γ
  assign (merge r r₁ e) val σ,γ =
    let i = expr e σ,γ in
    assign r₁ (cutVec val (lower i)) σ,γ ∘ assign r (sliceVec val (lower i)) σ,γ
  assign (slice r e) val σ,γ =
    let i = expr e σ,γ in
    let cut = cutVec (ref r σ,γ) (lower i) in
    assign r (mergeVec val cut (lower i)) σ,γ
  assign (cut r e) val σ,γ =
    let i = expr e σ,γ in
    let slice = sliceVec (ref r σ,γ) (lower i) in
    assign r (mergeVec slice val (lower i)) σ,γ
  assign (cast eq r) val σ,γ = assign r (castVec (sym eq) val) σ,γ
  assign nil val σ,γ = id
  assign (cons {ts = ts} r r₁) val σ,γ =
    assign r₁ (tail′ ts val) σ,γ ∘ assign r (head′ ts val) σ,γ
  assign (head {ts = ts} r) val σ,γ = assign r (cons′ ts val (ref (tail r) σ,γ)) σ,γ
  assign (tail {ts = ts} r) val σ,γ = assign r (cons′ ts (ref (head r) σ,γ) val) σ,γ

  locAssign {Γ = Γ} (var i) val σ,γ = updateAt i Γ val
  locAssign [ r ] val σ,γ = locAssign r (Vec.head val) σ,γ
  locAssign (unbox r) val σ,γ = locAssign r (val ∷ []) σ,γ
  locAssign (merge r r₁ e) val σ,γ =
    let i = expr e σ,γ in
    locAssign r₁ (cutVec val (lower i)) σ,γ ∘ locAssign r (sliceVec val (lower i)) σ,γ
  locAssign (slice r e) val σ,γ =
    let i = expr e σ,γ in
    let cut = cutVec (locRef r σ,γ) (lower i) in
    locAssign r (mergeVec val cut (lower i)) σ,γ
  locAssign (cut r e) val σ,γ =
    let i = expr e σ,γ in
    let slice = sliceVec (locRef r σ,γ) (lower i) in
    locAssign r (mergeVec slice val
      (lower i)) σ,γ
  locAssign (cast eq r) val σ,γ = locAssign r (castVec (sym eq) val) σ,γ
  locAssign nil val σ,γ = id
  locAssign (cons {ts = ts} r r₁) val σ,γ =
    locAssign r₁ (tail′ ts val) σ,γ ∘ locAssign r (head′ ts val) σ,γ
  locAssign (head {ts = ts} r) val σ,γ = locAssign r (cons′ ts val (locRef (tail r) σ,γ)) σ,γ
  locAssign (tail {ts = ts} r) val σ,γ = locAssign r (cons′ ts (locRef (head r) σ,γ) val) σ,γ

  stmt (s ∙ s₁) = stmt s₁ ∘ stmt s
  stmt skip = id
  stmt (ref ≔ val) = uncurry (uncurry (assign ref)) ∘ < < expr val , id > , id >
  stmt {Γ = Γ} (declare e s) =
    < proj₁ , tail′ Γ ∘ proj₂ > ∘ stmt s ∘ < proj₁ , uncurry (cons′ Γ) ∘ < expr e , proj₂ > >
  stmt (invoke p es) = < proc p ∘ < proj₁ , exprs es > , proj₂ >
  stmt (if e then s) =
    uncurry (uncurry Bool.if_then_else_) ∘ < < lower ∘ expr e , stmt s > , id >
  stmt (if e then s else s₁) =
    uncurry (uncurry Bool.if_then_else_) ∘ < < lower ∘ expr e , stmt s > , stmt s₁ >
  stmt {Γ = Γ} (for m s) =
    Vec.foldl _
      (flip λ i →
        (< proj₁ , tail′ Γ ∘ proj₂ > ∘ stmt s ∘ < proj₁ , cons′ Γ (lift i) ∘ proj₂ >) ∘_)
      id (Vec.allFin m)

  locStmt (s ∙ s₁) = locStmt s₁ ∘ < proj₁ , locStmt s >
  locStmt skip = proj₂
  locStmt (ref ≔ val) = uncurry (uncurry (locAssign ref)) ∘ < < expr val , id > , proj₂ >
  locStmt {Γ = Γ} (declare e s) =
    tail′ Γ ∘ locStmt s ∘ < proj₁ , uncurry (cons′ Γ) ∘ < expr e , proj₂ > >
  locStmt (if e then s) =
    uncurry (uncurry Bool.if_then_else_) ∘ < < lower ∘ expr e , locStmt s > , proj₂ >
  locStmt (if e then s else s₁) =
    uncurry (uncurry Bool.if_then_else_) ∘ < < lower ∘ expr e , locStmt s > , locStmt s₁ >
  locStmt {Γ = Γ} (for m s) =
    proj₂ ∘
    Vec.foldl _
      (flip λ i →
        (< proj₁ , tail′ Γ ∘ locStmt s > ∘ < proj₁ , cons′ Γ (lift i) ∘ proj₂ >) ∘_)
      id (Vec.allFin m)

  fun {Γ = Γ} (init e ∙ s end) =
    fetch zero (_ ∷ Γ) ∘ locStmt s ∘ < proj₁ , uncurry (cons′ Γ) ∘ < expr e , proj₂ > >

  proc (s ∙end) = proj₁ ∘ stmt s
#+end_src

# LocalWords: Hoare ISAs Jasmin CompCert structs bitstring bitstrings getters MIPSL

#+latex: %TC:endignore

* MIPSL Hoare Logic Definitions

First, the definition and semantics of ~Term~.

#+begin_src agda2
  data Term (Σ : Vec Type i) (Γ : Vec Type j) (Δ : Vec Type k) : Type → Set ℓ where
    lit   : ⟦ t ⟧ₜ → Term Σ Γ Δ t
    state : ∀ i → Term Σ Γ Δ (lookup Σ i)
    var   : ∀ i → Term Σ Γ Δ (lookup Γ i)
    meta  : ∀ i → Term Σ Γ Δ (lookup Δ i)
    _≟_   : ⦃ HasEquality t ⦄ → Term Σ Γ Δ t → Term Σ Γ Δ t → Term Σ Γ Δ bool
    _>>_  : Term Σ Γ Δ int → (n : ℕ) → Term Σ Γ Δ int
    rnd   : Term Σ Γ Δ real → Term Σ Γ Δ int
    fin   : ∀ {ms} → (f : literalTypes (map fin ms) → Fin n) →
            Term Σ Γ Δ (tuple {n = o} (map fin ms)) → Term Σ Γ Δ (fin n)
    asInt : Term Σ Γ Δ (fin n) → Term Σ Γ Δ int
    nil   : Term Σ Γ Δ (tuple [])
    cons  : Term Σ Γ Δ t → Term Σ Γ Δ (tuple ts) → Term Σ Γ Δ (tuple (t ∷ ts))
    head  : Term Σ Γ Δ (tuple (t ∷ ts)) → Term Σ Γ Δ t
    tail  : Term Σ Γ Δ (tuple (t ∷ ts)) → Term Σ Γ Δ (tuple ts)
    call  : Function ts ts₁ t → All (Term Σ Γ Δ) ts → All (Term Σ Γ Δ) ts₁ → Term Σ Γ Δ t
    if_then_else_ : Term Σ Γ Δ bool → Term Σ Γ Δ t → Term Σ Γ Δ t → Term Σ Γ Δ t

  ⟦_⟧  : Term Σ Γ Δ t → ⟦ Σ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ → ⟦ Δ ⟧ₜₛ → ⟦ t ⟧ₜ
  ⟦_⟧ₛ : All (Term Σ Γ Δ) ts → ⟦ Σ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ → ⟦ Δ ⟧ₜₛ → ⟦ ts ⟧ₜₛ

  ⟦ lit x ⟧ σ γ δ = x
  ⟦ state i ⟧ σ γ δ = fetch i Σ σ
  ⟦ var i ⟧ σ γ δ = fetch i Γ γ
  ⟦ meta i ⟧ σ γ δ = fetch i Δ δ
  ⟦ e ≟ e₁ ⟧ σ γ δ = (lift ∘₂ does ∘₂ ≈-dec) (⟦ e ⟧ σ γ δ) (⟦ e₁ ⟧ σ γ δ)
  ⟦ e >> n ⟧ σ γ δ = lift ∘ flip shift n ∘ lower $ ⟦ e ⟧ σ γ δ
  ⟦ rnd e ⟧ σ γ δ = lift ∘ ⌊_⌋ ∘ lower $ ⟦ e ⟧ σ γ δ
  ⟦ fin {ms = ms} f e ⟧ σ γ δ = lift ∘ f ∘ lowerFin ms $ ⟦ e ⟧ σ γ δ
  ⟦ asInt e ⟧ σ γ δ = lift ∘ 𝕀⇒ℤ ∘ 𝕀.+_ ∘ Fin.toℕ ∘ lower $ ⟦ e ⟧ σ γ δ
  ⟦ nil ⟧ σ γ δ = _
  ⟦ cons {ts = ts} e e₁ ⟧ σ γ δ = cons′ ts (⟦ e ⟧ σ γ δ) (⟦ e₁ ⟧ σ γ δ)
  ⟦ head {ts = ts} e ⟧ σ γ δ = head′ ts (⟦ e ⟧ σ γ δ)
  ⟦ tail {ts = ts} e ⟧ σ γ δ = tail′ ts (⟦ e ⟧ σ γ δ)
  ⟦ call f es es₁ ⟧ σ γ δ = Den.Semantics.fun f (⟦ es ⟧ₛ σ γ δ , ⟦ es₁ ⟧ₛ σ γ δ)
  ⟦ if e then e₁ else e₂ ⟧ σ γ δ =
    Bool.if lower (⟦ e ⟧ σ γ δ) then ⟦ e₁ ⟧ σ γ δ else ⟦ e₂ ⟧ σ γ δ

  ⟦_⟧ₛ [] σ γ δ = _
  ⟦_⟧ₛ {ts = _ ∷ ts} (e ∷ es) σ γ δ = cons′ ts (⟦ e ⟧ σ γ
      δ) (⟦ es ⟧ₛ σ γ δ)
#+end_src

And here is the definition of ~Assertion~ with its semantics.

#+begin_src agda2
  data Assertion (Σ : Vec Type o) (Γ : Vec Type n) (Δ : Vec Type m) : Set (ℓsuc ℓ) where
    all   : Assertion Σ Γ (t ∷ Δ) → Assertion Σ Γ Δ
    some  : Assertion Σ Γ (t ∷ Δ) → Assertion Σ Γ Δ
    pred  : Term Σ Γ Δ bool → Assertion Σ Γ Δ
    true  : Assertion Σ Γ Δ
    false : Assertion Σ Γ Δ
    ¬_    : Assertion Σ Γ Δ → Assertion Σ Γ Δ
    _∧_   : Assertion Σ Γ Δ → Assertion Σ Γ Δ → Assertion Σ Γ Δ
    _∨_   : Assertion Σ Γ Δ → Assertion Σ Γ Δ → Assertion Σ Γ Δ
    _⟶_  : Assertion Σ Γ Δ → Assertion Σ Γ Δ → Assertion Σ Γ Δ

  ⟦_⟧ : Assertion Σ Γ Δ → ⟦ Σ ⟧ₜₛ → ⟦ Γ ⟧ₜₛ → ⟦ Δ ⟧ₜₛ → Set ℓ
  ⟦_⟧ {Δ = Δ} (all P) σ γ δ = ∀ x → ⟦ P ⟧ σ γ (cons′ Δ x δ)
  ⟦_⟧ {Δ = Δ} (some P) σ γ δ = ∃ λ x → ⟦ P ⟧ σ γ (cons′ Δ x δ)
  ⟦ pred p ⟧ σ γ δ = Lift ℓ (Bool.T (lower (TS.⟦ p ⟧ σ γ δ)))
  ⟦ true ⟧ σ γ δ = Lift ℓ ⊤
  ⟦ false ⟧ σ γ δ = Lift ℓ ⊥
  ⟦ ¬ P ⟧ σ γ δ = ⟦ P ⟧ σ γ δ → ⊥
  ⟦ P ∧ Q ⟧ σ γ δ = ⟦ P ⟧ σ γ δ × ⟦ Q ⟧ σ γ δ
  ⟦ P ∨ Q ⟧ σ γ δ = ⟦ P ⟧ σ γ δ ⊎ ⟦ Q ⟧ σ γ δ
  ⟦ P ⟶ Q ⟧ σ γ δ = ⟦ P ⟧ σ γ δ → ⟦ Q ⟧ σ γ δ
#+end_src
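The ~assign~ clauses for ~slice~ and ~cut~ references depend on ~sliceVec~, ~cutVec~ and ~mergeVec~ fitting together: assigning through a slice merges the new window with the ~cut~ of the old value, so merging a slice back into its matching cut must reproduce the original vector. A minimal executable sketch of this interplay, written here in Python because the Agda combinators are defined elsewhere; only their types are taken from the source, and the offset convention (a window of length ~m~ starting at index ~i~) is an assumption:

```python
# Hypothetical list-level model of the vector combinators used by `assign`.
# The offset convention is an assumption; only the Agda types are from the source.

def slice_vec(xs, m, i):
    """sliceVec : array t (n + m) -> fin (suc n) -> array t m"""
    return xs[i:i + m]

def cut_vec(xs, m, i):
    """cutVec : array t (n + m) -> fin (suc n) -> array t n"""
    return xs[:i] + xs[i + m:]

def merge_vec(ys, zs, i):
    """mergeVec : array t m -> array t n -> fin (suc n) -> array t (n + m)"""
    return zs[:i] + ys + zs[i:]

xs = list(range(8))
m, i = 3, 2
# Merging a slice back into the matching cut is the identity, which is what
# makes the `assign (slice r e)` clause a no-op when the window is unchanged.
assert merge_vec(slice_vec(xs, m, i), cut_vec(xs, m, i), i) == xs
print("round-trip holds")
```

Under this reading, ~assign (slice r e) val~ writes ~mergeVec val (cutVec old i) i~ back through ~r~, leaving the elements outside the window untouched.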