#+options: ':t *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline author:t
#+options: broken-links:nil c:nil creator:nil d:(not "LOGBOOK") date:t e:t
#+options: email:nil f:t inline:t num:t p:nil pri:nil prop:nil stat:t tags:t
#+options: tasks:t tex:t timestamp:t title:t toc:nil todo:t |:t
#+title: Semantics of an embedded vector architecture for formal verification of software
#+date: \today
#+author: Greg Brown
#+email: greg.brown@cl.cam.ac.uk
#+language: en-GB
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 27.2 (Org mode 9.6)
#+cite_export: biblatex
#+bibliography: ./thesis.bib
#+latex_class: article
#+latex_class_options: [twoside,a4paper]
#+latex_header: \usepackage[hyperref=true,url=true,backend=biber,natbib=true]{biblatex}
#+latex_header: \usepackage[autostyle,english=british]{csquotes}
#+latex_header: \usepackage[moderate]{savetrees}
#+latex_header: \usepackage[a4paper]{geometry}
#+latex_compiler: pdflatex

#+begin_abstract
A good implementation of an algorithm should be both correct and fast. To maximise performance, some algorithms are written in hand-tuned assembly. This can introduce subtle bugs that invalidate correctness or other safety properties. Whilst tools exist to help formally verify such algorithms, none are designed to target the recent M-profile Vector Extension for the Armv8.1-M architecture. This work describes a denotational and a Hoare logic semantics for the language used to specify the instructions, and attempts to use them to formally verify the correctness of hand-written assembly for cryptographic applications.
#+end_abstract

# Flip this all around. What am I doing? What is the stuff? Why is it hard? (Why
# am I smart?)

* Introduction

# Merge these two paras
In nearly all cases, the best implementation of an algorithm for a particular purpose is both correct and fast. If the implementation is not correct, then it does not implement the algorithm. If the implementation is slow, then resources are wasted executing it that could be spent on other useful work. Ensuring that implementations are both performant and correct is typically difficult, and one case that proves particularly tricky is writing assembly code.

# dubious claim
Much modern hand-written assembly is for cryptographic algorithms, where it is vital that a compiler does not introduce timing attacks or other side channels. Hand-written assembly can also take advantage of microarchitectural differences to gain a slight performance benefit. Another domain that frequently deals with assembly code is the output of compilers. Optimising compilers take the semantic operations described by high-level code and attempt to produce optimal machine code to perform those actions.

In both of these cases, either humans or machines shuffle instructions around in an attempt to eke out the best performance from the hardware. The resulting code warps the original algorithm in ways that make it difficult to determine whether it still computes the same result. The correctness of the assembly code therefore comes into question.

It is only possible to prove that assembly code is correct if there is a model describing how it should behave. A suitable mathematical model of this behaviour allows formal proofs to be constructed certifying that the assembly code behaves correctly. Due to the size and complexity of different instruction set architectures (ISAs), formal verification of software in this manner makes use of a number of software tools.
Formal verification software is well-developed for the x86-64 and Arm A-profile ISAs. For instance, Jasmin [cite:@10.1145/3133956.3134078] is a programming language which verifies that compilation to x86-64 preserves a set of user-defined safety properties. Another similar tool is CompCert [cite:@hal/01238879], a verified C compiler which ensures that program semantics are preserved during compilation. Unfortunately, little work has been done to formally verify software written for the Arm M-profile architecture. This architecture is designed for low-power and low-latency microcontrollers, which operate in resource-constrained environments.

The goal of this project is to formally verify an assembly function for the Arm M-profile architecture. The specific algorithm [cite:@10.46586/tches.v2022.i1.482-505] is a hand-written, highly-optimised assembly implementation of the number-theoretic transform (NTT), an important procedure in post-quantum cryptography.

This work has made progress on two fronts. Most effort has been spent developing an embedding and semantic model of the Armv8.1-M pseudocode specification language in the Agda programming language [cite:@10.1007/978-3-642-03359-9_6]. All instructions in the Armv8.1-M architecture are given a "precise description" [cite:@arm/DDI0553B.s § \(\texttt{I}_\texttt{RTDZ}\)] using the pseudocode, so an embedding within Agda makes it possible to use Agda's powerful type system to construct formal proofs of correctness. Progress has also been made on the formal algebraic verification of Barrett reduction, a subroutine of the NTT. Whilst a paper proof of the correctness of Barrett reduction exists, a proof within Agda is necessary to prove the correctness of the given implementation of the NTT.

# Focus on contributions
The structure of the paper is as follows:
- [[Armv8.1-M Pseudocode Specification Language]] describes the Armv8.1-M pseudocode specification language. This is an imperative programming language used by the Armv8.1-M manual to describe the operation of the instructions.
- A brief recap of Hoare logic is given in [[Hoare Logic]]. This is the backbone of one of the formal verification techniques used in this work.
- [[Formal Definition of MSL]] contains a description of MSL, a language similar to the Armv8.1-M pseudocode embedded within the Agda programming language.
- The denotational semantics and a Hoare logic semantics of MSL are detailed in [[Semantics of MSL]]. Because Agda is a dependently-typed language, both are given as Agda code.
- [[Application to Proofs]] describes the experience of using the Hoare logic and denotational semantics of MSL to prove the correctness of some simple routines, given in [[Proofs using Hoare Semantics]] and [[Proofs using Denotational Semantics]] respectively.
- Formal verification of Barrett reduction, an important subroutine of the NTT, is given in [[Proofs of Barrett Reduction]]. In particular, it proves that Barrett reduction performs a modulo reduction and gives bounds on the size of the result.
- Finally, [[Future Work]] describes the steps necessary to complete the formal verification of the NTT, as well as some other directions in which this work can be taken.

* Background

The Armv8.1-M pseudocode specification language is used to describe the operation of instructions in the Armv8-M instruction set [cite:@arm/DDI0553B.s § E1.1].
If the semantics of this pseudocode can be formalised, then it is possible to formulate proofs about the action of all Armv8-M instructions, and hence to construct proofs about the algorithms written with them. The language is rather simple, being pseudocode used as a descriptive aid, but it has some interesting design choices atypical of regular imperative programming languages.

As the pseudocode is an imperative language, one useful proof system for it is Hoare logic. Hoare logic is a proof system driven by the syntax of a program, with most of the hard work of a proof lying in choosing suitable program invariants and solving simple logical implications [cite:@10.1145/363235.363259]. Because the logic is syntax-driven, proofs using Hoare logic are less affected than other proof systems by the large number of loops used in Armv8-M instruction descriptions. Further, solving simple logical implications is a task well-suited to Agda and other proof assistants, making proofs even simpler to construct.

** Armv8.1-M Pseudocode Specification Language

The Armv8.1-M pseudocode specification language is a strongly-typed imperative programming language [cite:@arm/DDI0553B.s § E1.2.1]. It has a first-order type system, a small set of operators and basic control flow, as found in most imperative languages. It also makes some design choices atypical of regular imperative languages, which better suit its role as a descriptive aid rather than an executable language.

As in nearly all imperative languages, Booleans are a primitive type. Other typical type constructors, such as tuples, structs, enumerations and fixed-length arrays, are also present.

The first interesting type used by the pseudocode is the mathematical integer, provided as a primitive type. Most imperative languages use fixed-width integers for their primitive types, with exact integers available through some library, because the performance benefits of fixed-width integers far outweigh the risk of overflow. However, as the pseudocode is a descriptive aid with no intention of being executed, it can use exact mathematical integers and eliminate overflow errors without any performance cost [cite:@arm/DDI0553B.s § E1.3.4].

Another such type present in the pseudocode is the mathematical real number. As most real numbers cannot be represented in finite storage, any executable programming language must compromise on the precision of real numbers. Because the pseudocode does not concern itself with being executable, it is free to use real numbers and retain exact precision in real-number arithmetic [cite:@arm/DDI0553B.s § E1.2.4].

The final primitive type used by the pseudocode is the bitstring: a fixed-length sequence of 0s and 1s. Some readers may wonder what the difference is between this type and an array of Booleans. The justification given by [cite/t:@arm/DDI0553B.s § E1.2.2] is more philosophical than practical: "bitstrings are the only concrete data type in pseudocode". In some places, bitstrings can be used instead of integers in arithmetic operations.

Most of the operators used by the pseudocode are unsurprising. For instance, Booleans have the standard set of short-circuiting operations; integers and reals have addition, subtraction and multiplication; reals have division; and integers have integer division (division rounding towards \(-\infty\)) and modulus (the remainder of that division). By far the two most interesting operations in the pseudocode are bitstring concatenation and slicing.

Bitstring concatenation is much like appending two arrays together, or regular string concatenation. Bitstring slicing is a more nuanced process. Slicing a bitstring by a single index is no different from a regular array access. If instead a bitstring is sliced by a range of integers, the result is the concatenation of each single-bit access. Finally, when an integer is sliced instead of a bitstring, the pseudocode "treats an integer as equivalent to a sufficiently long \textelp{} bitstring" [cite:@arm/DDI0553B.s § E1.3.3].
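To make this behaviour concrete, the following is a minimal executable model of bitstring slicing and concatenation, written in Python rather than in the pseudocode or the Agda development described later. The function names are illustrative, and reading "sufficiently long bitstring" as the two's-complement expansion of the integer is an assumption made for this sketch, not a definition quoted from the manual.

#+begin_src python
# Bitstrings are modelled as strings of '0'/'1' characters, written
# most-significant bit first. Lengths are not tracked.

def bit(value: int, i: int) -> str:
    """Single-bit slice of value at index i. An integer is treated as a
    sufficiently long bitstring; here that is assumed to mean its
    two's-complement expansion, so negative values carry an unbounded
    run of leading ones."""
    return str((value >> i) & 1)

def slice_bits(value: int, hi: int, lo: int) -> str:
    """Slice by the range hi..lo: the concatenation of the single-bit
    slices at indices hi, hi-1, ..., lo."""
    return "".join(bit(value, i) for i in range(hi, lo - 1, -1))

def concat(x: str, y: str) -> str:
    """Bitstring concatenation, with y forming the least-significant end."""
    return x + y

# Slicing 0b10110010 into two nibbles and concatenating them again.
assert slice_bits(0b10110010, 7, 4) == "1011"
assert slice_bits(0b10110010, 3, 0) == "0010"
assert concat("1011", "0010") == "10110010"

# A negative integer sliced as a "sufficiently long" bitstring: -1 is
# ...1111 in two's complement, so every slice of it is all ones.
assert slice_bits(-1, 3, 0) == "1111"

# The pseudocode's integer division and modulus round towards minus
# infinity, which coincides with Python's // and % operators.
assert -7 // 2 == -4 and -7 % 2 == 1
#+end_src

The model captures only the behaviour described above; in particular, the fixed lengths that the pseudocode attaches to bitstrings are not represented.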
The final interesting difference between the pseudocode and most imperative languages is the variety of top-level items. The pseudocode has three forms of item: procedures, functions and array-like functions. Procedures and functions behave like the procedures and functions of other imperative languages. Their arguments are passed by value, and the only difference between the two is that procedures do not return values whilst functions do [cite:@arm/DDI0553B.s § E1.4.2].

Array-like functions act as getters and setters for machine state. Every array-like function has a reader form, and most have a writer form. This distinction exists because "reading from and writing to an array element require different functions" [cite:@arm/DDI0553B.s § E1.4.2], likely because some machine registers are read-only rather than read-writeable. The writer form acts as one of the targets of assignment expressions, along with variables and the results of bitstring concatenation and slicing [cite:@arm/DDI0553B.s § E1.3.5].

** Hoare Logic

Hoare logic is a proof system for programs written in imperative programming languages. At its core, the logic describes how to build partial correctness triples, which describe how program statements affect assertions about machine state. The bulk of a Hoare logic derivation depends only on the syntax of the program the proof targets.

A partial correctness triple is a relation between a precondition \(P\), a program statement \(s\) and a postcondition \(Q\). If \((P, s, Q)\) is a partial correctness triple, then whenever \(P\) holds for some machine state and \(s\) is executed from that state, \(Q\) holds for the state after it terminates [cite:@10.1145/363235.363259]. Those last three words, "after it terminates", are what make this a /partial/ correctness triple. If all statements terminate, as we will see later, then the relation is simply called a correctness triple.

Along with the syntactic rules for derivations, Hoare logic typically also features a number of adaptation rules. The most widely known of these is the rule of consequence, which can strengthen the precondition and weaken the postcondition. This requires an additional logic for assertions; typically, this is first-order or higher-order logic [cite:@10.1007/s00165-019-00501-3;@10.1007/s001650050057].

One vital feature of Hoare logic with regard to specification is auxiliary variables. These are variables that cannot be used by programs, and hence remain constant between the precondition and postcondition [cite:@10.1007/s001650050057].

* Implementation

** Formal Definition of MSL

** Semantics of MSL

* Application to Proofs

** General Observations

** Proofs using Hoare Semantics

** Proofs using Denotational Semantics

** Proofs of Barrett Reduction

* Conclusions

** Future Work

#+print_bibliography:

# LocalWords: Hoare ISAs Jasmin CompCert structs bitstring bitstrings getters