Cryptographic Stack Machine Notation One

A worthy cryptographic protocol specification has to be human-readable (declarative and concise), executable and formally verified in a sound model. Keeping these requirements in mind, we present a protocol message definition notation named CMN.1, which is based on an abstraction named cryptographic stack machine. The paper presents the syntax and semantics of CMN.1 and the principles of implementation of the CMN.1-based executable protocol specification language. The core language library (the engine) performs all the message processing, whereas a specification only has to provide the declarative definitions of the messages. When an outgoing message must be formed, the engine takes the CMN.1 definition as input and produces binary data consistent with it. When an incoming message is received, the engine verifies the binary data against the given CMN.1 definition, memorizing all the information needed for further actions. The verification is complete: the engine decrypts the ciphertexts, checks the message authentication codes and signatures, etc. Currently, the author's proof-of-concept implementation of the language (embedded in Haskell) can translate CMN.1-based specifications both to interoperable implementations and to programs for the ProVerif protocol analyzer. Excerpts from the CMN.1-based TLS protocol specification and the corresponding automatically generated ProVerif program are provided as an illustration.


Introduction
The establishment of good soundness relations between cryptographic protocol implementations and their formal models is a popular research area. The existing approaches differ in the starting point of development (implementation first [1][2][3][4][5][6] or formal model first [7][8][9]), in the degree of cryptographic soundness of the models (symbolic [10] or computational [9]), in the presence of a formal proof of the soundness of the model-to-implementation (or vice versa) translation procedure, in the area of usability of the implementation and in other aspects.
Our aim is to soundly tie together not two elements of the protocol development process (implementation and formal model) but three: implementation, formal model and specification. By the latter, we mean a human-readable protocol description of the kind usually published in an RFC. The modeling languages, which are based on logics or on special versions of general-purpose programming languages, are not quite suitable for this task: they are either inconvenient for capturing the low-level details or firmly imperative. Therefore, our goal is a declarative specification language that could be used directly in RFCs to considerably enhance the degree of formalization of these documents. At the same time, the specification must be automatically translatable both to an interoperable implementation and to programs for state-of-the-art protocol model analyzers such as ProVerif [10] and Tamarin [11].

Related work
There exist many formal notations for data structures: ASN.1, JSON, etc. These notations are often provided with engines that can automatically generate binary data from a given data structure definition and, in the opposite direction, automatically unpack binary data in accordance with the definition. Such projects as CSN.1 [12], TSN.1 [13], BinPAC [14] and NetPDL [15] are targeted specifically at network protocols. While the readability of some of these notations may be suitable, their expressiveness (in the domain of cryptographic protocols) is not. What we need behind the notation is not simply a message generator/parser waiting to be embedded into some bigger program, but a generic cryptographic protocol implementation waiting for a (semi-)declarative specification that adjusts it to a specific case. Therefore, the primary challenge is to find a sufficiently powerful underlying abstraction, from which the notation should emerge naturally.

Cryptographic Stack Machine Notation One
We propose an abstraction named cryptographic stack machine (abbreviated as CSM), which is a stack machine specifically tailored to the needs of cryptographic protocols. Within the proposed approach, a message definition is in fact a sequence of CSM instructions. The instruction set is divided into "bare-metal" and "sugared" parts. The "sugared" instructions make the message definitions (which are imperative in their essence) look declarative. The instruction set may be expanded if needed. To reflect the fact that the declarative style of the protocol message definitions is one of the main targets, we name our notation «Cryptographic Stack Machine Notation One» (abbreviated as CMN.1), adopting the naming style of the ASN.1, CSN.1 and TSN.1 notations.

CMN.1 semantics
CSM has one main stack and a varying number of temporary stacks, a random-number generator, a real-time clock, the storage s_var containing the values of the protocol variables (which do not actually vary in CSM) and the register s_rol containing the identifier of the protocol role (fig. 1).

Fig. 1. Cryptographic stack machine
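
The components listed above can be pictured as a small record. The following Python sketch is only an illustration and is not taken from the author's Haskell engine: the class name CSM, the field and method names, and the choice of data types are all hypothetical. In particular, rejecting a second assignment to a variable is just one possible reading of the remark that protocol variables do not vary.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class CSM:
    """Hypothetical container for the CSM state (names are assumptions)."""
    s_rol: str                                    # register with the role identifier
    main: list = field(default_factory=list)      # main stack; the top is the last item
    temps: list = field(default_factory=list)     # temporary stacks, created on demand
    s_var: dict = field(default_factory=dict)     # protocol variables: name -> bytes

    def random_bytes(self, n: int) -> bytes:
        """Random-number generator: n fresh random bytes."""
        return secrets.token_bytes(n)

    def now(self) -> int:
        """Real-time clock, here as seconds since the Unix epoch."""
        return int(time.time())

    def set_var(self, name: str, value: bytes) -> None:
        """Store a variable once; reassignment is rejected (an assumption)."""
        if name in self.s_var:
            raise ValueError(f"variable {name!r} is already set")
        self.s_var[name] = value


machine = CSM(s_rol="client")
machine.set_var("nonce", machine.random_bytes(32))
```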
The language of the CSM instructions continues the line of stack-oriented languages. It supports branching but does not support looping or recursion (table 1).

Table 1. CSM instructions

SEnc' alg
CSM takes the top 3 elements of the stack as arguments: a, b, c. CSM encrypts a with b as the initialization vector and c as the key using the symmetric encryption algorithm alg.
Here and after: 1) if the stack underflows, CSM returns an error; 2) the last argument in the argument list is located at the top of the stack; 3) the arguments of the function are removed from the stack; 4) the result is pushed onto the stack.

Enco' alg
Encoding of a using algorithm alg. List of arguments: a.

Xor' n
Exclusive OR. Arguments: the top n elements of the stack.

ModAdd', ModMult'
Addition (multiplication) of a and b modulo m. List of arguments: a, b, m.
Here and after: byte strings are interpreted as integers using the big-endian convention.

ModInv'
Inverse of a modulo m. List of arguments: a, m.

Add' n
Let a be the top element of the stack. CSM adds n to a modulo 2^(8*k), where k is the length of a in bytes.

Rev fun
The function that is the inverse of the function fun, where fun must be one of:

Mod'
Modulo operation. List of arguments: a.

ModExp'
Modular exponentiation: a^b mod m. List of arguments: a, b, m.

Hash' alg
CSM calculates the hash of a using the algorithm alg. List of arguments: a.

Pad' n ws
Padding of a using the bytes ws until the length of the result reaches n (n must be greater than or equal to the length of a). List of arguments: a.

Take' ns
Here ns is a list of numbers. If the length of the top element of the stack is less than the sum of the elements of ns, CSM returns a specification error. Otherwise, CSM cuts the top element of the stack into n parts, where n is the length of the list ns, using the numbers from ns as the lengths of the parts, and pushes the resulting n elements onto the stack (from left to right). The remainder of the top element, if any, is dropped.

Split' ns
The same as the instruction Take' ns, except that the length of the top element of the stack must be exactly equal to the sum of the numbers from the ns list.

SplitE' n
Equivalent to the instruction Split' [k,k...k], where k = len / n and len is the length of the top element of the stack (len must be divisible by n).

C' n
Concatenation. Arguments: the top n elements of the stack.

CE' n
Concatenation of the equal-sized arguments.

Len' e
The length of the top element of the stack written in the format e, where e can be one of: BE n (packing into n big-endian bytes), LE n (packing into n little-endian bytes), DER (packing using the ASN.1 DER format).

Insert i
CSM moves the top element of the stack to the i-th position.

Pick i, Dup i
CSM moves (for Pick) or copies (for Dup) the i-th element of the stack to the top position.

Free i
CSM removes the i-th element from the stack.

Elem i p
CSM executes the program p using an empty temporary stack and then puts the i-th element of the temporary stack onto the current working stack.

SA' n k p
CSM copies n elements from the current working stack to a temporary stack, executes the program p using a new temporary stack and then inserts the resulting elements between the (k+1)-th and k-th elements of the current working stack.

Map' p i n
The stack must contain at least i*n elements. CSM executes the program p n times, using at each iteration a new temporary stack to which the next i elements from the current working stack are moved (beginning from the depths of the stack). At each iteration, the elements remaining in the temporary stack after the execution of p are moved to the current working stack.

L n p
Macro instruction supplemented by the total length of the resulting elements of the execution of p (parameter n).

The language presented below is simple in the sense that it does not capture the protocol automata in full. A specification consists of the CMN.1-based message definitions and a sequence of protocol actions with simple branching support (table 2).

Table 2. Protocol actions

set r vvlist
Here vvlist is a list of pairs of type (V name, is). For each pair, the action executes the CSM instruction is and includes the pair (name, val) in the storage s_var belonging to the CSM instance of the role r, where val is the concatenation of the resulting elements of the execution of is.

select r is acs
This action provides branching support in the same manner as the CSM instruction Select is cs does. The difference between the lists cs and acs is that cs consists of elements Case value p, where p is a CSM program, whereas acs consists of elements Case value a, where a is a sequence of protocol actions.

trusted r id p
This action takes from a trusted storage the binary data stored under the name id and processes these data using the CMN.1 definition p and the CSM instance of the role r.

connect r port addr
If this action is present, the specification turns into the client implementation acting as the protocol role r. The action establishes a connection to a third-party server implementation listening on the port port of the IP address addr.

accept role port
The specification turns into the server implementation acting as role and listening on the port port.

printPV printPV'
Both actions generate the ProVerif program corresponding to the protocol events that have taken place by the time of the call. The first action generates a full program; the second one ignores the length fields of messages and the related events as non-essential in order to make the program more concise and productive.
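
As a rough illustration of the stack conventions from table 1 (argument order, underflow errors, removal of arguments) and of the set action from table 2, the following Python sketch models a few instructions as functions over a list of byte strings. It is a sketch under stated assumptions, not the author's Haskell engine: the names CsmError, pop_args, take, concat, add, push and action_set are hypothetical, push stands in for whatever literal-pushing instruction the notation provides, and running each program of the set action on a fresh stack is an assumption.

```python
class CsmError(Exception):
    """Stack underflow or a violated instruction precondition."""


def pop_args(stack, n):
    """Remove n arguments from the stack; the last list item was on top."""
    if len(stack) < n:
        raise CsmError("stack underflow")
    cut = len(stack) - n
    args = stack[cut:]
    del stack[cut:]
    return args


def take(ns, exact=False):
    """Take' ns, or Split' ns when exact=True."""
    def run(stack):
        (top,) = pop_args(stack, 1)
        total = sum(ns)
        if len(top) < total or (exact and len(top) != total):
            raise CsmError("specification error: unexpected length")
        pos = 0
        for n in ns:  # push the parts from left to right
            stack.append(top[pos:pos + n])
            pos += n
        # bytes of `top` beyond `total` are dropped (only possible for Take')
    return run


def concat(n):
    """C' n: concatenate the top n elements."""
    def run(stack):
        stack.append(b"".join(pop_args(stack, n)))
    return run


def add(n):
    """Add' n: add n to the top element modulo 2^(8*k), big-endian."""
    def run(stack):
        (a,) = pop_args(stack, 1)
        k = len(a)
        value = (int.from_bytes(a, "big") + n) % (1 << (8 * k))
        stack.append(value.to_bytes(k, "big"))
    return run


def push(value):
    """Hypothetical helper that pushes a literal byte string."""
    def run(stack):
        stack.append(value)
    return run


def action_set(s_var, vvlist):
    """set: run each program on a fresh stack and store the concatenation."""
    for name, program in vvlist:
        stack = []
        for instr in program:
            instr(stack)
        s_var[name] = b"".join(stack)


s_var = {}
action_set(s_var, [
    ("parts", [push(b"abcdef"), take([2, 3])]),   # the trailing "f" is dropped
    ("counter", [push(b"\xff\xff"), add(1)]),     # wraps around modulo 2^16
])
```

Representing each instruction as a function that mutates the stack keeps a message definition a plain list of instructions, which mirrors the statement that a definition is a sequence of CSM instructions.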
Bearing in mind the elegant and concise syntax of the Haskell language and the advantages of embedded domain-specific languages, we embed our CMN.1-based specification language in Haskell.
As an illustration, we present an excerpt from the CMN.1-based specification of the TLS protocol (fig. 2; note that the order of declarations can be arbitrary in the Haskell language). The specification that serves as the source for this excerpt comprises about 500 lines (the total for client and server) and covers a substantial part of the TLS v.1.2 protocol, including four ciphersuites and X.509 certificate support and excluding extensions and renegotiations. The specification turned into an implementation (see the actions connect and accept in table 2) was successfully tested for interoperability with the OpenSSL v.1.0.2o tool (both in the client and server roles).

Translation to the ProVerif program
The ProVerif program presented in fig. 3 was generated automatically from the above specification (it is the console output of the call printPV'; see line 115 in fig. 2). This program corresponds to the protocol trace based on the ciphersuite TLS-DHE-DSS-WITH-AES-256-CBC-SHA. The program passed the ProVerif compiler checks without warnings. The events and queries of interest have to be inserted manually because CMN.1-based specifications do not contain such information.

Engine implementation details
The engine implements functionality that is significantly more powerful than the CSM machine presented in section 3. The engine does not execute CMN.1-notated programs as straightforwardly as CSM does. It executes the programs symbolically: the elements of the stack are not byte strings but symbolic expressions. This well-known technique allows the engine to fully take over the task of verifying incoming messages using the same CMN.1 definitions that are used in the direct task of message generation. The verification is complete: the engine decrypts the ciphertexts, checks MACs and signatures, etc.
Throughout a protocol execution, the engine accumulates the generated symbolic expressions, their values, lengths and types. It uses this information to generate or verify subsequent protocol messages. In addition, the engine logs such events as calculations of the values of symbolic expressions and applications of rewriting rules. This information can be used by the engine's environment to extract symbolic traces and convert them to programs for symbolic verifiers, e.g. ProVerif (as was presented in the previous section).
Let the byte string bs be considered by the engine as a protocol message with the CMN.1 definition p. Let EQ be a set variable containing equations, i.e. pairs of type (symbolic expression, byte string). The engine implements the verification procedure as follows.
Step 1. The engine executes the program p symbolically, obtaining the symbolic expression exp. EQ is initialized with the equation (exp, bs).
Step [...]

The engine knows about the equality (a^b)^c = (a^c)^b and the analogous equality for elliptic curve scalar multiplication, so the Diffie-Hellman key exchange and ElGamal asymmetric encryption do not require special treatment. However, the engine uses specific rewrites for expressions relevant to the DSA and ECDSA algorithms or to their relatives. The calls exported by the engine are presented below.
1. cSymExec p - The engine executes the program p symbolically and returns the descriptor of the generated symbolic expression.
[...] an error, otherwise, it returns the superfluous remainder of the byte string bs (if present).
6. cEvent ev - The engine logs the event ev (i.e. the environment can insert additional events into the engine log).
7. cGetLog - The engine returns the content of its log.
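
To make the idea of symbolic stack elements and the exponent-commutation equality (a^b)^c = (a^c)^b more tangible, here is a minimal Python sketch. It does not reproduce the engine's rewriting system: the tuple representation of expressions, the function names sym_exp and evaluate, and the normalisation by sorting exponent names are assumptions made for illustration only, and the numeric values are toy parameters.

```python
def sym_exp(base, exponent):
    """Build base^exponent, flattening nested powers and sorting exponents."""
    if isinstance(base, tuple) and base[0] == "exp":
        _, inner, exps = base
        return ("exp", inner, tuple(sorted(exps + (exponent,))))
    return ("exp", base, (exponent,))


def evaluate(expr, env, modulus):
    """Evaluate a symbol name or an ("exp", base, exponents) tuple."""
    if isinstance(expr, str):
        return env[expr] % modulus
    _, base, exps = expr
    value = evaluate(base, env, modulus)
    for name in exps:
        value = pow(value, env[name], modulus)
    return value


client_view = sym_exp(sym_exp("g", "b"), "a")   # (g^b)^a
server_view = sym_exp(sym_exp("g", "a"), "b")   # (g^a)^b
same_expression = client_view == server_view    # both normalise to one tuple

env = {"g": 5, "a": 6, "b": 15}                 # toy values, far too small for real use
concrete = evaluate(client_view, env, 23)
```

Because both parties' expressions normalise to the same tuple, a Diffie-Hellman shared secret computed on either side is recognised as one and the same symbolic value without a dedicated rule for the key exchange itself.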

Conclusion
We presented a cryptographic protocol message notation (named CMN.1) based on the instruction set of a stack machine specifically tailored to the needs of cryptographic protocols (named cryptographic stack machine, or CSM). The principles of implementation of the protocol specification language based on this notation were also presented. Within such an approach, specifications are executable and also translatable to programs for symbolic verifiers, such as ProVerif. The readability of CMN.1-notated specifications is left to the judgment of the community. In addition, the proposed notation needs to be validated on a wider spectrum of cryptographic protocols. The validation will certainly cause minor additions to the notation (at least regarding cryptographic key types) without affecting the currently defined CSM instructions. Taking into account the fact that the author's proof-of-concept implementation of the core language library (the engine) comprises only 700 lines of Haskell code (excluding cryptographic primitives), it seems logical to provide in the future a formal description of the engine's algorithm and, based on it, a proof of the soundness of the ProVerif translation procedure.