On the Implementation of a Formal Method for Verification of Scalable Cache Coherent Systems

This article analyzes existing methods for verifying cache coherence protocols of scalable systems. The methods analyzed include model checking, deductive verification, and methods that extend these two: compositional verification and abstraction-based methods. Based on the research literature, the paper describes a method of formal parameterized verification of safety properties of cache coherence protocols. The method is based on syntactic transformations of Promela models. First, a mathematical model (a transition system) of cache coherence protocols is described. Second, the corresponding abstract model is presented in accordance with the transformations of the concrete model. These transformations lead to an abstract model that is independent of the number of processors in the system under verification. The paper proposes a design of a verification system for cache coherence protocols. The main part of the design is a Promela translator and an abstract transformations subsystem that obtains an internal representation of a Promela model and modifies it according to the transformations. The article evaluates the method by developing and examining the corresponding Promela model of the German cache coherence protocol. Examples of the syntactic transformations are shown. To demonstrate the method's ability to find bugs, verification results for two buggy versions of the German protocol taken from the literature are presented and analyzed. Drawbacks of the method are presented. In particular, the use of a limited Promela subset leads to unnecessary complications and unnatural models. The paper discusses the extension and automation of the method needed to adapt it to the verification challenges of the Elbrus microprocessors.


Introduction
Modern microprocessor systems are scalable: the number of cores per chip increases, and chips are combined into clusters. Each processor of the system has access to the shared address space. However, memory is physically distributed among the processors in order to increase the bandwidth and reduce the latency of local memory accesses. Thus, access to local memory is faster than access to remote memory. To decrease the memory bandwidth demands of a processor, processors are equipped with multilevel caches. Caching of shared data introduces the problem of cache coherence. To solve the problem, computer architects often use hardware mechanisms that implement cache coherence protocols. The concurrent operation of many hardware devices (for example, cache and main memory controllers), which exchange information in accordance with a cache coherence protocol, results in an enormous protocol state space. This, in turn, makes verification of cache coherence protocols an extremely hard task. To address the problem, researchers have studied formal methods for the past few decades and have achieved some success. However, scalable verification is still an issue: verification methods must be able to keep up with growing system sizes. As the size of systems increases, the fully automated method of model checking reaches its limits and can no longer be used due to the state space explosion problem. As a rule, existing formal approaches to verification are either inapplicable to industrial-strength microprocessor systems or require an enormous amount of manual work.

Primary Verification Methods
Formal methods provide a mathematical proof of the correspondence between a model of the object under verification and the object's specification, that is, a set of properties it is supposed to satisfy. A transition system [1] is a mathematical model of reactive systems (cache coherence protocols are examples of reactive systems) that allows us to systematically represent a system's components, their coordination, and their interaction. The main approaches to formal verification are model checking and deductive verification. The method of model checking [2] systematically explores the finite state space of the protocol under verification by means of specific algorithms. The advantages of model checking are full automation and generation of counterexamples that help us find the sources of bugs. The main disadvantage is the state space explosion problem: modern cache coherence protocols have too many states for an effective state space inspection to be feasible. Let us consider verification of safety properties, which are described by the linear temporal logic (LTL) formula $Gp$, where $p$ is an assertion, that is, a formula constructed by applying logical connectives to variables of the model. If the assertion is true in each reachable state of the model, then $p$ is an invariant of the model. According to the method of deductive verification, in order to prove $Gp$, it is necessary to develop an auxiliary assertion $\varphi$, which is an over-approximation of the set of reachable states, and then show that $\varphi$ implies $p$ (i.e., that $\varphi$ is stronger than $p$). The method is based on the following inference rule [1], which has three premises and the conclusion $Gp$: I1) every initial state satisfies $\varphi$; I2) every transition from a state satisfying $\varphi$ leads to a state satisfying $\varphi$; I3) $\varphi \rightarrow p$. An assertion $\varphi$ is called inductive if it satisfies the premises I1 and I2. An inductive assertion is always an over-approximation of the set of reachable states. If $p$ is an invariant of the system under verification, then there always exists an inductive assertion $\varphi$ stronger than $p$ [1]. The initial assertion $p$ is rarely inductive.
As a rule, the verification engineer must develop an auxiliary assertion and check the validity of the premises I1-I3. Deductive verification allows us to work with systems that have an infinite number of states. Theorem provers assist in using formal logic for reasoning about mathematical objects. Popular tools are ACL2, PVS, and Isabelle. The underlying logics of theorem provers vary substantially. However, all theorem provers support rich and expressive logics. In general, the expressiveness of a logic leads to its undecidability: there is no automatic procedure that, given a formula, can always determine whether a derivation of the formula exists in the logic. The use of theorem proving presumes interaction with an expert user and is a complicated creative process. When the theorem prover cannot find the derivation of a formula given a proof outline, it is very hard to find the actual bug in the system under verification. Reference [3] describes the experience of using the PVS theorem prover for parameterized verification of the FLASH cache coherence protocol. During the proof construction, the authors repeatedly searched by hand for candidate inductive assertions. When they failed to prove that a candidate was inductive, they analyzed the reasons for the failure and devised additional conditions that transformed the assertion into an inductive one. This process is extremely laborious, which is why methods based solely on theorem proving can find only limited use in verification of cache coherence protocols.
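The premises of the inference rule can be illustrated on a toy example. The sketch below is not taken from the cited works: the one-variable transition system, the property, and the names check_premises, phi, and p are assumptions made for illustration. It checks the three premises by enumerating all states: I1 (initial states satisfy the auxiliary assertion), I2 (the auxiliary assertion is preserved by every transition), and I3 (the auxiliary assertion implies the property).

```python
# Toy transition system (an assumption for illustration): one variable x in 0..7.
STATES = range(8)

def init(x):            # initial condition: x starts at 0
    return x == 0

def successors(x):      # single transition: x := (x + 2) mod 8
    return [(x + 2) % 8]

def p(x):               # safety property to prove: x is never 3
    return x != 3

def phi(x):             # candidate auxiliary assertion: x is even
    return x % 2 == 0

def check_premises(aux, prop):
    """Return the truth values of I1, I2, I3 by enumerating all states."""
    i1 = all(aux(s) for s in STATES if init(s))
    i2 = all(aux(t) for s in STATES if aux(s) for t in successors(s))
    i3 = all(prop(s) for s in STATES if aux(s))
    return i1, i2, i3

with_phi = check_premises(phi, p)   # phi is inductive and implies p
p_alone = check_premises(p, p)      # p used as its own auxiliary assertion
print(with_phi, p_alone)
```

In this toy system the property holds in every reachable state but is not inductive: the unreachable state x = 1 satisfies it and steps to x = 3, so premise I2 fails when the property is used as its own auxiliary assertion. The stronger assertion "x is even" satisfies all three premises.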

Verification Methods for Scalable Systems
Development of verification methods for scalable systems may proceed in several directions: 1) improvement of methods based on model checking; 2) improvement of methods based on deductive verification; 3) combination of the methods from the first and the second groups. Methods for verifying cache coherence protocols deployed in industrial-strength microprocessor systems must satisfy a number of requirements: 1) the ability to complete verification in a reasonable amount of time; 2) a high level of automation; 3) the ability to provide information about the sources of bugs. Neither model checking nor deductive verification meets these needs on its own. Consequently, building a general infrastructure that combines and further develops methods of model checking and deductive verification seems to be the most promising approach to verification of scalable systems.

Abstraction and Compositional Model Checking
The main approaches allowing the application of model checking to verification of scalable systems are abstract model checking and compositional verification [2]. Abstraction methods diminish the number of states of the model under verification and preserve the properties of interest at the same time. Equivalence relations, which guarantee that the models will have the same behaviors, usually do not decrease the number of states sufficiently. Instead, simulation relations, which relate models to their abstractions, are used. The simulation guarantees that every behavior of a model is a behavior of its abstraction. However, the abstraction might have behaviors that are not possible in the original system. Abstract state spaces may be obtained by means of under-approximation methods, which remove behaviors, or over-approximation methods, which add new behaviors. Thus, in case of under-approximation, a bug in the abstract model implies a bug in the concrete model, and in case of over-approximation, correctness of the abstract model implies correctness of the concrete model. Further in this article we only consider over-approximations, also known as conservative abstractions. Developing abstract models involves finding a compromise between two conflicting goals: 1) generation of small abstract models that can be model checked; 2) generation of precise abstract models. Usually, the smaller the model, the more behaviors it allows. This may lead to spurious counterexamples that are not present in the concrete model. There are at least two ways out: 1) construction of precise abstract models; 2) analysis of counterexamples and modification of the abstract model according to the acquired information (counterexample-guided abstraction refinement).
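The trade-off between small and precise abstractions can be seen on a toy example. The sketch below is an illustration only: the concrete system, the two abstraction maps, and all function names are assumptions, and the abstraction is built by plain enumeration (existential abstraction) rather than by any method from the cited literature. It checks that every reachable concrete state has a reachable abstract image, and shows that a coarser map can make the image of an unreachable bad state reachable, that is, produce a spurious counterexample.

```python
def reachable(inits, succ):
    """Set of states reachable from inits under the successor function succ."""
    seen, todo = set(inits), list(inits)
    while todo:
        s = todo.pop()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

CONCRETE = range(8)                 # concrete states: x in 0..7

def step(x):                        # concrete transition: x := (x + 2) mod 8
    return [(x + 2) % 8]

def abstract(h):
    """Over-approximation: a -> b iff some concrete x -> y has h(x) = a and h(y) = b."""
    edges = {}
    for x in CONCRETE:
        for y in step(x):
            edges.setdefault(h(x), set()).add(h(y))
    return lambda a: edges.get(a, ())

concrete_reach = reachable([0], step)
BAD = 3                             # concrete state that must never be reached

results = {}
for name, h in (("x % 4", lambda x: x % 4), ("x // 2", lambda x: x // 2)):
    abs_reach = reachable([h(0)], abstract(h))
    # Simulation: the image of every reachable concrete state is reachable.
    assert {h(s) for s in concrete_reach} <= abs_reach
    # True means the abstract model reports a (possibly spurious) violation.
    results[name] = h(BAD) in abs_reach
print(BAD in concrete_reach, results)
```

The map x % 4 keeps enough information to prove the property, while x // 2 merges the reachable state 2 with the bad state 3 and therefore reports a violation that does not exist in the concrete system.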
Methods that create precise abstract models (for example, those based on counter abstraction or environment abstraction [4]) lead to large models in the case of complicated protocols. The idea of compositional verification [5] is to exploit the natural decomposition of a distributed system into processes. Processes are verified individually (with a generalized environment), then the results are combined, and a verdict about the correctness of the initial model is made. A compositional approach must provably lead to simplified models that satisfy the properties of the initial model.

General Idea
The method described in this paper adapts the method of [6] to work with a subset of Promela. It is based on a combination of model checking and theorem proving. The choice of Spin is motivated by the fact that Spin is a modern and constantly evolving tool that supports many optimizations and verification modes. The Promela language is convenient for describing distributed systems, including cache coherence protocols. Moreover, Spin may be used as the basis for generators of test programs intended for verifying implementations of cache coherence protocols [7]. The method shows how to build an abstract model that simulates a given concrete model of a cache coherence protocol. The construction is performed by means of syntactic transformations of the concrete Promela model.

A Mathematical Model of Cache Coherence Protocols
Cache coherence protocols may be seen as asynchronous systems of communicating processes in which a process is a finite automaton. Then a mathematical model of a cache coherence protocol is a system of communicating finite automata. A Promela model specifies the behavior of a set of asynchronously executing processes in a distributed system. Each Promela process defines an extended finite automaton. Thus, Promela is suitable for describing models of cache coherence protocols. By simulating the execution of a Promela model we can build a digraph of all reachable states of the model. Each node in the graph represents a state of the model, and each edge represents a single possible execution step by one of the processes. This graph is always finite [8]. Safety properties can be interpreted as statements about the presence or absence of specific types of nodes in the reachability graph. Let us consider the transition system corresponding to the reachability graph. The following discussion considers a subset of Promela.
In order to be able to formally define syntactic transformations of a Promela model, we will represent models by means of a triple $(V, \Theta, R)$, where $V$ is a set of variables of the model, each variable being of its own type; $\Theta$ is the initialization predicate; and $R$ is the set of transition rules represented as guarded commands, each consisting of a condition and a set of assignments: $cond \rightarrow assignments$, where $cond$ is the condition (a predicate) and the assignments update variables from $V$.
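A model given as variables, an initial condition, and guarded commands can be explored mechanically. The sketch below is a minimal illustration under stated assumptions: the two-client lock system, the Rule class, and the functions make_rules and explore are hypothetical and are not part of the Promela subset or of the cited method. For brevity the initialization predicate is replaced by a single initial state.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, object]            # a valuation of the variables in V

@dataclass(frozen=True)
class Rule:
    name: str
    cond: Callable[[State], bool]    # guard (condition)
    assign: Callable[[State], State] # returns only the variables that change

def make_rules(n):
    """Hypothetical example: n clients competing for one lock."""
    rules = []
    for i in range(n):
        rules.append(Rule(f"enter_{i}",
                          lambda s, i=i: not s["lock"] and s[f"c{i}"] == "idle",
                          lambda s, i=i: {"lock": True, f"c{i}": "crit"}))
        rules.append(Rule(f"leave_{i}",
                          lambda s, i=i: s[f"c{i}"] == "crit",
                          lambda s, i=i: {"lock": False, f"c{i}": "idle"}))
    return rules

def explore(init, rules):
    """Build the set of reachable states by firing every enabled rule."""
    freeze = lambda s: tuple(sorted(s.items()))
    seen, todo = {freeze(init)}, [init]
    while todo:
        s = todo.pop()
        for r in rules:
            if r.cond(s):
                t = {**s, **r.assign(s)}   # apply the assignments
                if freeze(t) not in seen:
                    seen.add(freeze(t))
                    todo.append(t)
    return [dict(f) for f in seen]

init_state = {"lock": False, "c0": "idle", "c1": "idle"}
states = explore(init_state, make_rules(2))
mutex_holds = all(not (s["c0"] == "crit" and s["c1"] == "crit") for s in states)
print(len(states), mutex_holds)
```

A safety property is then a statement about the absence of certain nodes in the explored graph; here it is the absence of a state in which both clients are in the critical section.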

Justification of the Abstraction Rules
It can be shown [9] that the abstraction map $abs : S \rightarrow \hat{S}$, which takes concrete states to abstract states, preserves reachability, and hence safety properties: if a state $s$ is reachable in the concrete model, then $abs(s)$ is reachable in the abstract model. In other words, the abstraction map is a simulation relation.

The Method
The verification method is based on two observations. The first is that the abstraction map is a simulation relation. The second is the guard strengthening principle [9], which makes the following strategy correct. Given a model $P$ and a predicate $\varphi$, in order to prove that $P \models G\varphi$: 1) add $\varphi$ to the conditions of the transition rules of $P$ by means of conjunction; 2) prove that $\varphi$ is an invariant of the resulting model.
Otherwise, examine the counterexample provided by Spin, devise an invariant $\psi$, and modify $Q$ as described in [9]. Set $\varphi := \varphi \wedge \psi$. Go to step 2.
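The guard strengthening strategy can be sketched on a toy model. The code below is an illustration, not the procedure of [9]: the model, the rule encoding as (condition, update) pairs, and the names strengthen and proves_invariant are assumptions. It conjoins a predicate with every guard and then checks the predicate on the states reachable in the strengthened model.

```python
def reachable(init, rules):
    """States reachable from init; rules are (condition, update) pairs."""
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for cond, update in rules:
            if cond(s):
                t = update(s)
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen

def strengthen(rules, phi):
    """Step 1: conjoin phi with the condition of every transition rule."""
    return [(lambda s, c=cond: c(s) and phi(s), update) for cond, update in rules]

def proves_invariant(init, rules, phi):
    """Step 2: check that phi is an invariant of the strengthened model."""
    return all(phi(s) for s in reachable(init, strengthen(rules, phi)))

# Hypothetical model: a single rule x := (x + 2) mod 8, starting from x = 0.
RULES = [(lambda x: True, lambda x: (x + 2) % 8)]

even_ok = proves_invariant(0, RULES, lambda x: x % 2 == 0)   # a true invariant
not_six = proves_invariant(0, RULES, lambda x: x != 6)       # not an invariant
print(even_ok, not_six)
```

Strengthening does not hide real violations: the guard is evaluated in the state before the step, so the first step that breaks the predicate is still enabled in the strengthened model. This is why the second check fails, as it should.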

Design of a Cache Coherence Protocols Verification System
The syntactic transformations described in section 5.3 can be fully automated.
Performing them by hand is tedious and impractical, especially in an industrial setting. Therefore, in order to alleviate this problem, a tool may be developed, which would build an internal representation of the concrete Promela model, modify it according to the transformations, and produce the abstract model. An abstract syntax tree may be the internal representation. The transformations of Promela models are shown in Fig. 1.
The question of automating the refinement transformations is significantly harder. Further research is needed in this direction.

Verification of the German Cache Coherence Protocol
I developed a Promela model of the German protocol. The model is written in the style of [10]. The model implements the algorithm of memory access requests processing shown in Fig. 2.
Fig. 2. Processing of memory access requests (participants: processor core, home processor, and two caches with a shared copy).
A processor core and the corresponding cache controller are represented by the Promela process core, and the home processor is represented by the process home. Thus, the model consists of one process home and N processes core, where N is a natural number. Interaction between the processes is accomplished by means of the three Promela arrays channel1, channel2, and channel3 (see Fig. 3). The array channel1 carries the initial requests req_* sent by a processor to the home processor. The array channel2 carries the snoop requests invalidate sent by the home processor to cache controllers, as well as the grants grant_*. The array channel3 carries the coherence answers (invalidate_ack) sent by cache controllers to the home processor. The German protocol uses three main states of a cache line: Invalid, Exclusive, and Shared.
According to the transformations described in section 5.3, I developed the initial version of the abstract model. The abstract model contains one process home, two processes core, and one abstract process home_abs. One of the most complicated parts of creating the abstract model, the transformation of assignments, is depicted in Table 2.
Table 2 == shared)) ) od }
This property did not hold on the initial abstract model. Following section 5.5, I performed the refinement process. Two additional invariants were developed, and the verification process finished once no counterexamples remained. The refinement process was similar to that described in [6].
To check experimentally that the method is able to find bugs, I verified two buggy versions of the German protocol described in [4]. In the first buggy version, after the home processor grants exclusive access to a cache, it fails to set the exclusive_granted variable to true. Thus, when another cache requests shared access, it gets the access even though the first cache holds the line in the exclusive state. In this case Spin issued a counterexample because the assertion was violated.
In the second buggy version, the home processor grants a shared request even if the exclusive_granted variable is true. In this case Spin issued a counterexample because of the violation of one of the invariants found during the abstraction process.
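The effect of the first bug can be reproduced on a small explicit-state sketch. The code below is not the Promela model described above and does not use the abstraction method: it is a simplified Python rendering of the commonly published rules of the German protocol for a fixed N = 2, without data. The channel names and exclusive_granted follow the text; the message names req_shared, req_exclusive, grant_shared, and grant_exclusive, the fields cur_cmd, cur_client, sharers, and to_invalidate, and the checked property are assumptions of this sketch.

```python
from collections import deque
from typing import NamedTuple, Optional

N = 2  # number of core processes explored (an assumption of this sketch)

class State(NamedTuple):
    cache: tuple              # per-core line state: "I", "S" or "E"
    channel1: tuple           # core -> home: requests
    channel2: tuple           # home -> core: invalidate and grants
    channel3: tuple           # core -> home: invalidate_ack
    cur_cmd: Optional[str]    # request currently served by home
    cur_client: int           # core that issued cur_cmd
    exclusive_granted: bool
    sharers: tuple            # cores recorded by home as holding the line
    to_invalidate: tuple      # cores that still need an invalidate

def put(t, i, v):
    """Copy of tuple t with position i set to v."""
    return t[:i] + (v,) + t[i + 1:]

def successors(s, bug=False):
    out = []
    for i in range(N):
        if s.channel1[i] is None and s.cache[i] == "I":
            out.append(s._replace(channel1=put(s.channel1, i, "req_shared")))
        if s.channel1[i] is None and s.cache[i] in ("I", "S"):
            out.append(s._replace(channel1=put(s.channel1, i, "req_exclusive")))
        if s.cur_cmd is None and s.channel1[i] is not None:   # home accepts a request
            out.append(s._replace(channel1=put(s.channel1, i, None), cur_cmd=s.channel1[i],
                                  cur_client=i, to_invalidate=s.sharers))
        needs_inv = s.cur_cmd == "req_exclusive" or (
            s.cur_cmd == "req_shared" and s.exclusive_granted)
        if s.channel2[i] is None and s.to_invalidate[i] and needs_inv:
            out.append(s._replace(channel2=put(s.channel2, i, "invalidate"),
                                  to_invalidate=put(s.to_invalidate, i, False)))
        if s.channel2[i] == "invalidate" and s.channel3[i] is None:  # core acknowledges
            out.append(s._replace(cache=put(s.cache, i, "I"), channel2=put(s.channel2, i, None),
                                  channel3=put(s.channel3, i, "invalidate_ack")))
        if s.cur_cmd is not None and s.channel3[i] == "invalidate_ack":
            out.append(s._replace(channel3=put(s.channel3, i, None), exclusive_granted=False,
                                  sharers=put(s.sharers, i, False)))
        if s.channel2[i] == "grant_shared":
            out.append(s._replace(cache=put(s.cache, i, "S"), channel2=put(s.channel2, i, None)))
        if s.channel2[i] == "grant_exclusive":
            out.append(s._replace(cache=put(s.cache, i, "E"), channel2=put(s.channel2, i, None)))
    c = s.cur_client
    if s.channel2[c] is None and not s.exclusive_granted:    # home grants
        if s.cur_cmd == "req_shared":
            out.append(s._replace(channel2=put(s.channel2, c, "grant_shared"), cur_cmd=None,
                                  sharers=put(s.sharers, c, True)))
        if s.cur_cmd == "req_exclusive" and not any(s.sharers):
            # The buggy version leaves exclusive_granted false after the grant.
            out.append(s._replace(channel2=put(s.channel2, c, "grant_exclusive"), cur_cmd=None,
                                  sharers=put(s.sharers, c, True), exclusive_granted=not bug))
    return out

def coherent(cache):
    """If some core holds the line exclusively, every other core is Invalid."""
    return all(cache[j] == "I" for i in range(N) if cache[i] == "E"
               for j in range(N) if j != i)

def find_violation(bug=False):
    """Breadth-first search for a reachable incoherent state; None if there is none."""
    init = State(("I",) * N, (None,) * N, (None,) * N, (None,) * N,
                 None, 0, False, (False,) * N, (False,) * N)
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if not coherent(s.cache):
            return s
        for t in successors(s, bug):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None
```

Calling find_violation(bug=False) explores the state space for two cores only, so it says nothing about larger N; the parameterized argument is what the abstraction method is for. Calling find_violation(bug=True) returns a state in which one core is Exclusive while another is not Invalid.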

Conclusion and Directions for Future Work
Formal methods for verification of cache coherence protocols fall into two groups: methods based on model checking and methods based on deductive verification. Model checking is fully automated but suffers from the state space explosion problem. Deductive verification is scalable but requires a great deal of manual work by experts. A combination of the two approaches seems promising because it may lead to a scalable method that requires an acceptable amount of manual work. On the basis of the existing literature, this paper describes one such combined method.