An Extended Finite State Machine-Based Approach to Code Coverage-Directed Test Generation for Hardware Designs

Model-based test generation is widely used in functional verification of hardware designs. The extended finite state machine (EFSM) is known to be a powerful formalism for modelling digital hardware. As opposed to conventional finite state machines, EFSM models separate datapath and control, which makes it possible to represent systems in a more compact way and, in a sense, reduces the risk of state explosion during verification. However, the problem of traversing an EFSM state graph is nontrivial because of the guard conditions that enable model transitions. In this paper, a new EFSM-based test generation approach is proposed and compared with the existing solutions. It combines a random walk on the state graph with a directed search for feasible paths. The first phase covers “easy-to-fire” transitions. The second phase is aimed at “hard-to-fire” cases: the algorithm tries to build a path that enables a given transition by analyzing control and data dependencies and applying symbolic execution techniques. Experiments show that the suggested approach provides better transition coverage with shorter test sequences compared to the known methods and achieves a high level of code coverage in terms of statements and branches. Our future plans include optimizations aimed at making the method applicable to industrial hardware designs.


Introduction
Functional verification is a labor-intensive and time-consuming stage of the hardware design process. According to [1], it accounts for about 70% of the effort, while the number of verification engineers is usually twice the number of designers. Moreover, the "verification gap", i.e. the difference between verification needs and capabilities, seems to grow over time [2]. In such a situation, improving the existing verification methods and developing new ones is of high value and importance.
Simulation-based verification, often referred to as testing, is a widely accepted approach to hardware verification. It requires a testbench [1], a special environment that generates inputs (so-called stimuli, vectors or patterns) and optionally observes the outputs (so-called reactions). Among the methods for stimulus generation, model-based approaches are of interest. Being formal representations of designs under test, models serve as a valuable source of "testing knowledge". There are many types of models used for specifying hardware: finite state machines (FSM) [3], extended FSM (EFSM) [4], Petri nets [5], etc. The key distinction of the EFSM formalism is the clear separation of data and control flows. It is worth mentioning that EFSM models can be automatically extracted from HDL descriptions, which makes it possible to generate code coverage-directed tests [6].
This article advances the FATE approach to EFSM-based functional test generation (FTG) [7]. The main feature of FATE is backjumping: if an EFSM traverser fails to cover a transition, it tries to detect the cause of the failure (that is, a transition which must be traversed in order to enable the target one) and constructs a path directly from the found transition. Another important part of the approach is a special heuristic addressing counters and loops. However, FATE is hardly applicable to hardware designs with complicated data and control dependencies.
The rest of the paper is organized as follows. Section II defines the EFSM model and briefly describes the EFSM extraction method that has been used. Section III considers the original FATE approach, while Section IV introduces a number of improvements to it. Section V proposes a new EFSM-based FTG method and shows how it works using two simple EFSMs as examples. Section VI contains an experimental comparison of the above-mentioned approaches. Section VII concludes the paper and outlines directions for future improvement of the suggested algorithm.

EFSM Model and HDL-to-EFSM Extraction
Let $V$ be a set of variables. A valuation is a function that associates each variable with a value from the corresponding domain. The set of all valuations over $V$ is denoted as $D_V$. A guard is a Boolean function defined on valuations ($D_V \to \{true, false\}$). An action is a transformation of valuations ($D_V \to D_V$). A pair $\gamma \to \delta$, where $\gamma$ is a guard and $\delta$ is an action, is called a guarded action. When we speak about a function, it is implied that there is a description of the function in some formal language (thus, we can reason about the function's syntax, not only the semantics).
An EFSM is a tuple $M = \langle S_M, V_M, T_M \rangle$, where $S_M$ is a set of states, $V_M = (I_M \cup O_M \cup R_M)$ is a set of variables consisting of inputs ($I_M$), outputs ($O_M$) and registers ($R_M$), and $T_M$ is a set of transitions (all sets are supposed to be finite). Each transition $t \in T_M$ is a tuple $(s_t, \gamma_t \to \delta_t, s'_t)$, where $s_t$ and $s'_t$ are respectively the initial and the final state of $t$, whereas $\gamma_t$ and $\delta_t$ are respectively the guard and the action of $t$. A valuation $\sigma \in D_{V_M}$ is referred to as a context, while a pair $(s, \sigma) \in S_M \times D_{V_M}$ is called a configuration. A transition $t$ is said to be enabled for a configuration $(s, \sigma)$ if $s_t = s$ and $\gamma_t(\sigma) = true$.
Given a clock $C$ (a periodic event generator) and an initial configuration $(s_0, \sigma_0)$, the EFSM operates as follows. In the beginning, it resets (initializes) the configuration: $(s, \sigma) \leftarrow (s_0, \sigma_0)$. On every "tick" of $C$, it computes the set of enabled transitions $E \leftarrow \{t \in T_M \mid s_t = s \wedge \gamma_t(\sigma) = true\}$. A single transition $t \in E$ (chosen nondeterministically) fires; the EFSM changes the configuration (updates the context and moves from the initial state to the final one): $(s, \sigma) \leftarrow (s'_t, \delta_t(\sigma))$.
In this paper, we do not discuss in detail how the EFSM models are extracted. In the experiments, we use an implementation of the method introduced in [8]. The method deals with HDL descriptions written in synthesizable subsets of VHDL and Verilog [9]. The major advantage of the approach is its high degree of automation: it requires no information except the HDL code. The method uses heuristics for identifying states and clock signals and extracts the EFSM from the control flow graph-based representation. For every process defined in the HDL description, a single EFSM is usually built; all EFSM models of the description are defined over the same set of variables. It should be emphasized that EFSM actions have a "flat" syntax, which means that each action is a linear sequence of assignments.
We have enhanced the cited method by adding a new heuristic aimed at recognizing the initial configuration. A guarded action $\gamma_r \to \delta_r$ is said to be resetting if the following properties hold: (1) $\gamma_r$ depends on exactly one clock signal, which is called a reset; (2) $\delta_r$ consists solely of assignments of the kind $v = c$, where $v \in (O_M \cup R_M)$ and $c$ is a constant expression. Provided that there is only one resetting action, that action is supposed to lead to the initial EFSM configuration.
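The operational semantics defined in this section can be illustrated by a minimal Python sketch. The names Transition, EFSM, enabled and step, as well as the toy two-state machine, are assumptions introduced here for illustration only; they are not taken from the extraction method of [8]. Guards and actions are modelled as plain Python callables over a context dictionary, so only the semantics (not the syntax) of the functions is captured.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, int]  # a valuation of inputs, outputs and registers


@dataclass(frozen=True)
class Transition:
    source: str                           # initial state of the transition
    guard: Callable[[Context], bool]      # Boolean function on valuations
    action: Callable[[Context], Context]  # transformation of valuations
    target: str                           # final state of the transition


@dataclass
class EFSM:
    transitions: List[Transition]
    state: str        # current state (initially the reset state)
    context: Context  # current context (initially the reset context)

    def enabled(self) -> List[Transition]:
        """Transitions leaving the current state whose guard holds."""
        return [t for t in self.transitions
                if t.source == self.state and t.guard(self.context)]

    def step(self, inputs: Context) -> bool:
        """One clock tick: assign the inputs and fire one enabled transition."""
        self.context = {**self.context, **inputs}
        candidates = self.enabled()
        if not candidates:
            return False
        t = random.choice(candidates)  # nondeterministic choice
        self.context = t.action(self.context)
        self.state = t.target
        return True


# Hypothetical machine: wait for start, count to 2, raise done.
toy = EFSM(
    transitions=[
        Transition("IDLE", lambda c: c["start"] == 1,
                   lambda c: {**c, "cnt": 0}, "RUN"),
        Transition("RUN", lambda c: c["cnt"] < 2,
                   lambda c: {**c, "cnt": c["cnt"] + 1}, "RUN"),
        Transition("RUN", lambda c: c["cnt"] >= 2,
                   lambda c: {**c, "done": 1}, "IDLE"),
    ],
    state="IDLE",
    context={"start": 0, "cnt": 0, "done": 0},
)
```

Because the guards of the toy machine are mutually exclusive in every state, the random choice never has more than one candidate, and repeated calls to toy.step behave deterministically.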

The Original FATE Algorithm
The aim of the FATE algorithm is to generate a test that covers all transitions of a given multi-EFSM system. A test is a set of test sequences, i.e. sequences of test vectors. A test vector is a valuation over the joint set of the EFSMs' inputs. The algorithm includes three phases: an EFSM analysis, a random traversal and a directed traversal.

EFSM Analysis
In the beginning, for each EFSM of the system, data and control dependencies between its transitions are derived. Let $t$ and $\tau$ be transitions and $v$ be a variable. $v$ is said to be defined in $t$ ($v \in Def_t$) if $t$ contains an assignment to $v$; $v$ is said to be used in $t$ ($v \in Use_t$) if the guard of $t$ depends on $v$. It is said that $\tau$ is data dependent on $t$ (via $v$) if $v \in Def_t$, $\tau$ contains an assignment with $v$ in the right-hand side, and there exists a path $P = \{t_i\}_{i=1}^{n}$ from $t$ to $\tau$ ($s'_t = s_{t_1}$ and $s'_{t_n} = s_\tau$) that does not define $v$. To keep the data dependency between $\tau$ and $t$, if $v \in Def_\tau$, there should be an assignment in $\tau$ with $v$ in the right-hand side that precedes the assignments to $v$. It is said that $\tau$ is control dependent on $t$ (via $v$) if there exists a variable $v$ such that $v \in (Def_t \cap Use_\tau)$ and there exists a path from $t$ to $\tau$ that does not define $v$. The derived data and control dependencies are represented by directed graphs whose vertices are the transitions and whose arcs are the dependencies. Thus, each EFSM is associated with two such graphs (one for the control dependencies, another for the data dependencies).
The second step of the analysis is counter detection. A register $r$ is said to be a counter if there is a loop in the EFSM such that: (1) there is a transition $t$ that defines $r$; (2) $r$ is defined recurrently (the current value depends on the previous one); (3) there is a transition $\tau$ that is control dependent on $t$ via $r$. For each counter, all data dependency loops are saved.
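The derivation of control dependencies can be sketched in Python as follows. This is an illustration under stated assumptions, not the FATE implementation: the record type T, the helper reachable_without_def and the five-transition example are hypothetical, each transition is reduced to the sets of variables it assigns (defs) and reads in its guard (uses), and an empty connecting path (the target state of one transition being the source state of the other) is assumed to be allowed.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class T:
    name: str
    source: str
    target: str
    defs: FrozenSet[str] = frozenset()  # variables assigned by the action
    uses: FrozenSet[str] = frozenset()  # variables read by the guard


def reachable_without_def(transitions: List[T], start: str, var: str) -> Set[str]:
    """States reachable from start via transitions that do not define var."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in transitions:
            if t.source == s and var not in t.defs and t.target not in seen:
                seen.add(t.target)
                queue.append(t.target)
    return seen


def control_dependencies(transitions: List[T]) -> Set[Tuple[str, str, str]]:
    """Arcs (t, tau, v): tau is control dependent on t via v."""
    arcs = set()
    for t in transitions:
        for v in t.defs:
            states = reachable_without_def(transitions, t.target, v)
            for tau in transitions:
                if v in tau.uses and tau.source in states:
                    arcs.add((t.name, tau.name, v))
    return arcs


# Hypothetical EFSM skeleton with three states.
example = [
    T("a", "S0", "S1", defs=frozenset({"x"})),
    T("b", "S1", "S2", defs=frozenset({"y"})),
    T("c", "S2", "S0", uses=frozenset({"x"})),
    T("d", "S1", "S2", defs=frozenset({"x"})),
    T("e", "S2", "S2", uses=frozenset({"y"})),
]
deps = control_dependencies(example)
```

The data dependency graph could be obtained in the same way by replacing the guard variables with the variables read in the right-hand sides of the assignments.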

Random Traversal
After the analysis, the random traversal phase is launched. The phase is parameterized with two values, $L$ and $N$, where $L$ is the length of a test sequence and $N$ is the number of test sequences in the test. The random traversal is described by pseudo-code in which $\{M_i = \langle S_i, V_i, T_i \rangle\}_{i=1}^{m}$ are the EFSMs being tested and result is the generated test. The pseudo-code is based on the following functions: reset($\{M_i\}$) initializes the configurations of the models $\{M_i\}$; choose($T$) returns a random item of the nonempty set $T$; refine($\varphi$, $\sigma$) replaces variables of the formula $\varphi$ with their values according to the partial valuation $\sigma$; isSAT($\varphi$) checks whether the constraint $\varphi$ is satisfiable; solve($\varphi$) returns a valuation $\sigma$ such that $\varphi(\sigma) = 1$; apply($\sigma$, $\{M_i\}$) assigns the inputs of the models $\{M_i\}$ according to the partial valuation $\sigma$ and executes the enabled transitions (uninitialized inputs are randomized). The symbols $s_i$ and $\sigma$ denote respectively the current state of the model $M_i$ and the context (shared among all models).
Being defined over the same set of variables, the EFSM models may affect each other while being co-executed. To minimize the influence, the following technique is applied. Each EFSM $M_i$ is supplied with two parameters, $F_i$ and $A_i$, where $F_i$ is a constant inversely proportional to the number of inputs used in the guards of $M_i$ (the more such inputs $M_i$ has, the more models are expected to be affected by $M_i$) and $A_i$ is a so-called aging factor (initially set to zero). The sum $(F_i + A_i)$ is the priority for choosing the model $M_i$. The priorities specify the order in which the models are handled (for $i \in \{1, ..., m\}$ do ... end). The main idea behind the aging factor is as follows. If test vector generation for $M_i$ fails (isSAT(constraint) returns false for an outgoing transition), $A_i$ is increased by a constant $\Delta A$. Note that [7] gives no particular definition of $\Delta A$; we use the value $\Delta A = \min_{i=1,\dots,m} F_i$. After the model selection loop, the aging factor of the highest-priority model is set to zero.
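The priority scheme with aging can be sketched in Python as follows. The function names pick_order and update_aging are hypothetical, the static priority is assumed to be exactly 1 divided by the number of guard inputs (any other proportionality constant would do), and resetting the aging factor of the first model in the current order is one possible reading of the rule stated above.

```python
from typing import Iterable, List


def pick_order(static: List[float], aging: List[float]) -> List[int]:
    """Model indices sorted by decreasing priority (static + aging)."""
    return sorted(range(len(static)),
                  key=lambda i: static[i] + aging[i],
                  reverse=True)


def update_aging(order: List[int], failed: Iterable[int],
                 aging: List[float], delta: float) -> None:
    """Age the models whose vector generation failed, then reset the top one."""
    for i in failed:
        aging[i] += delta
    aging[order[0]] = 0.0


# Hypothetical system of three models using 1, 2 and 4 inputs in their guards.
guard_inputs = [1, 2, 4]
static = [1.0 / n for n in guard_inputs]  # assumed form of the constant
aging = [0.0, 0.0, 0.0]
delta = min(static)                       # the aging increment used in this paper

first_order = pick_order(static, aging)
for _ in range(2):                        # model 2 fails in two consecutive rounds
    update_aging(pick_order(static, aging), [2], aging, delta)
later_order = pick_order(static, aging)
```

In this run the third model, which initially has the lowest priority, overtakes the second one after two failed attempts, which is the intended effect of aging.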

Directed Traversal
If there are uncovered transitions after the random traversal, FATE proceeds with the directed generation. Before describing the phase, let us make a remark. The procedure below applies Dijkstra's algorithm for finding a shortest path in a graph [10]; it is assumed that the weight of an arc is the number of registers used in the transition's guard. The directed traversal is performed separately for each EFSM.
Here is the pseudo-code ($M$ is the EFSM being tested; result is the generated test):
    ...
          "The transition t cannot be reached"
        end
      end
      targets ← targets \ {t}
    end // while targets
Besides the auxiliary functions defined above, this pseudo-code uses reach($M$, $s$), which returns the set of known test sequences reaching the state $s$ of the model $M$, and process($M$, $t$), which tries to cover the transition $t$ of the model $M$ by taking into account the control dependencies (it is described below). Note that if targets includes transitions outgoing from the covered states, choose(targets) returns one of them; transitions whose initial states have not been reached are selected only if there are no others.
The description of process($M$, $t$) uses the following notations: shortestPath($M$, $s$, $s'$) finds the shortest path between the states $s$ and $s'$ of the state graph of $M$ using Dijkstra's algorithm; isCounter(reg) checks whether the register reg is a counter; $\varphi|_v$ denotes the minimal sub-constraint of the constraint $\varphi$ that depends on the variable $v$ such that $\varphi \Rightarrow \varphi|_v$ holds; $\varphi[\delta]$ stands for the constraint produced from $\varphi$ by applying the substitution corresponding to the action $\delta$. Let $\varphi \equiv (x = const_1 \wedge y = const_2)$ and $\delta \equiv \{x = z\}$, where $x$, $y$, and $z$ are variables, while $const_1$ and $const_2$ are constants. In this case, $\varphi|_x \equiv (x = const_1)$ and $\varphi[\delta] \equiv (z = const_1 \wedge y = const_2)$.
The pseudo-code for processCounter($M$, $s$, $t$, reg) utilizes three special functions: createLoops($M$, $s$, $r$) constructs all possible elementary loops in the state graph of $M$ that start from the state $s$ and include transitions dependent via the register $r$, and returns an iterator that combines a bounded number of elementary loops into complex ones (the elementary loops are constructed by using Dijkstra's algorithm to connect dependent transitions); hasNext($i$) checks whether the iterator $i$ can produce more loops; next($i$) returns the next loop and updates the iterator $i$. Note that the limit on the loop length is chosen individually for each design.
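The shortest-path search with guard-based weights can be sketched in Python as follows. The function name shortest_path mirrors shortestPath from the text, but its signature, the tuple encoding of transitions and the example graph are assumptions made for illustration. The weight of an arc is the number of registers used in the guard of the corresponding transition, so zero weights are possible, which Dijkstra's algorithm handles correctly.

```python
import heapq
from typing import List, Optional, Set, Tuple

# (name, source state, target state, registers used in the guard)
Arc = Tuple[str, str, str, Set[str]]


def shortest_path(transitions: List[Arc], source: str,
                  target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns transition names or None if unreachable."""
    dist = {source: 0}
    back = {}  # state -> (previous state, transition taken)
    heap = [(0, source)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == target:
            break
        if d > dist.get(s, float("inf")):
            continue  # stale heap entry
        for name, src, dst, regs in transitions:
            if src != s:
                continue
            nd = d + len(regs)
            if nd < dist.get(dst, float("inf")):
                dist[dst] = nd
                back[dst] = (s, name)
                heapq.heappush(heap, (nd, dst))
    if target not in dist:
        return None
    path = []
    s = target
    while s != source:
        s, name = back[s]
        path.append(name)
    return path[::-1]


# Hypothetical state graph: the direct arc t1 has a "heavier" guard.
graph = [
    ("t1", "A", "B", {"r1", "r2"}),
    ("t2", "A", "C", set()),
    ("t3", "C", "B", {"r1"}),
    ("t4", "B", "D", set()),
]
route = shortest_path(graph, "A", "D")
```

With these weights the search prefers the longer route through C, since its guards involve fewer registers and are therefore expected to be easier to satisfy.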

The FATE+ Algorithm
We have implemented a slightly modified version of the original FATE algorithm, called FATE+. Let us consider the changes that have been made.

Transition Selection
In FATE+'s random traversal, choose(T), where T is a non-empty set of transitions, works a bit differently. If there exist uncovered transitions, the function randomly chooses one of them; otherwise, it returns an arbitrary item of T. Our experiments show that this minor change significantly increases the effectiveness of the random generation phase.
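The modified selection rule can be sketched in Python as follows. The signature of choose (a sequence of transitions plus a set of already covered ones) is an assumption made for illustration; the text only specifies the behaviour.

```python
import random
from typing import Hashable, Sequence, Set


def choose(transitions: Sequence[Hashable], covered: Set[Hashable]) -> Hashable:
    """Prefer a random uncovered transition; otherwise pick any transition."""
    uncovered = [t for t in transitions if t not in covered]
    return random.choice(uncovered if uncovered else list(transitions))
```

For instance, choose(["a", "b", "c"], {"a", "c"}) always yields "b", whereas with all three transitions covered any of them may be returned.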

Symbolic Execution
FATE implements an approximate method for checking whether a given path is feasible (for p ∈ path do ... end). Let $P$ be a path, $t$ be the last transition of $P$, $r$ be a register used in $\gamma_t$, and $\sigma$ be a context. Given a transition $p$ of $P$, the algorithm checks whether $p$ defines $r$. If it does, the following constraint is constructed and checked for satisfiability: $\varphi \equiv \gamma_p \wedge \gamma_t|_r[\delta_p]$. It is worth recalling that $\gamma_t|_r$ is the minimal conjunctive member of $\gamma_t$ that includes all occurrences of $r$, while $\gamma_t|_r[\delta_p]$ is the formula produced from $\gamma_t|_r$ by applying the forward substitution corresponding to the action $\delta_p$. The method looks inadequate in the sense that if $\varphi$ is unsatisfiable for some $p$, it does not really mean that $P$ is infeasible.
We suggest replacing the approximate approach with full-scale symbolic execution that takes into consideration all the variables defined and used along the path. To be more precise, we suggest using the well-known method for computing the weakest precondition of a loop-free program, i.e. a sequence of guarded actions, with respect to a postcondition [11]. The main idea is as follows. Let $\varphi \equiv true$. Starting from the end of $P$, for each transition $p$, including $t$, the following transformation of $\varphi$ is performed: $\varphi \leftarrow \gamma_p \wedge \varphi[\delta_p]$. Note that the input variables are renamed in such a way that each transition refers to a unique copy of the inputs. As soon as $P$ is processed, all occurrences of the registers are replaced by the values taken from $\sigma$: $\varphi \leftarrow$ refine($\varphi$, $\sigma$). $P$ is feasible if and only if $\varphi$ is satisfiable. A test sequence can be constructed by solving the constraint.
Let us consider an EFSM $M$ with $I_M = \{i_0, i_1, i_2\}$ and $R_M = \{x, y, z\}$ such that there is a path which consists of the following transitions:
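Independently of the EFSM $M$ mentioned above, the backward weakest-precondition computation can be sketched in Python on a hypothetical three-transition path. Two simplifications are assumed and should not be mistaken for the actual implementation: the precondition is built semantically, by composing Python callables, rather than by syntactic substitution, and satisfiability is checked by brute force over a small input domain instead of calling an SMT solver. Each transition consumes its own input value, which plays the role of the per-transition renaming of inputs.

```python
from itertools import product


def wp(guard, action, post):
    """Weakest precondition of a guarded action: guard AND post after action."""
    def pre(regs, inputs):
        head, rest = inputs[0], inputs[1:]  # this step's private input copy
        return guard(regs, head) and post(action(regs, head), rest)
    return pre


def find_test_sequence(path, regs0, domain):
    """Return one input per transition making the path feasible, or None."""
    phi = lambda regs, inputs: True       # start from the postcondition true
    for guard, action in reversed(path):  # process the path from its end
        phi = wp(guard, action, phi)
    for inputs in product(domain, repeat=len(path)):
        if phi(regs0, inputs):            # registers take their current values
            return list(inputs)
    return None


# Hypothetical path over registers x, y and a single input i per step.
path = [
    (lambda r, i: True,        lambda r, i: {**r, "x": i}),
    (lambda r, i: r["x"] > 1,  lambda r, i: {**r, "y": r["x"] + i}),
    (lambda r, i: r["y"] == 5, lambda r, i: r),
]
sequence = find_test_sequence(path, {"x": 0, "y": 0}, range(4))

# The same path with an unreachable final guard is reported as infeasible.
bad_path = path[:2] + [(lambda r, i: r["y"] == 9, lambda r, i: r)]
no_sequence = find_test_sequence(bad_path, {"x": 0, "y": 0}, range(4))
```

Here the final guard constrains y, which is defined two steps earlier from the first two inputs; the whole-path precondition captures this chain, whereas a check that looks at one defining transition at a time may miss it.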

Test Reduction
In FATE, it frequently happens that multiple test vectors cover the same transition. To overcome the issue, we have introduced a simple test reduction technique. While generating tests, each test sequence is associated with the transitions it has covered. At the end of the process, the set of test sequences $W$ and the set of covered transitions $T_{cov}$ are available. The technique is as follows. First, the transitions reached by unique test sequences are identified. Each test sequence that covers at least one such transition is moved from $W$ to the reduced test $R$; all transitions covered by the sequence are excluded from $T_{cov}$. Then, while $T_{cov}$ is not empty, the following actions are performed. The test sequences that cover the largest subsets of $T_{cov}$ are determined; among them, a shortest one is chosen. The selected sequence is moved from $W$ to $R$, while the covered transitions are removed from $T_{cov}$.
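The reduction procedure can be sketched in Python as follows. The function name reduce_test and the representation of the input (a dictionary mapping each test sequence, stored as a tuple of test vectors, to the set of transitions it covers) are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

Sequence_ = Tuple[Hashable, ...]  # a test sequence as a tuple of test vectors


def reduce_test(coverage: Dict[Sequence_, Set[str]]) -> List[Sequence_]:
    """Greedy test reduction: unique coverers first, then best coverage."""
    remaining = dict(coverage)                      # the set W
    uncovered = set().union(*coverage.values())     # the set T_cov
    reduced = []                                    # the reduced test R

    # Step 1: sequences that are the only ones covering some transition.
    owners: Dict[str, List[Sequence_]] = {}
    for seq, ts in coverage.items():
        for t in ts:
            owners.setdefault(t, []).append(seq)
    for seqs in owners.values():
        if len(seqs) == 1 and seqs[0] in remaining:
            reduced.append(seqs[0])
            uncovered -= remaining.pop(seqs[0])

    # Step 2: largest coverage of what is left; ties go to a shortest sequence.
    while uncovered:
        best = max(remaining,
                   key=lambda s: (len(remaining[s] & uncovered), -len(s)))
        reduced.append(best)
        uncovered -= remaining.pop(best)
    return reduced


# Hypothetical coverage data for four test sequences.
coverage = {
    ("a", "b", "c"): {"t1", "t2"},
    ("d",): {"t2"},
    ("e", "f"): {"t2", "t3"},
    ("g",): {"t3"},
}
reduced = reduce_test(coverage)
```

In the example, t1 is covered only by the first sequence, so that sequence is kept; t3 is then covered equally well by two sequences, and the shorter one is selected.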

The RETGA Algorithm
The algorithm proposed in this paper is called RETGA (Retrascope EFSM-based Test Generation Algorithm). It has the same phases as FATE; moreover, the EFSM analysis phase is identical to that of FATE. Like FATE+, it uses the modified choose($T$) function and applies the test reduction. Let us consider the main phases in more detail.

Random Traversal
As in FATE, the EFSM models are processed one by one, though a different arbitration principle is used. The priority of a model depends on the coverage that has been achieved: the better the coverage, the lower the priority. Such a strategy avoids a situation where a fully covered EFSM of the highest priority prevents generating inputs for poorly covered models. The random traversal is described by pseudo-code in which, as before, $\{M_i = \langle S_i, V_i, T_i \rangle\}_{i=1}^{m}$ are the EFSMs being tested and result is the generated test.
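The coverage-based arbitration can be sketched in Python as follows. The function name order_by_coverage, the use of the covered fraction of transitions as the coverage measure and the example data are assumptions made for illustration; the text only states that better covered models get lower priority.

```python
from typing import Dict, List, Set


def order_by_coverage(all_transitions: Dict[str, Set[str]],
                      covered: Dict[str, Set[str]]) -> List[str]:
    """Model names ordered so that poorly covered models are handled first."""
    def ratio(model: str) -> float:
        total = len(all_transitions[model])
        if total == 0:
            return 1.0  # nothing to cover
        return len(covered[model] & all_transitions[model]) / total
    return sorted(all_transitions, key=ratio)


# Hypothetical system of three models.
all_transitions = {"M1": {"a", "b"}, "M2": {"c", "d", "e", "f"}, "M3": {"g"}}
covered = {"M1": {"a", "b"}, "M2": {"c"}, "M3": set()}
order = order_by_coverage(all_transitions, covered)
```

The fully covered model M1 is handled last, so it cannot prevent input generation for M2 and M3.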

Directed Traversal
Before describing the directed traversal phase, let us give some definitions. A piecewise path is a sequence of paths, so-called pieces, for which there is a path including all of the pieces (with no overlaps) in the given order. Given a register $r$, a partial definition path is a piecewise path that propagates at least one input to $r$ and has no transitions that do not take part in the propagation. The propagation of an input $x$ to a register is inductively defined as follows. If there exist a transition $t$ and a variable $r^*$ such that $t$ contains an assignment to $r^*$ that involves $x$, then $x$ is said to be propagated to $r^*$ along the piecewise path $\{\{t\}\}$. If (1) $x$ is propagated to $r^*$ along the path $P$, (2) $\tau$ is data dependent on $t$, the last transition of the last piece of $P$, via $r^*$, and (3) $\tau$ contains an assignment to $r$ which involves $r^*$, then $x$ is said to be propagated to $r$ along the path $P \cup \{\{\tau\}\}$.
The directed traversal is performed separately for each EFSM. It is described by pseudo-code in which $M$ is the EFSM being tested and result is the generated test. In the pseudo-code, start($P$) and end($P$) return respectively the initial and the final state of the piecewise path $P$; paths($M$, $s$, $s'$) returns the list of cycle-free paths between the states $s$ and $s'$ of $M$, sorted by length.
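The enumeration of cycle-free paths sorted by length can be sketched in Python as follows. The function name paths follows the text, but its signature, the triple encoding of transitions and the example graph are assumptions made for illustration; a path is considered cycle-free if it visits no state twice, and exhaustive depth-first enumeration is used, which is only practical for small state graphs.

```python
from typing import List, Tuple

# (name, source state, target state)
Arc = Tuple[str, str, str]


def paths(transitions: List[Arc], source: str, target: str) -> List[List[str]]:
    """All cycle-free paths from source to target, shortest first."""
    found: List[List[str]] = []

    def dfs(state: str, visited: frozenset, trail: List[str]) -> None:
        if state == target:
            found.append(trail)
            return
        for name, src, dst in transitions:
            if src == state and dst not in visited:
                dfs(dst, visited | {dst}, trail + [name])

    dfs(source, frozenset({source}), [])
    return sorted(found, key=len)


# Hypothetical state graph with a self-loop (t5) and a back arc (t4).
graph = [
    ("t1", "A", "B"),
    ("t2", "B", "C"),
    ("t3", "A", "C"),
    ("t4", "C", "A"),
    ("t5", "B", "B"),
]
candidates = paths(graph, "A", "C")
```

Trying the candidates in this order means that shorter connections between the pieces of a piecewise path are examined before longer ones.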

Examples
Let us consider how the RETGA algorithm works on the example of two models, namely EFSM-1 and EFSM-2. Both models correspond to the cases that are difficult for FATE.

Experimental Results
The RETGA algorithm has been implemented as a part of the Retrascope [12] project. It uses the Fortress [14] library together with the Z3 [15] solver for representing expressions and solving constraints. To compare the algorithm with FATE and FATE+, the ITC'99 benchmark [13] was used. Table I shows the characteristics of the EFSMs extracted from some of the ITC'99 designs. As mentioned above, we used the extended variant of the method described in [8] to build the models, although none of the presented approaches depends on the way the EFSMs are produced.
b01    115    70     49
b02    62     48     33
b04    104    104    36
b06    198    100    76
b07    246    208    166
b08    31     31     52
b10    173    170    135
The tests generated by RETGA were applied to the designs by using the Questa simulator [16]. The source code coverage achieved is presented in Table IV (each column corresponds to a metric of the Questa coverage report). It can be seen that the code coverage is rather high.

Conclusion
In this paper, an EFSM-based test generation algorithm has been proposed. The approach achieves better transition coverage with fewer test vectors than the known methods. However, the research is still in progress, and there are many issues to be solved. Let us mention some of them. First, the approach is hardly applicable to complex hardware designs involving a great number of tightly connected EFSMs. It uses a simple coverage-based heuristic to decide which EFSM to handle next, whereas advanced techniques are expected to rely on the semantics of the system under test. Second, the method for searching "bridges" needs to be optimized. While irrelevant for simple EFSMs (such as those presented in Section VI), this issue is of high value and importance for real-life hardware. Third, in the current implementation, each guard (each constraint, in general) is viewed as an indivisible entity and solved as a whole. This is not an issue as long as the goal is to cover EFSM transitions, but it may lead to poor expression coverage, as there are many ways to satisfy a constraint. Finally, the quality of testing strongly depends on the models being used. It seems useful to formalize the notion of a "good" model.