Gast: A Generic Automatic Software Test-system

Pieter Koopman: pieter @ cs . ru . nl


Testing is an important and heavily used technique to measure and ensure software quality. It is part of almost any software project. The testing phase of typical projects takes up to 50% of the total project, and hence contributes significantly to the project costs. Any change in the software can potentially influence the result of a test. For this reason tests have to be repeated often. This is error-prone, boring, and time consuming (and hence expensive).

In this talk we present a tool for automatic testing. Automatic testing significantly reduces the effort of individual tests. This implies that performing the same test becomes cheaper, or one can do more tests within the same budget. In this paper we restrict ourselves to functional testing, i.e. examination whether the software obeys the given specification. In this context we distinguish four steps in the functional testing process:

  1. formulation of properties to be obeyed. A property can state for example that: the answer of the tested implementation should be equal to the answer of a reference implementation, A property can state for example that: the answer of the tested implementation should be equal to the answer of a reference implementation, but the language of properties is not restricted to this kind of properties;

  2. generation of test data: in general it is not possible to test the property for all possible input values (that would yield a proof), some sensible fraction of the possible arguments must be choosen;

  3. test execution: the property is evaluated for these values; and

  4. test result analysis: making a verdict based on the results of the test execution.

The introduced test-system, Gast performs the last three steps fully automatically. Gast generates appropriate test-data based on the types used in the specification of the desired properties, it executes the tests for the generated test-values, and gives an evaluation of the test results. The system either produces a message that the software successfully passed the specified number of tests for a particular property, or it shows arguments for which the property does not hold.

Gast is implemented in the functional programming language Clean. The primary goal is to test software written in Clean. However, Gast is not restricted to software written in Clean. It is possible to call a function written in some other language through the foreign function interface, or to invoke another program.

The properties, expressed as functions in Clean, have the power of first order predicate logic. The functions can state properties about individual implemented functions and data-types, as well as larger pieces of software, or even complete programs.

The presented test-system makes testing easier and cheaper. As an important side-effect it encourages the writing of properties that should hold. These properties contribute to the documentation of the system. Moreover, there is empirical evidence that writing specifications on its own contributes to the quality of the system.

To prevent that the tester has to provide code to generate, to compare and to print values of the types used in the properties, Gast uses generic programming. Generic programming deals with universal representations of type instead of concrete types.

Gast deals with properties in first order predicate logic. These properties have to be stated as functions in the functional programming language Clean.

Most existing test systems, like JUnit, only support some of the steps performed by Gast Typically these systems execute a well defined sequence of tests and evaluate their results. Existing test-systems that generate test-data, such as QuickCheck, use random generation of test-data. When the test involves user-defined data-types, the tester has to indicate how elements of that type should be generated. Gast improves both points. Using systematic generation of test-data, duplicated test involving user-defined types does not occur. By using a generic generator the tester does not have to define how elements of a user-defined type have to be generated. However, if the tester explicitly wants to control the generation of data he is able to do so. After these preparations, the test execution is rather straightforward. The property has to be tested for the generated test-data. Gast uses the code generated by the Clean compiler to compute the result of applying a property to test-data. This has two important advantages. First, there cannot exist any semantically differences between the ordinary Clean code and the interpretation of properties. The second advantage of using ordinary compiled code for the evaluation of test-cases is that it keeps Gast simple. In this way we are able to construct a lightweight test-system.

Test result analysis is illustrated by some examples. We introduce some additional tools to improve the test result analysis.

Using Gast it becomes easier and cheaper to test software. As an important side-effect this encourages stating formal properties of the software. Distinguishing properties of our system are that the test-data are generated in a systematic and generic way using generic programming techniques. This implies that there is no need for the user to indicate how data should be generated. Moreover, duplicated tests are avoided and for finite domains Gast is even able to proof a property by testing it for all possible values. Moreover, duplicated tests are avoided and for finite domains Gast is even able to proof a property by testing it for all possible values.

Due to the rich language used to state properties, first order predicate logic, one can define various kinds of properties. This includes properties known under names as black-box tests, algebraic properties, and model based, pre- and post-conditional. Using the ability to specify the test-data manually, also white-box tests are possible.

Based on our experience we indicate four kinds of errors spotted by the system.

  1. Errors in the implementation. This is the kind of mistakes you expect to find.

  2. Errors in the specification. Also in this situation the tested software does not obey the given property. Analysis of the indicated counter example shows that the specification is wrong instead of the software. Testing improves also the confidence in the (executable) specification.

  3. Errors caused by the finite precision of the computer used. Especially for properties involving reals this is a frequent problem. In general we have to state that the difference between the obtained answer and the required solution is smaller than some allowed error range.

  4. Non-termination or run-time errors. Although the system does not explicitly handles these errors, the tester notices that the error occurs. Since Gast lists the arguments before executing the test, the values causing the error are known. This appears to find partially defined function rather effectively.

Currently the tester has to indicate that a property has to be tested by writing an appropriate Start function. In the near future we want to construct a tool that extracts properties from Clean modules and tests these properties fully automatically.

Examples of properties to be checked

The first step in the testing process is the formulation of properties in a formalism that can be handled by Gast. In order to handle properties from firs order predicate logic in Gast we represent them as functions in Clean. Each property is expressed by a function yielding a Boolean value. Expressions with value True indicates a successful test, False indicates a counter example. This solves the famous oracle problem : how do we decide whether the result of a test is correct or not.

The arguments of such a property represent the universal variables of the logical expression. Properties can have any number of arguments, each of these
arguments can be of any non-polymorphically type. These functions can be used to specify properties of single functions or operations in Clean, as well as properties of large combinations of functions, or even of entire programs.

We only consider well-defined and finite values as test-data. Due to this restriction we are able to use the and-operator (&&) and or-operator (||) of Clean to represent the logical operators and and or respectively.

Our first example involves the implementation of the logical or -function using only a two-input nand-function as basic building element.

or :: Bool Bool -> Bool
or x y = nand (not x) (not y)
where not x = nand x x

The desired property is that the value of this function is always equal to the value of the ordinary or-operator, ||}, from Clean.
That is, the ||-operator serves as specification for the new implementation, or. This property can be represented by the following function
in Clean. By convention we will prefix the name properties by prop .

propOr :: Bool Bool -> Bool
propOr x y = x||y == or x y

The user invokes the testing of this property by the main function:

Start = test propOr

Gast yields Proof: success for all arguments after 4 tests for this property. Since there are only finite types involved the property can be proven by testing.

For our second example we consider the classical implementation of stacks:

:: Stack a :== [a]

pop :: (Stack a) -> Stack a
pop [_:r] = r

top :: (Stack a) -> a
top [a:_] = a

push :: a (Stack a) -> Stack a
push a s = [a:s]

A desirable property for stacks is that after pushing some element onto the stack, that element is on top of the stack. Popping an element just pushed on the stack yields the original stack. The combination of these properties is expressed as:

propStack :: a (Stack a) -> Bool | gEq{|*|} a
propStack e s = top (push e s) === e && pop (push e s) === s

This property should hold for any type of stack-element. Hence we used polymorphic  functions and the generic equality, ===, here. However, Gast can only generate test-data for some concrete type. Hence, we have to specify which type Gast should use for the type argument a. For instance by:

propStackInt :: (Int (Stack Int) -> Bool)
propStackInt = propStack

In contrast to properties that use overloaded types, it actually does not matter much which concrete type we choose. A polymorphic property will hold for elements of any type if it holds for elements Int. The test is executed by Start = test propStackInt. Gast yields: Passed after 1000 tests. This property involves the very large type integer and the infinite type stack, only testing for a finite number of cases, here 1000, is possible.


In propOr we used a reference implementation  to state a property about the function or. In propStack expressed the desired property directly as a relation between functions on a datatype. Other kind of properties state relations between the input and output of functions, or use model checking based properties. For instance, we have tested a system for safe communication over unreliable channels by an alternating bit protocol with the

requirement that sequence of received messages should be equal to the input sequence of messages.

:: Bit    = O | I

instance ~ Bit
    ~ O = I
    ~ I = O

:: Message c        = M c Bit | MError
:: Ack              = A Bit | AError
:: SenderState c    = Send Bit | Exp Bit c
:: ReceiverState    = Rec Bit

sender :: (SenderState c) [c] [Ack] -> [Message c]
sender (Send b) []     as = []
sender (Send b) [c:cs] as = [M c b: sender (Exp b c) cs as]
sender state=:(Exp b c) cs [a:as]
 = case a of
    A b` | b===b`    = sender (Send (~b)) cs as
    _                = [M c b: sender state cs as]
sender (Exp b c) cs [] = abort "Sender: Acknowledgement expected"

receiver :: ReceiverState [Message c] -> ([Ack],[c])
receiver rState [] = ([],[])
receiver rState=:(Rec b) [m:ms]
 = case m of
    M c b` | b===b`  = ([A   b :as],[c:cs]) where (as,cs) = receiver (Rec (~b)) ms
    _                = ([A (~b):as],   cs ) where (as,cs) = receiver rState ms

sChannel :: (Int->Bool) [Message c] -> [Message c]
sChannel error ms = [ if (error n) MError m \\ m <- ms & n <- [1..]]

rChannel :: (Int->Bool) [Ack] -> [Ack]
rChannel error as = [ if (error n) AError a \\ a <- as & n <- [1..]]

abpSystem :: (Int->Bool) (Int->Bool) [c] -> [c]
abpSystem sError rError toSend
    = received
    msSend = sender (Send O) toSend (rChannel rError acks)
    (acks,received) = receiver (Rec O) (sChannel sError msSend)

propAltBit :: (Int->Bool) (Int->Bool) Int -> Property
propAltBit sError rError n = n>=0 ==> label n @ toSend == received
where   toSend = [1..n]
        received = abpSystem sError rError toSend

One often adds the implication operator, => , as a primitive to the predicate logic. We can use the law p => q = not p or q to implement it:

(===>) infix 1 :: Bool Bool -> Bool
(===>) p q = not p || q

In first order predicate logic one also has the existential quantifier. If this is used to introduce values in a constructive way it can be directly transformed to local definitions in a functional programming language, for instance as:

propSqrt :: Real -> Bool
propSqrt x = x >=0 ===> let y = sqrt x in y*y == x

In general there is no constructive way to construct an extensionally quantified value. In general it is not possible to construct an extensionally quantified value. For instance, for a type Day and a function tomorrow we require that for each day there exists a tomorrow. The day d has to be searched in the type Day. Gast has the operator Exists to express this:

propTomorrow :: Day -> Property
propTomorrow day = Exists \d = tomorrow day === d

The only task of the tester is to write properties, like propOr, and to invoke the testing by Start = test propOr . Based on the type of arguments needed by the property, the test system will generate test-data, execute the test for these values, and analyze the results of the tests.


Papers about Gast (and the other research of our group) can be found here. As an introduction you can start with Fully Automatic Testing with Functions as Specifications.

A recent version of the code is included in the regular Clean distribution this can be obtained at .

Do not hesitate to mail me if you want additional information or can provide feedback.