Pieter Koopman: pieter @ cs . ru . nl

Testing is an important and heavily used technique to measure and ensure software quality. It is part of almost any software project. The testing phase of typical projects takes up to 50% of the total project, and hence contributes significantly to the project costs. Any change in the software can potentially influence the result of a test. For this reason tests have to be repeated often. This is error-prone, boring, and time consuming (and hence expensive).

In this talk we present a tool for automatic testing. Automatic testing significantly reduces the effort of individual tests. This implies that performing the same test becomes cheaper, or one can do more tests within the same budget. In this paper we restrict ourselves to functional testing, i.e. examination whether the software obeys the given specification. In this context we distinguish four steps in the functional testing process:

formulation of properties to be obeyed. A property can state for example that: the answer of the tested implementation should be equal to the answer of a reference implementation, A property can state for example that: the answer of the tested implementation should be equal to the answer of a reference implementation, but the language of properties is not restricted to this kind of properties;

generation of test data: in general it is not possible to test the property for all possible input values (that would yield a proof), some sensible fraction of the possible arguments must be choosen;

test execution: the property is evaluated for these values; and

test result analysis: making a verdict based on the results of the test execution.

The introduced test-system, Gast performs the last three steps fully automatically. Gast generates appropriate test-data based on the types used in the specification of the desired properties, it executes the tests for the generated test-values, and gives an evaluation of the test results. The system either produces a message that the software successfully passed the specified number of tests for a particular property, or it shows arguments for which the property does not hold.

Gast is implemented in the functional programming language Clean. The primary goal is to test software written in Clean. However, Gast is not restricted to software written in Clean. It is possible to call a function written in some other language through the foreign function interface, or to invoke another program.

The properties, expressed as functions in Clean, have the power of first order predicate logic. The functions can state properties about individual implemented functions and data-types, as well as larger pieces of software, or even complete programs.

The presented test-system makes testing easier and cheaper. As an important side-effect it encourages the writing of properties that should hold. These properties contribute to the documentation of the system. Moreover, there is empirical evidence that writing specifications on its own contributes to the quality of the system.

To prevent that the tester has to provide code to generate, to compare and to print values of the types used in the properties, Gast uses generic programming. Generic programming deals with universal representations of type instead of concrete types.

Gast deals with properties in first order predicate logic. These properties have to be stated as functions in the functional programming language Clean.

Most existing test systems, like JUnit, only support some of the steps performed by Gast Typically these systems execute a well defined sequence of tests and evaluate their results. Existing test-systems that generate test-data, such as QuickCheck, use random generation of test-data. When the test involves user-defined data-types, the tester has to indicate how elements of that type should be generated. Gast improves both points. Using systematic generation of test-data, duplicated test involving user-defined types does not occur. By using a generic generator the tester does not have to define how elements of a user-defined type have to be generated. However, if the tester explicitly wants to control the generation of data he is able to do so. After these preparations, the test execution is rather straightforward. The property has to be tested for the generated test-data. Gast uses the code generated by the Clean compiler to compute the result of applying a property to test-data. This has two important advantages. First, there cannot exist any semantically differences between the ordinary Clean code and the interpretation of properties. The second advantage of using ordinary compiled code for the evaluation of test-cases is that it keeps Gast simple. In this way we are able to construct a lightweight test-system.

Test result analysis is illustrated by some examples. We introduce some additional tools to improve the test result analysis.

Using Gast it becomes easier and cheaper to test software. As an important side-effect this encourages stating formal properties of the software. Distinguishing properties of our system are that the test-data are generated in a systematic and generic way using generic programming techniques. This implies that there is no need for the user to indicate how data should be generated. Moreover, duplicated tests are avoided and for finite domains Gast is even able to proof a property by testing it for all possible values. Moreover, duplicated tests are avoided and for finite domains Gast is even able to proof a property by testing it for all possible values.

Due to the rich language used to state properties, first order predicate logic, one can define various kinds of properties. This includes properties known under names as black-box tests, algebraic properties, and model based, pre- and post-conditional. Using the ability to specify the test-data manually, also white-box tests are possible.

Based on our experience we indicate four kinds of errors spotted by the system.

Errors in the implementation. This is the kind of mistakes you expect to find.

Errors in the specification. Also in this situation the tested software does not obey the given property. Analysis of the indicated counter example shows that the specification is wrong instead of the software. Testing improves also the confidence in the (executable) specification.

Errors caused by the finite precision of the computer used. Especially for properties involving reals this is a frequent problem. In general we have to state that the difference between the obtained answer and the required solution is smaller than some allowed error range.

Non-termination or run-time errors. Although the system does not explicitly handles these errors, the tester notices that the error occurs. Since Gast lists the arguments before executing the test, the values causing the error are known. This appears to find partially defined function rather effectively.

Currently the tester has to indicate that a property has to be tested by writing an appropriate Start function. In the near future we want to construct a tool that extracts properties from Clean modules and tests these properties fully automatically.

The first step in the testing process is the
formulation of properties in a formalism that can be handled by Gast.
In order to handle properties from firs order predicate logic in Gast
we represent them as functions in Clean. Each property is expressed
by a function yielding a Boolean value. Expressions with value `True
`indicates a successful test, `False
`indicates a counter example. This solves the
famous *oracle problem* : how do we decide whether the result of
a test is correct or not.

The arguments of such a property
represent the universal variables of the logical expression.
Properties can have any number of arguments, each of these

arguments
can be of any non-polymorphically type. These functions can be used
to specify properties of single functions or operations in Clean, as
well as properties of large combinations of functions, or even of
entire programs.

We only consider well-defined and finite
values as test-data. Due to this restriction we are able to use the
and-operator (&&) and or-operator (||) of Clean to represent
the logical operators **and **and **or **respectively.

Our
first example involves the implementation of the logical `or`
-function using only a two-input `nand`-function
as basic building element.

`or
:: Bool Bool -> Bool``or
x y = nand (not x) (not y) ``where
not x = nand x x`

The desired
property is that the value of this function is always equal to the
value of the ordinary or-operator, ||}, from Clean.

That is, the
||-operator serves as specification for the new implementation, `or`.
This property can be represented by the following function

in
Clean. By convention we will prefix the name properties by `prop`
.

`propOr
:: Bool Bool -> Bool``propOr
x y = x||y == or x y`

The
user invokes the testing of this property by the main
function:`Start =
test propOr`

Gast yields `Proof:
success for all arguments after 4` tests for this
property. Since there are only finite types involved the property can
be proven by testing.

For our second
example we consider the classical implementation of stacks:`::
Stack a :== [a]``pop
:: (Stack a) -> Stack a``pop
[_:r] = r``top
:: (Stack a) -> a``top
[a:_] = a``push
:: a (Stack a) -> Stack a``push
a s = [a:s]`

A desirable property for stacks is that after pushing some element onto the stack, that element is on top of the stack. Popping an element just pushed on the stack yields the original stack. The combination of these properties is expressed as:

`propStack
:: a (Stack a) -> Bool | gEq{|*|} a``propStack
e s = top (push e s) === e && pop (push e s) === s`

This property should hold for any type of
stack-element. Hence we used polymorphic functions and the
* generic *equality,

`propStackInt
:: (Int (Stack Int) -> Bool)``propStackInt
= propStack`

In contrast to properties that use overloaded
types, it actually does not matter much which concrete type we
choose. A polymorphic property will hold for elements of any type if
it holds for elements `Int`.
The test is executed by `Start
= test propStackInt`. Gast yields: `Passed
after 1000 tests`. This property involves the very
large type integer and the infinite type stack, only testing for a
finite number of cases, here 1000, is possible.

In `propOr`
we used a reference implementation to state a property about
the function `or`.
In `propStack`
expressed the desired property directly as a relation between
functions on a datatype. Other kind of properties state relations
between the input and output of functions, or use model checking
based properties. For instance, we have tested a system for safe
communication over unreliable channels by an alternating bit protocol
with the

requirement that
sequence of received messages should be equal to the input sequence
of messages.`::
Bit = O | I``instance
~ Bit``where``
~ O = I``
~ I = O``::
Message c = M c Bit | MError``::
Ack =
A Bit | AError``::
SenderState c = Send Bit | Exp Bit c``::
ReceiverState = Rec Bit``sender
:: (SenderState c) [c] [Ack] -> [Message c]``sender
(Send b) [] as = []``sender
(Send b) [c:cs] as = [M c b: sender (Exp b c) cs as]``sender
state=:(Exp b c) cs [a:as]`` =
case a of``
A b` | b===b` = sender (Send (~b)) cs as``
_
= [M c b: sender state cs as]``sender
(Exp b c) cs [] = abort "Sender: Acknowledgement
expected"``receiver
:: ReceiverState [Message c] -> ([Ack],[c])``receiver
rState [] = ([],[])``receiver
rState=:(Rec b) [m:ms]`` =
case m of``
M c b` | b===b` = ([A b :as],[c:cs]) where (as,cs)
= receiver (Rec (~b)) ms``
_
= ([A (~b):as], cs ) where (as,cs) = receiver rState
ms``sChannel
:: (Int->Bool) [Message c] -> [Message c]``sChannel
error ms = [ if (error n) MError m \\ m <- ms & n <-
[1..]]``rChannel
:: (Int->Bool) [Ack] -> [Ack]``rChannel
error as = [ if (error n) AError a \\ a <- as & n <-
[1..]]``abpSystem
:: (Int->Bool) (Int->Bool) [c] -> [c]``abpSystem
sError rError toSend``
= received``where``
msSend = sender (Send O) toSend (rChannel rError acks)``
(acks,received) = receiver (Rec O) (sChannel sError
msSend)``propAltBit
:: (Int->Bool) (Int->Bool) Int -> Property``propAltBit
sError rError n = n>=0 ==> label n @ toSend == received``where
toSend = [1..n]``
received = abpSystem sError rError toSend`

One often adds the implication operator,` ``=>`
, as a primitive to the predicate logic. We can use the law `p
=> q = not p or q` to implement it:

`(===>)
infix 1 :: Bool Bool -> Bool``(===>)
p q = not p || q `

In first order predicate logic one also has the existential quantifier. If this is used to introduce values in a constructive way it can be directly transformed to local definitions in a functional programming language, for instance as:

`propSqrt
:: Real -> Bool``propSqrt
x = x >=0 ===> let y = sqrt x in y*y == x `

In general there is no constructive way to
construct an extensionally quantified value. In general it is not
possible to construct an extensionally quantified value. For
instance, for a type `Day
`and a function `tomorrow
`we require that for each day there exists a
tomorrow. The day `d`
has to be searched in the type `Day.
`Gast has the operator `Exists`
to express this:

`propTomorrow
:: Day -> Property``propTomorrow
day = Exists \d = tomorrow day === d`

The only task of the tester is to write
properties, like `propOr`,
and to invoke the testing by `Start
= test propOr` . Based on the type of arguments
needed by the property, the test system will generate test-data,
execute the test for these values, and analyze the results of the
tests.

Papers about Gast (and the other research of our group) can be found here. As an introduction you can start with Fully Automatic Testing with Functions as Specifications.

A recent version of the code is included in the regular Clean distribution this can be obtained at https://wiki.clean.cs.ru.nl/Download_Clean .

Do not hesitate to mail me if you want additional information or can provide feedback.