Wanda experiments for ESOP 2019

These experiments were done for the paper A static higher-order dependency pair framework, published in the proceedings for ESOP 2019. All experiments were performed with Wanda version 2.1, on a 60 second timeout.

We performed two series of experiments. In the first, we compared the static DP framework to the alternatives implemented in Wanda: no dependency pairs (so only rule removal using reduction orderings) and dynamic dependency pairs following [KopRaa12]. We compared both the power if only techniques discussed in the paper are used, if an additional delegation to a first-order prover is used, and if the methods are combined. This gives the following results:

Technique	#Yes	#No	Avg. time
Rule removal using reduction pairs and basic non-termination checking only (no DPs)	92	9	0.32
Static DPs with only techniques from the paper	124	0	0.07
Dynamic DPs with only techniques from the paper	132	0	0.52
Static DPs with basic non-termination and delegation to a first-order prover	129	16	0.58
Dynamic DPs with basic non-termination and delegation to a first-order prover	137	16	0.93
Static and dynamic DPs combined, with basic non-termination and delegation to a first-order prover	150	16	0.70

(full table and experimental setup for the first experiment)

Note that the non-termination processor has not yet been implemented in Wanda; non-termination verification is done exclusively through basic checks outside the DP frameworks, and by verifying the NO results of first-order delegation within the DP framework (see also our paper at FroCoS'11). This is why there are no NO results when only the techniques in the paper are enabled.

The second set of experiments aimed to assess the power of individual techniques within the static DP framework. To this end, we have run Wanda with only the techniques from the current paper five more times, each with one technique disabled. This gives the following result:

Technique	#Yes	Avg. time
No formative rules	124	0.07
No usable rules	124	0.10
No subterm criterion or static subterm criterion	121	0.16
No dependeny graph	121	0.35
No reduction triples	92	0.01

(full table and experimental setup for the second experiment)

Note that, with the exception of reduction triples, none of the techniques seem to give much power when compared to the whole: it appears that for most benchmarks, several different techniques are feasible. This is emphasised by the following table, where we tested the power of Wanda when the use of a first-order tool was enabled. In this setting, we may even surrender the use of reduction triples without losing much power:

Technique	#Yes	Avg. time
No formative rules (but first-order tool)	129	0.57
No usable rules (but first-order tool)	129	0.56
No subterm criterion or static subterm criterion (but first-order tool)	126	0.64
No dependeny graph (but first-order tool)	126	0.61
No reduction triples (but first-order tool)	123	0.55
Static DPs with only techniques from the paper (so no first-order tool)	124	0.07
Static DPs with techniques from the paper and first-order tool	129	0.54

(full table and experimental setup for the third experiment)

To further assess the strength of the individual methods, we have also run Wanda with only one or two techniques: either the subterm criterion or reduction triples, each coupled with nothing, the dependency graph, usable rules or formative rules. This provides the following results:

Technique	#Yes	Avg. time
Only the (static) subterm criterion	29	0.00
The (static) subterm criterion and dependency graph	92	0.01
The (static) subterm criterion and formative rules	29	0.00
The (static) subterm criterion and usable rules	29	0.00
Only reduction triples	105	0.98
Reduction triples and the dependency graph	107	0.62
Reduction triples and formative rules	108	0.86
Reduction triples and usable rules	120	0.18

(full table and experimental setup for the fourth experiment)

This reinforces the point that all techniques seem to add something, although the boost depends on what other techniques are available, and may be small.