Wanda experiments for ESOP 2019

These experiments were done for the paper A static higher-order dependency pair framework, published in the proceedings for ESOP 2019. All experiments were performed with Wanda version 2.1, on a 60 second timeout.

We performed two series of experiments. In the first, we compared the static DP framework to the alternatives implemented in Wanda: no dependency pairs (so only rule removal using reduction orderings) and dynamic dependency pairs following [KopRaa12]. We compared both the power if only techniques discussed in the paper are used, if an additional delegation to a first-order prover is used, and if the methods are combined. This gives the following results:

Technique #Yes #No Avg. time
Rule removal using reduction pairs and basic non-termination checking only (no DPs) 92 9 0.32
Static DPs with only techniques from the paper 124 0 0.07
Dynamic DPs with only techniques from the paper 132 0 0.52
Static DPs with basic non-termination and delegation to a first-order prover 129 16 0.58
Dynamic DPs with basic non-termination and delegation to a first-order prover 137 16 0.93
Static and dynamic DPs combined, with basic non-termination and delegation to a first-order prover 150 16 0.70

(full table and experimental setup for the first experiment)

Note that the non-termination processor has not yet been implemented in Wanda; non-termination verification is done exclusively through basic checks outside the DP frameworks, and by verifying the NO results of first-order delegation within the DP framework (see also our paper at FroCoS'11). This is why there are no NO results when only the techniques in the paper are enabled.

The second set of experiments aimed to assess the power of individual techniques within the static DP framework. To this end, we have run Wanda with only the techniques from the current paper five more times, each with one technique disabled. This gives the following result:

Technique #Yes Avg. time
No formative rules 124 0.07
No usable rules 124 0.10
No subterm criterion or static subterm criterion 121 0.16
No dependeny graph 121 0.35
No reduction triples 92 0.01

(full table and experimental setup for the second experiment)

Note that, with the exception of reduction triples, none of the techniques seem to give much power when compared to the whole: it appears that for most benchmarks, several different techniques are feasible. This is emphasised by the following table, where we tested the power of Wanda when the use of a first-order tool was enabled. In this setting, we may even surrender the use of reduction triples without losing much power:

Technique #Yes Avg. time
No formative rules (but first-order tool) 129 0.57
No usable rules (but first-order tool) 129 0.56
No subterm criterion or static subterm criterion (but first-order tool) 126 0.64
No dependeny graph (but first-order tool) 126 0.61
No reduction triples (but first-order tool) 123 0.55
Static DPs with only techniques from the paper (so no first-order tool) 124 0.07
Static DPs with techniques from the paper and first-order tool 129 0.54

(full table and experimental setup for the third experiment)

To further assess the strength of the individual methods, we have also run Wanda with only one or two techniques: either the subterm criterion or reduction triples, each coupled with nothing, the dependency graph, usable rules or formative rules. This provides the following results:

Technique #Yes Avg. time
Only the (static) subterm criterion 29 0.00
The (static) subterm criterion and dependency graph 92 0.01
The (static) subterm criterion and formative rules 29 0.00
The (static) subterm criterion and usable rules 29 0.00
Only reduction triples 105 0.98
Reduction triples and the dependency graph 107 0.62
Reduction triples and formative rules 108 0.86
Reduction triples and usable rules 120 0.18

(full table and experimental setup for the fourth experiment)

This reinforces the point that all techniques seem to add something, although the boost depends on what other techniques are available, and may be small.