These experiments were done for the paper A static higher-order dependency pair framework, published in the proceedings for ESOP 2019. All experiments were performed with Wanda version 2.1, on a 60 second timeout.
We performed two series of experiments. In the first, we compared the static DP framework to the alternatives implemented in Wanda: no dependency pairs (so only rule removal using reduction orderings) and dynamic dependency pairs following [KopRaa12]. We compared both the power if only techniques discussed in the paper are used, if an additional delegation to a first-order prover is used, and if the methods are combined. This gives the following results:
| Technique | #Yes | #No | Avg. time |
| Rule removal using reduction pairs and basic non-termination checking only (no DPs) | 92 | 9 | 0.32 |
| Static DPs with only techniques from the paper | 124 | 0 | 0.07 |
| Dynamic DPs with only techniques from the paper | 132 | 0 | 0.52 |
| Static DPs with basic non-termination and delegation to a first-order prover | 129 | 16 | 0.58 |
| Dynamic DPs with basic non-termination and delegation to a first-order prover | 137 | 16 | 0.93 |
| Static and dynamic DPs combined, with basic non-termination and delegation to a first-order prover | 150 | 16 | 0.70 |
(full table and experimental setup for the first experiment)
Note that the non-termination processor has not yet been implemented in Wanda; non-termination verification is done exclusively through basic checks outside the DP frameworks, and by verifying the NO results of first-order delegation within the DP framework (see also our paper at FroCoS'11). This is why there are no NO results when only the techniques in the paper are enabled.
The second set of experiments aimed to assess the power of individual techniques within the static DP framework. To this end, we have run Wanda with only the techniques from the current paper five more times, each with one technique disabled. This gives the following result:
| Technique | #Yes | Avg. time |
| No formative rules | 124 | 0.07 |
| No usable rules | 124 | 0.10 |
| No subterm criterion or static subterm criterion | 121 | 0.16 |
| No dependeny graph | 121 | 0.35 |
| No reduction triples | 92 | 0.01 |
(full table and experimental setup for the second experiment)
Note that, with the exception of reduction triples, none of the techniques seem to give much power when compared to the whole: it appears that for most benchmarks, several different techniques are feasible. This is emphasised by the following table, where we tested the power of Wanda when the use of a first-order tool was enabled. In this setting, we may even surrender the use of reduction triples without losing much power:
| Technique | #Yes | Avg. time |
| No formative rules (but first-order tool) | 129 | 0.57 |
| No usable rules (but first-order tool) | 129 | 0.56 |
| No subterm criterion or static subterm criterion (but first-order tool) | 126 | 0.64 |
| No dependeny graph (but first-order tool) | 126 | 0.61 |
| No reduction triples (but first-order tool) | 123 | 0.55 |
| Static DPs with only techniques from the paper (so no first-order tool) | 124 | 0.07 |
| Static DPs with techniques from the paper and first-order tool | 129 | 0.54 |
(full table and experimental setup for the third experiment)
To further assess the strength of the individual methods, we have also run Wanda with only one or two techniques: either the subterm criterion or reduction triples, each coupled with nothing, the dependency graph, usable rules or formative rules. This provides the following results:
| Technique | #Yes | Avg. time |
| Only the (static) subterm criterion | 29 | 0.00 |
| The (static) subterm criterion and dependency graph | 92 | 0.01 |
| The (static) subterm criterion and formative rules | 29 | 0.00 |
| The (static) subterm criterion and usable rules | 29 | 0.00 |
| Only reduction triples | 105 | 0.98 |
| Reduction triples and the dependency graph | 107 | 0.62 |
| Reduction triples and formative rules | 108 | 0.86 |
| Reduction triples and usable rules | 120 | 0.18 |
(full table and experimental setup for the fourth experiment)
This reinforces the point that all techniques seem to add something, although the boost depends on what other techniques are available, and may be small.