
Homepage of Sicco Verwer

My profile on Google Scholar Citations.


2013

Eduardo Costa, Sicco Verwer and Hendrik Blockeel
In Intelligent Data Analysis, 2013. To appear.

Download

Bibtex

@inproceedings{costa2013,
author = {Eduardo Costa and Sicco Verwer and Hendrik Blockeel},
title = {Estimating Prediction Certainty in Decision Trees},
booktitle = {Intelligent Data Analysis},
year = {2013},
}

Abstract

Decision trees estimate prediction certainty using the class distribution in the leaf responsible for the prediction. We introduce an alternative method that yields better estimates. For each instance to be predicted, our method inserts the instance to be classified in the training set with one of the possible labels for the target attribute; this procedure is repeated for each one of the labels. Then, by comparing the outcome of the different trees, the method can identify instances that might present some difficulties to be correctly classified, and attribute some uncertainty to their prediction. We perform an extensive evaluation of the proposed method, and show that it is particularly suitable for ranking and reliability estimations. The ideas investigated in this paper may also be applied to other machine learning techniques, as well as combined with other methods for prediction certainty estimation.
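The relabel-and-retrain idea described above is classifier-agnostic and small enough to sketch. The toy below uses a 3-nearest-neighbour learner in place of a decision tree so the sketch stays dependency-free; all names are illustrative, not the paper's implementation.

```python
def knn_predict(train, x, k=3):
    """Predict x's label by majority vote among the k nearest training points."""
    nearest = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def certainty_by_relabeling(train, x_new, candidate_labels):
    """Insert x_new once per candidate label, refit, and measure agreement."""
    votes = []
    for label in candidate_labels:
        augmented = train + [(x_new, label)]  # transductive insertion
        votes.append(knn_predict(augmented, x_new))
    majority = max(set(votes), key=votes.count)
    # Agreement of 1.0 means the prediction is unaffected by the injected
    # label, i.e. the instance is considered easy to classify.
    return majority, votes.count(majority) / len(votes)
```

An instance deep inside one class gets agreement 1.0; an instance halfway between two classes flips with the injected label and gets lower agreement.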
Maurice Bruynooghe, Hendrik Blockeel, Bart Bogaerts, Broes de Cat, Stef de Pooter, Joachim Jansen, Anthony Labarre, Jan Ramon, Marc Denecker and Sicco Verwer
In Theory and Practice of Logic Programming, 2013. Submitted.

Download

Bibtex

@article{bruynooghe2013,
author = {Maurice Bruynooghe and Hendrik Blockeel and Bart Bogaerts and Broes de Cat and Stef de Pooter and Joachim Jansen and Anthony Labarre and Jan Ramon and Marc Denecker and Sicco Verwer},
title = {Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3},
journal = {Theory and Practice of Logic Programming},
year = {2013},
}

Abstract

This paper explores the use of predicate logic as a modeling language. Using IDP3, a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions, search problems stated in a variant of predicate logic are solved. This variant is introduced and applied to a range of problems stemming from machine learning and data mining. In those areas, a strong interest has recently grown in the use of declarative modeling and constraint solving as a solution for their problems. We illustrate this methodology with three real-world problems from that area. The first problem is in the domain of stemmatology, a domain of philology concerned with the relationship between surviving variant versions of text. The second problem is a somewhat related problem within biology where phylogenetic trees are used to represent the evolution of species. The third and final problem concerns the classical problem of learning a minimal automaton consistent with a given set of strings. For this last problem, we show that the performance of our solution comes very close to that of a state-of-the-art solution. We analyze the use of predicate logic in the three applications and analyze how alternative models affect the performance.
Anthony Labarre and Sicco Verwer
In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013. Submitted.

Download

Bibtex

@article{labarre13BIO,
author = {Anthony Labarre and Sicco Verwer},
title = {Merging partially labelled trees: hardness and a declarative programming solution.},
journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
year = {2013},
}

Abstract

Intraspecific studies often make use of haplotype networks instead of gene genealogies to represent the evolution of a set of genes. Cassens et al. proposed one such network reconstruction method, based on the global maximum parsimony principle, which was later recast by the first author of the present work as the problem of finding a minimum common supergraph of a set of t partially labelled trees. Although algorithms were proposed for solving the problem on two graphs, the complexity of the general problem remains unknown. In this paper, we show that the corresponding decision problem is NP-complete for t = 3. We then propose a declarative programming approach to solving the problem to optimality in practice, as well as a heuristic approach, both based on the IDP system, and assess the performance of both methods on randomly generated data.
Arjen Hommersom, Sicco Verwer and Peter Lucas
In KR4HC/ProHealth, at AIME, 2013. To appear.

Download

Bibtex

@inproceedings{hommerson13KR4HC,
author = {Arjen Hommersom and Sicco Verwer and Peter Lucas},
title = {Discovering Probabilistic Structures of Care.},
booktitle = {KR4HC/ProHealth},
year = {2013},
}

Abstract

Medical protocols and guidelines can be looked upon as concurrent programs, where the patients dynamically change over time. Methods based on verification and model-checking developed in the past have been shown to offer insight into their correctness by adopting a logical point of view. However, there is uncertainty involved both in the management of the disease and the way the disease will develop, and, therefore, a probabilistic view on medical protocols seems more appropriate. On the other hand, representations using Bayesian networks usually involve a single patient group and do not capture the dynamic nature of care. In this paper, we propose a new method inspired by automata learning to represent and identify patient groups for obtaining insight into the care that patients have received. We evaluate this approach using data obtained from general practitioners and identify significant differences in patients who were diagnosed with a transient ischemic attack (TIA). Finally, we discuss the implications of such a computational method for the analysis of medical protocols.
Fides Aarts, Harco Kuppens, Jan Tretmans, Frits Vaandrager and Sicco Verwer
In Machine Learning, 2013. To appear.

Download

Bibtex

@article{aarts13MLj,
author = {Fides Aarts and Harco Kuppens and Jan Tretmans and Frits Vaandrager and Sicco Verwer},
title = {Improving active Mealy machine learning for protocol performance testing.},
journal = {Machine Learning},
year = {2013},
}

Abstract

Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model MR of reference implementation R, which serves as input for a model-based testing tool that checks conformance of implementation I to MR. In addition, we also explore an alternative approach in which we learn a model MI of implementation I, which is compared to model MR using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning models of and revealing errors in implementations, present the new notion of a conformance oracle, and demonstrate how conformance oracles can be used to speed up conformance checking.
Sicco Verwer, Remi Eyraud and Colin de la Higuera
In Machine Learning, 2013. To appear.

Download

Bibtex

@article{verwer13MLj,
author = {Sicco Verwer and Remi Eyraud and Colin de la Higuera},
title = {PAutomaC: a PFA/HMM learning competition.},
journal = {Machine Learning},
year = {2013},
}

Abstract

Approximating distributions over strings is a hard learning problem. Typical techniques involve using finite state machines as models and attempting to learn these; these machines can either be hand-built and then have their weights estimated, or built by grammatical inference techniques: the structure and the weights are then learned simultaneously. The PAutomaC competition, run in 2012, was the first challenge to allow comparison between methods and algorithms and built a first state of the art for these techniques. Both artificial data and real data were proposed and contestants were to try to estimate the probabilities of new strings. The purpose of this paper is to describe some of the technical and intrinsic challenges such a competition has to face, to give a broad overview of the state of the art in learning grammars and finite state machines, and to survey the relevant literature. This paper also provides the results of the competition and a brief description and analysis of the approaches the main participants used.
Christophe Costa Florencio and Sicco Verwer
In Theoretical Computer Science, 2013. Submitted.

Download

Bibtex

@article{florencioverwer13TCS,
author = {Christophe Costa Florencio and Sicco Verwer},
title = {Regular Inference as Vertex Coloring},
journal = {Theoretical Computer Science},
year = {2013},
}

Abstract

This paper is concerned with the problem of supervised learning of deterministic finite state automata, in the technical sense of identification in the limit from complete data, by finding a minimal DFA consistent with the data (regular inference). We solve this problem by translating it in its entirety to a vertex coloring problem. Essentially, such a problem consists of two types of constraints that restrict the hypothesis space: inequality and equality constraints. Inequality constraints translate to the vertex coloring problem in a very natural way. Equality constraints however greatly complicate the translation to vertex coloring. In previous coloring-based translations, these were therefore encoded either dynamically by modifying the vertex coloring instance on-the-fly, or by encoding them as satisfiability problems. We provide the first translation that encodes both types of constraints together in a pure vertex coloring instance. We prove the correctness of the construction, and show that regular inference and vertex coloring are in some sense equally hard. The coloring approach offers many opportunities for applying insights from combinatorial optimization and graph theory to regular inference. We immediately obtain new complexity bounds, as well as a family of new learning algorithms which can be used to obtain exact hypotheses as well as fast approximations.
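The inequality-constraint half of the translation described above is straightforward to sketch: prefix-tree states that may never be merged (e.g. an accepting and a rejecting state) become adjacent vertices of a conflict graph, and each color class becomes one DFA state. The equality (merge-consistency) constraints, which the paper shows how to fold into the same coloring instance, are the hard part and are omitted here; names are illustrative.

```python
def greedy_coloring(vertices, edges):
    """Color vertices so that no edge joins two same-colored vertices."""
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    color = {}
    for v in vertices:
        used = {color[u] for u in adjacent[v] if u in color}
        color[v] = min(c for c in range(len(vertices)) if c not in used)
    return color

# Toy prefix-tree states for the sample {accept: "a", reject: "b"}:
# the state reached by "a" must stay distinct from the state reached by "b".
states = ["root", "a", "b"]
conflicts = [("a", "b")]
coloring = greedy_coloring(states, conflicts)
```

Here greedy coloring is only a stand-in; any graph-coloring solver can be dropped in, which is exactly the flexibility the vertex-coloring view buys.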
Sicco Verwer, Susan van den Braak and Sunil Choenni
In International Conference on Scientific and Statistical Database Management, 2013.

Download

Bibtex

@inproceedings{verwer13SSDBM,
author = {Sicco Verwer and Susan van den Braak and Sunil Choenni},
title = {Sharing confidential data for algorithm development by multiple imputation.},
booktitle = {International Conference on Scientific and Statistical Database Management},
year = {2013},
}

Abstract

The availability of real-life data sets is of crucial importance for research and development in many fields. Often, however, real-life databases may not be released or may be released for a limited time to a limited group of researchers due to the proprietary and confidential nature of the data. This can be problematic when designing algorithms and applications that should operate on such data, since this often requires insight into the specific properties of such databases. We propose to solve this problem using the statistical technique of multiple imputation. Although it was originally designed for imputing missing values in data sets, multiple imputation is also a very powerful method for generating realistic synthetic data sets. We show how this can be achieved using only a few lines of R code and the MICE multiple imputation tool. We apply this method to the problem of generating a realistic synthetic copy of a database used for fraud detection. In addition to generating a synthetic data set of the records in this database, we show how these records can be combined into networked data using clustering techniques.
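The paper does this in R with the MICE package. The core idea — treat every value of a record as missing and let an imputation model fill it back in, repeating the draw to obtain multiple synthetic copies — can be sketched generically. Real MICE fits a conditional model per column (chained equations); the toy stand-in below merely resamples each column's empirical distribution, so it preserves marginals only. All names are illustrative.

```python
import random

def synthesize(records, n_copies, seed=0):
    """Draw synthetic copies of a table by 'imputing' fully masked rows."""
    rng = random.Random(seed)
    columns = list(zip(*records))  # column-wise view of the data
    synthetic = []
    for _ in range(n_copies):
        for _ in records:
            # "Impute" every cell of a fully masked record by drawing
            # from the observed values of its column.
            synthetic.append(tuple(rng.choice(col) for col in columns))
    return synthetic
```

Swapping the per-column draw for a model conditioned on the other columns recovers the chained-equations behavior that makes the synthetic records realistic.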
Susan van den Braak, Sunil Choenni and Sicco Verwer

Download

Bibtex

@incollection{braak13DISC,
author = {Susan van den Braak and Sunil Choenni and Sicco Verwer},
title = {Combining and Analyzing Judicial Databases.},
}

Abstract

To monitor crime and law enforcement, databases of several organizations, covering different parts of the criminal justice system, have to be integrated. Combined data from different organizations may then be analyzed, for instance, to investigate how specific groups of suspects move through the system. Such insight is useful for several reasons, for example, to define an effective and coherent safety policy. To integrate or relate judicial data two approaches are currently employed: a data warehouse and a dataspace approach. The former is useful for applications that require combined data on an individual level. The latter is suitable for data with a higher level of aggregation. However, developing applications that exploit combined judicial data is not without risk. One important issue while handling such data is the protection of the privacy of individuals. Therefore, several precautions have to be taken in the data integration process: use aggregate data, follow the Dutch Personal Data Protection Act, and filter out privacy-sensitive results. Another issue is that judicial data is essentially different from data in exact or technical sciences. Therefore, data mining should be used with caution, in particular to avoid incorrect conclusions and to prevent discrimination and stigmatization of certain groups of individuals.
Sicco Verwer and Toon Calders

Download

Bibtex

@incollection{verwercalders13DISC,
author = {Sicco Verwer and Toon Calders},
title = {Introducing Positive Discrimination in Predictive Models.},
}

Abstract

In this chapter we give three solutions for the discrimination-aware classification problem that are based upon Bayesian classifiers. These classifiers model the complete probability distribution by making strong independence assumptions. First we discuss the necessity of having discrimination-free classification for probabilistic models. Then we will show three ways to adapt a Naive Bayes classifier in order to make it discrimination-free. The first technique is based upon setting different thresholds for the different communities. The second technique will learn two different models for both communities, while the third model describes how we can incorporate our belief of how discrimination was added to the decisions in the training data as a latent variable. By explicitly modeling the discrimination, we can reverse engineer decisions. Since all three models can be seen as ways to introduce positive discrimination, we end the chapter with a reflection on positive discrimination.
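The first technique above — different decision thresholds for different communities — can be sketched independently of the underlying Bayesian model: given each individual's predicted probability of the positive class, pick per-group thresholds so that the positive rate is equal across groups. The target-rate choice and all names are illustrative.

```python
def per_group_thresholds(scores, groups, target_rate):
    """For each group, find the score threshold yielding target_rate positives."""
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, grp in zip(scores, groups) if grp == g),
                          reverse=True)
        k = max(1, round(target_rate * len(g_scores)))
        thresholds[g] = g_scores[k - 1]  # accept the top-k scores in group g
    return thresholds

def classify(scores, groups, thresholds):
    """Apply each individual's group-specific threshold."""
    return [s >= thresholds[g] for s, g in zip(scores, groups)]
```

With the same target rate in every group, the classifier's positive rate no longer differs between communities, which is the discrimination measure the chapter works with.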

2012

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Machine Learning, vol. 86(3), pp. 295-333, 2012.

Download

Bibtex

@article{verwer12MLj,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Efficiently identifying deterministic real-time automata from labeled data.},
journal = {Machine Learning},
year = {2012},
volume = {86},
number = {3},
pages = {295--333},
}

Abstract

We develop a novel learning algorithm RTI for identifying a deterministic real-time automaton (DRTA) from labeled time-stamped event sequences. The RTI algorithm is based on the current state of the art in deterministic finite-state automaton (DFA) identification, called evidence-driven state-merging (EDSM). In addition to having a DFA structure, a DRTA contains time constraints between occurrences of consecutive events. Although this seems a small difference, we show that the problem of identifying a DRTA is much more difficult than the problem of identifying a DFA: identifying only the time constraints of a DRTA given its DFA structure is already NP-complete. In spite of this additional complexity, we show that RTI is a correct and complete algorithm that converges efficiently (from polynomial time and data) to the correct DRTA in the limit. To the best of our knowledge, this is the first algorithm that can identify a timed automaton model from time-stamped event sequences. A straightforward alternative to identifying DRTAs is to identify a DFA that models time implicitly, i.e., a DFA that uses different states for different points in time. Such a DFA can be identified by first sampling the timed sequences using a fixed frequency, and subsequently applying EDSM to the resulting non-timed event sequences. We evaluate the performance of both RTI and this sampling approach experimentally on artificially generated data. In these experiments RTI outperforms the sampling approach significantly. Thus, we show that if we obtain data from a real-time system, it is easier to identify a DRTA from this data than to identify an equivalent DFA.
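The sampling baseline described above — turning a timed sequence into an untimed one by emitting the currently active symbol at a fixed rate, after which plain EDSM applies — can be sketched directly. Events are (symbol, timestamp) pairs, assumed sorted and non-empty; names are illustrative.

```python
def sample_fixed_rate(timed_events, period):
    """Resample a timed event sequence at fixed intervals of `period`."""
    untimed = []
    i = 0
    t = timed_events[0][1]
    last = timed_events[-1][1]
    current = timed_events[0][0]
    while t <= last:
        # Advance to the most recent event at or before time t.
        while i < len(timed_events) and timed_events[i][1] <= t:
            current = timed_events[i][0]
            i += 1
        untimed.append(current)
        t += period
    return untimed
```

The cost of this encoding is visible immediately: long delays blow up into long runs of repeated symbols, which is why the DFA learned from the sampled data needs many more states than the DRTA.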
Marijn Heule and Sicco Verwer
In Empirical Software Engineering, in print, 2012. StaMinA competition winner.

Download

Bibtex

@article{heuleverwer12ESE,
author = {Marijn Heule and Sicco Verwer},
title = {Software model synthesis using satisfiability solvers.},
journal = {Empirical Software Engineering},
year = {2012},
note = {In print},
}

Abstract

We introduce a novel approach for synthesis of software models based on identifying deterministic finite state automata. Our approach consists of three important contributions. First, we argue that in order to model software, one should focus mainly on observed executions (positive data), and use the randomly generated failures (negative data) only for testing consistency. We present a new greedy heuristic for this purpose, and show how to integrate it in the state-of-the-art evidence-driven state-merging (EDSM) algorithm. Second, we apply the enhanced EDSM algorithm to iteratively reduce the size of the problem. Yet during each iteration, the evidence is divided over states and hence the effectiveness of this algorithm is decreased. We propose—when EDSM becomes too weak—to tackle the reduced identification problem using satisfiability solvers. Third, in case the amount of positive data is small, we solve the identification problem several times by randomizing the greedy heuristic and combine the solutions using a voting scheme. The interaction between these contributions appeared crucial to solve hard software models synthesis benchmarks. Our implementation, called DFASAT, won the StaMinA competition.
Christophe Costa Florencio and Sicco Verwer
In Algorithmic Learning Theory, 2012.

Download

Bibtex

@inproceedings{florencioverwer12ALT,
author = {Christophe Costa Florencio and Sicco Verwer},
title = {Regular Inference as Vertex Coloring},
booktitle = {Algorithmic Learning Theory},
year = {2012},
volume = {7568},
pages = {81--95},
}

Abstract

This paper is concerned with the problem of supervised learning of deterministic finite state automata, in the technical sense of identification in the limit from complete data, by finding a minimal DFA consistent with the data (regular inference). We solve this problem by translating it in its entirety to a vertex coloring problem. Essentially, such a problem consists of two types of constraints that restrict the hypothesis space: inequality and equality constraints. Inequality constraints translate to the vertex coloring problem in a very natural way. Equality constraints however greatly complicate the translation to vertex coloring. In previous coloring-based translations, these were therefore encoded either dynamically by modifying the vertex coloring instance on-the-fly, or by encoding them as satisfiability problems. We provide the first translation that encodes both types of constraints together in a pure vertex coloring instance. This offers many opportunities for applying insights from combinatorial optimization and graph theory to regular inference. We immediately obtain new complexity bounds, as well as a family of new learning algorithms which can be used to obtain both exact hypotheses, as well as fast approximations.
Yingqian Zhang and Sicco Verwer
In Principles and Practice of Multi-Agent Systems, 2012.

Download

Bibtex

@inproceedings{zhangverwer12PRIMA,
author = {Yingqian Zhang and Sicco Verwer},
title = {Mechanism for Robust Procurements},
booktitle = {Principles and Practice of Multi-Agent Systems},
year = {2012},
volume = {7455},
pages = {77--91},
}

Abstract

We model robust procurement as an optimization problem. We show that its decision version is NP-complete, and propose a backtracking algorithm with cuts that reduce the search-space to find the optimal solution. We then develop a mechanism that motivates agents to truthfully report their private information (i.e., truthful in dominant strategies), maximizes the social welfare (efficient), and ensures non-negative utilities of the participating agents even after execution (post-execution individually rational). In the experiments, we compare our mechanism with an iterated greedy first-price mechanism that represents the current practice in public procurements, in terms of the expected social welfare and the expected payments of the auctioneer. The results show that in terms of social welfare, our mechanism outperforms the greedy approach in all cases except when there exist cheap and reliable agents who can finish the job in time. In terms of payments, our mechanism outperforms the current practice when there are many potential contractors and the optimization constraints are tight.
Hendrik Blockeel, Maurice Bruynooghe, Broes de Cat, Stef de Pooter, Marc Denecker, Anthony Labarre, Jan Ramon and Sicco Verwer
In Technical Communications of the ICLP, 2012.

Download

Bibtex

@inproceedings{blockeel12ICLP,
author = {Hendrik Blockeel and Maurice Bruynooghe and Broes de Cat and Stef de Pooter and Marc Denecker and Anthony Labarre and Jan Ramon and Sicco Verwer},
title = {Modeling machine learning problems with FO(.).},
booktitle = {Technical Communications of the ICLP},
year = {2012},
volume = {17},
pages = {14--25},
}

Abstract

This paper reports on the use of the FO(·) language and the IDP framework for modeling and solving some machine learning and data mining tasks. The core component of a model in the IDP framework is an FO(·) theory consisting of formulas in first order logic and definitions; the latter are basically logic programs where clause bodies can have arbitrary first order formulas. Hence, it is a small step for a well-versed computer scientist to start modeling. We describe some models resulting from the collaboration between IDP experts and domain experts solving machine learning and data mining tasks. A first task is in the domain of stemmatology, a domain of philology concerned with the relationship between surviving variant versions of text. A second task is about a somewhat similar problem within biology where phylogenetic trees are used to represent the evolution of species. A third and final task is about learning a minimal automaton consistent with a given set of strings. For each task, we introduce the problem, present the IDP code and report on some experiments.
Sicco Verwer and Yingqian Zhang
In AAMAS, 2012.

Download

Bibtex

@inproceedings{verwerzhang12AAMAS,
author = {Sicco Verwer and Yingqian Zhang},
title = {Revenue prediction in budget-constrained sequential auctions with complementarities (extended abstract).},
booktitle = {AAMAS},
year = {2012},
}

Abstract

When multiple items are auctioned sequentially, the ordering of auctions plays an important role in the total revenue collected by the auctioneer. This is especially true with budget-constrained bidders and the presence of complementarities among items. It is difficult to develop efficient algorithms for finding an optimal sequence of items. However, when historical data are available, it is possible to learn a model in order to predict the outcome of a given sequence. In this work, we show how to construct such a model, and provide methods that find a good sequence for a new set of items given the learned model. We develop an auction simulator and design several experiment settings to test the performance of the proposed methods.
Sicco Verwer, Remi Eyraud and Colin de la Higuera
In International Conference on Grammatical Inference, 2012.

Download

Bibtex

@inproceedings{verwer12ICGI,
author = {Sicco Verwer and Remi Eyraud and Colin de la Higuera},
title = {Results of the PAutomaC probabilistic automaton learning competition.},
booktitle = {International Conference on Grammatical Inference},
year = {2012},
volume = {21},
pages = {243--248},
}

Abstract

Approximating distributions over strings is a hard learning problem. Typical GI techniques involve using finite state machines as models and attempting to learn both the structure and the weights simultaneously. The PAutomaC competition is the first challenge to allow comparison between methods and algorithms and builds a first state of the art for these techniques. Both artificial data and real data were proposed and contestants were to try to estimate the probabilities of test strings. The purpose of this paper is to provide an overview of the implementation details of PAutomaC and to report the final results of the competition.
Fides Aarts, Harco Kuppens, Jan Tretmans, Frits Vaandrager and Sicco Verwer
In International Conference on Grammatical Inference, 2012.

Download

Bibtex

@inproceedings{aarts12ICGI,
author = {Fides Aarts and Harco Kuppens and Jan Tretmans and Frits Vaandrager and Sicco Verwer},
title = {Learning and testing the bounded retransmission protocol.},
booktitle = {International Conference on Grammatical Inference},
year = {2012},
volume = {21},
pages = {4--18},
}

Abstract

Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model MR of reference implementation R, which serves as input for a model-based testing tool that checks conformance of implementation I to MR. In addition, we also explore an alternative approach in which we learn a model MI of implementation I, which is compared to model MR using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning these models, analyzing the obtained results, and improving the learning performance.
Faisal Kamiran, Asim Karim, Sicco Verwer and Heike Goudriaan
In Discrimination and Privacy-Aware Data Mining, at ICDM, 2012.

Download

Bibtex

@inproceedings{kamiran12DPADM,
author = {Faisal Kamiran and Asim Karim and Sicco Verwer and Heike Goudriaan},
title = {Avoiding discrimination when classifying socially sensitive data.},
booktitle = {Discrimination and Privacy-Aware Data Mining},
year = {2012},
}

Abstract

Social discrimination against certain sensitive groups within society (e.g., females, blacks, minorities) is prohibited by law in many countries. To prevent discrimination arising from the use of discriminatory data, recent research has focused on methods for making classifiers learned over discriminatory data discrimination-aware. Most of these methods have been tested on standard classification datasets that have been tweaked for discrimination analysis rather than over actual discriminatory data. In this paper, we present an analysis of discrimination-aware classification using a real-world dataset of Statistics Netherlands, which is a census body in the Netherlands. Specifically, we consider the use of classifiers for predicting whether an individual is a crime suspect or not to support law enforcement and security agencies’ decision making. Our results show that discrimination does exist in real world datasets and blind use of classifiers learned over such datasets can exacerbate the discrimination problem, which can have serious consequences. We demonstrate that discrimination-aware classification methods are very useful in such situations to mitigate the discriminatory effects and that they lead to rational and legally acceptable decision making.
Eduardo P. Costa, Sicco Verwer and Hendrik Blockeel
In Benelearn, 2012.

Download

Bibtex

@inproceedings{costa12benelearn,
author = {Eduardo P. Costa and Sicco Verwer and Hendrik Blockeel},
title = {Redefining the calculation of prediction certainty for decision trees.},
booktitle = {Benelearn},
year = {2012},
pages = {75--76},
}

Abstract

In classification, we often need classifiers that not only have high accuracy, but can also tell how certain they are about their predictions. In decision trees, a prediction's certainty is usually estimated using the class distribution in the leaf responsible for the prediction. We introduce an alternative method that yields better estimates. For each instance to be predicted, the proposed method performs the following transductive learning strategy. The instance is inserted in the training set with one of the possible labels for the target attribute; this procedure is repeated for each one of the labels. Then, by comparing the outcome of the different trees, the method can identify instances that might present some difficulties to be correctly classified, and attribute some uncertainty to their prediction.
Sicco Verwer and Yingqian Zhang
In BNAIC, 2012.

Download

Bibtex

@inproceedings{zhangverwer12BNAIC,
author = {Sicco Verwer and Yingqian Zhang},
title = {Mechanism for robust procurements.},
booktitle = {BNAIC},
year = {2012},
}

Abstract

The increasing popularity of auctions is largely due to their efficiency in allocating goods. However, in the face of uncertainties about services, the winner determination solution is often not robust enough to ensure a reliable outcome. This paper aims to design a more robust auction by introducing redundancy into the selected solution. More specifically, we construct an algorithm and a mechanism for incentivizing truth-telling in public procurement problems with uncertainties. Our contributions are the development of a framework for studying such procurement problems, a proof that minimizing cost in this framework is NP-complete, a fast algorithm that minimizes this cost, and a novel multi-stage mechanism with desirable properties: it is efficient, truthful in dominant strategies, and post-execution individually rational. We show experimentally that our approach significantly outperforms the current practice in many settings.

2011

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Information and Computation, vol. 209(3), pp. 606-625, 2011.

Download

Bibtex

@article{verwer11InfCom,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {The efficiency of identifying timed automata and the power of clocks.},
journal = {Information and Computation},
year = {2011},
volume = {209},
number = {3},
pages = {606--625},
}

Abstract

We develop theory on the efficiency of identifying (learning) timed automata. In particular, we show that: (i) deterministic timed automata cannot be identified efficiently in the limit from labeled data and (ii) that one-clock deterministic timed automata can be identified efficiently in the limit from labeled data. We prove these results based on the distinguishability of these classes of timed automata. More specifically, we prove that the languages of deterministic timed automata cannot, and that one-clock deterministic timed automata can be distinguished from each other using strings in length bounded by a polynomial. In addition, we provide an algorithm that identifies one-clock deterministic timed automata efficiently from labeled data. Our results have interesting consequences for the power of clocks that are interesting also out of the scope of the identification problem.
Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In IJCAI, pp. 1529-1534, 2011.

Download

Bibtex

@inproceedings{verwer11IJCAI,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Learning Driving Behavior by Timed Syntactic Pattern Recognition.},
booktitle = {IJCAI},
year = {2011},
pages = {1529--1534},
}

Abstract

We advocate the use of an explicit time representation in syntactic pattern recognition because it can result in more succinct models and easier learning problems. We apply this approach to the real-world problem of learning models for the driving behavior of truck drivers. We discretize the values of onboard sensors into simple events. Instead of the common syntactic pattern recognition approach of sampling the signal values at a fixed rate, we model the time constraints using timed models. We learn these models using the RTI+ algorithm from grammatical inference, and show how to use computational mechanics and a form of semi-supervised classification to construct a real-time automaton classifier for driving behavior. Promising results are shown using this new approach.
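The discretization step the abstract describes can be illustrated with a small sketch (our illustration, not the paper's code): a sampled sensor signal is turned into a timed string of (symbol, delay) events. The threshold and the "H"/"L" symbols here are purely hypothetical.

```python
# Sketch: map a sampled sensor signal to timed events instead of sampling
# at a fixed rate. An event is emitted only when the signal crosses the
# threshold, together with the time elapsed since the previous event.

def discretize(samples, threshold):
    """Map (time, value) samples to a timed string of (symbol, delay) events.

    "H" = value at or above the threshold, "L" = below it; a new event is
    emitted only when the symbol changes.
    """
    events = []
    prev_symbol = None
    prev_time = 0.0
    for t, value in samples:
        symbol = "H" if value >= threshold else "L"
        if symbol != prev_symbol:          # signal crossed the threshold
            events.append((symbol, t - prev_time))
            prev_symbol, prev_time = symbol, t
    return events

signal = [(0.0, 1.2), (0.5, 1.4), (1.0, 3.9), (2.5, 4.1), (3.0, 0.8)]
print(discretize(signal, threshold=2.0))
# [('L', 0.0), ('H', 1.0), ('L', 2.0)]
```

Compared with fixed-rate sampling, long stable periods collapse into a single timed event, which is what makes the resulting timed models succinct.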
Sicco Verwer and Yingqian Zhang
In BNAIC, 2011.

Download

Bibtex

@inproceedings{verwerzhang11BNAIC,
author = {Sicco Verwer and Yingqian Zhang},
title = {Learning revenue-maximizing orderings in sequential auctions.},
booktitle = {BNAIC},
year = {2011},
pages = {},
}

Abstract

When multiple items are auctioned sequentially, the ordering of auctions plays an important role in the total revenue collected by the auctioneer. This is true especially with budget constrained bidders and the presence of complementarities among items. In such sequential auction settings, it is difficult to develop efficient algorithms for finding an optimal sequence of items. However, when historical data are available it is possible to learn good orderings that increase the revenue of the auctioneer. In this work, we show how such a learning model can be built based on previous auctions using regression trees. We provide a greedy method that finds a good sequence for a new set of items given the learned model. We design several experiment scenarios and test the performance of the proposed learning method. The experimental results are promising: they show that good orderings can be found quickly.
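The greedy sequencing idea can be sketched as follows (our illustration under stated assumptions, not the paper's implementation): `predict_revenue` is a toy stand-in for the regression-tree model learned from historical auctions, and the item names are invented.

```python
# Sketch: greedily order auction items using a learned revenue model.
# `predict_revenue` is a hypothetical stand-in for a regression-tree
# prediction; here it is simply a lookup table.

def predict_revenue(item, position):
    # Hypothetical learned model: expected revenue of `item` when it is
    # auctioned in slot `position` of the sequence.
    table = {
        ("painting", 0): 10, ("painting", 1): 7, ("painting", 2): 5,
        ("vase", 0): 6, ("vase", 1): 6, ("vase", 2): 4,
        ("clock", 0): 5, ("clock", 1): 4, ("clock", 2): 6,
    }
    return table[(item, position)]

def greedy_order(items):
    """Fill each slot with the item the model expects to earn most there."""
    remaining = list(items)
    order = []
    for position in range(len(items)):
        best = max(remaining, key=lambda it: predict_revenue(it, position))
        order.append(best)
        remaining.remove(best)
    return order

print(greedy_order(["painting", "vase", "clock"]))
# ['painting', 'vase', 'clock']
```

The greedy choice avoids searching all n! orderings; it is fast but, as with any greedy method, not guaranteed to be optimal.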

2010

Toon Calders and Sicco Verwer
In Data mining and knowledge discovery, vol. 21(2), pp. 277-292, 2010.

Download

Bibtex

@article{caldersverwer10DMKD,
author = {Toon Calders and Sicco Verwer},
title = {Three naive Bayes approaches for discrimination-free classification.},
journal = {Data mining and knowledge discovery},
year = {2010},
volume = {21},
number = {2},
pages = {277--292},
}

Abstract

In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. Naive application of machine learning techniques would result in huge fines for companies. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.
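The flavor of approach (i) can be conveyed with a minimal sketch (our illustration only; the paper adjusts the model's probabilities, whereas here we simply shift the deprived group's scores until both groups receive positive predictions at the same rate). All scores and thresholds below are invented.

```python
# Illustration of the idea behind approach (i): adjust the decision rule
# until both sensitive-attribute groups get positive predictions at the
# same rate. Here we raise group B's scores by a per-group offset.

def equalize_positive_rates(scores_a, scores_b, threshold=0.5, step=0.01):
    """Return an offset for group B such that its positive-prediction rate
    reaches that of group A (assumes such an offset exists)."""
    rate_a = sum(s >= threshold for s in scores_a) / len(scores_a)
    offset = 0.0
    while sum(s + offset >= threshold for s in scores_b) / len(scores_b) < rate_a:
        offset += step
    return offset

scores_a = [0.9, 0.7, 0.6, 0.3]   # favored group: 75% predicted positive
scores_b = [0.45, 0.4, 0.2, 0.1]  # deprived group: 0% predicted positive
off = equalize_positive_rates(scores_a, scores_b)
rate_b = sum(s + off >= 0.5 for s in scores_b) / len(scores_b)
print(off, rate_b)
```

Note that equalizing positive rates trades some accuracy for independence of the sensitive attribute, which is exactly the tension the paper's three approaches explore.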
Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In International Conference on Grammatical Inference, vol. 6339, pp. 203-216, 2010.

Download

Bibtex

@inproceedings{verwer10ICGI,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data.},
booktitle = {International Conference on Grammatical Inference},
year = {2010},
series = {},
volume = {6339},
pages = {203--216},
}

Abstract

We adapt an algorithm (RTI) for identifying (learning) a deterministic real-time automaton (DRTA) to the setting of positive timed strings (or time-stamped event sequences). A DRTA can be seen as a deterministic finite state automaton (DFA) with time constraints. Because DRTAs model time using numbers, they can be exponentially more compact than equivalent DFA models that model time using states. We use a new likelihood-ratio statistical test for checking consistency in the RTI algorithm. The result is the RTI+ algorithm, which stands for real-time identification from positive data. RTI+ is an efficient algorithm for identifying DRTAs from positive data. We show using artificial data that RTI+ is capable of identifying sufficiently large DRTAs in order to identify real-world real-time systems.
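The consistency check behind such a likelihood-ratio test can be sketched in a few lines (our simplified illustration; in the algorithm the test is applied to states of a real-time automaton, and the critical value below is an illustrative chi-square cutoff, not the paper's exact procedure).

```python
# Sketch: two states are "consistent" for merging if the merged symbol
# distribution does not fit the data significantly worse than two
# separate distributions.
import math

def log_likelihood(counts):
    """Log-likelihood of symbol counts under their empirical distribution."""
    total = sum(counts.values())
    return sum(c * math.log(c / total) for c in counts.values() if c > 0)

def lr_merge_test(counts_p, counts_q, critical=5.99):
    """Return True if merging is acceptable.

    The statistic 2*(LL_separate - LL_merged) is asymptotically chi-square
    distributed; `critical` is an illustrative cutoff (df=2, alpha=0.05).
    """
    merged = {s: counts_p.get(s, 0) + counts_q.get(s, 0)
              for s in set(counts_p) | set(counts_q)}
    ll_separate = log_likelihood(counts_p) + log_likelihood(counts_q)
    statistic = 2 * (ll_separate - log_likelihood(merged))
    return statistic <= critical

similar = lr_merge_test({"a": 50, "b": 50}, {"a": 48, "b": 52})      # merge ok
different = lr_merge_test({"a": 90, "b": 10}, {"a": 10, "b": 90})    # reject
print(similar, different)
# True False
```

Merging only when the test accepts is what lets a state-merging learner generalize without collapsing states that clearly generate different behavior.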
Marijn Heule and Sicco Verwer
In International Conference on Grammatical Inference, vol. 6339, pp. 66-79, 2010.

Download

Bibtex

@inproceedings{heuleverwerICGI10,
author = {Marijn Heule and Sicco Verwer},
title = {Exact DFA identification using SAT solvers.},
booktitle = {International Conference on Grammatical Inference},
year = {2010},
series = {},
volume = {6339},
pages = {66--79},
}

Abstract

We present an exact algorithm for identification of deterministic finite automata (DFA) which is based on satisfiability (SAT) solvers. Despite the size of the low level SAT representation, our approach is competitive with alternative techniques. Our contributions are fourfold: First, we propose a compact translation of DFA identification into SAT. Second, we reduce the SAT search space by adding lower bound information using a fast max-clique approximation algorithm. Third, we include many redundant clauses to provide the SAT solver with some additional knowledge about the problem. Fourth, we show how to use the flexibility of our translation in order to apply it to very hard problems. Experiments on a well-known suite of random DFA identification problems show that SAT solvers can efficiently tackle all instances. Moreover, our algorithm outperforms state-of-the-art techniques on several hard problems.
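The core of such a translation can be sketched as a coloring encoding (our simplification, not the paper's full translation, which also needs determinism and transition clauses): variables state which group each prefix-tree state is merged into, and conflict clauses forbid merging an accepting with a rejecting state. For self-containment we check satisfiability by brute force rather than calling a SAT solver.

```python
# Sketch: encode "can the states of a prefix tree be merged into k groups
# without merging an accepting state with a rejecting one?" as CNF.
import itertools

def coloring_cnf(num_states, conflicts, k):
    """Build CNF clauses over variables x[v][c]: state v gets color c."""
    var = lambda v, c: v * k + c + 1            # 1-based DIMACS-style vars
    clauses = []
    for v in range(num_states):
        clauses.append([var(v, c) for c in range(k)])        # >= 1 color
        for c1, c2 in itertools.combinations(range(k), 2):   # <= 1 color
            clauses.append([-var(v, c1), -var(v, c2)])
    for v, w in conflicts:                       # accept/reject conflict
        for c in range(k):
            clauses.append([-var(v, c), -var(w, c)])
    return clauses, num_states * k

def brute_force_sat(clauses, num_vars):
    """Tiny stand-in for a SAT solver: try every assignment."""
    for bits in itertools.product([False, True], repeat=num_vars):
        assign = lambda lit: bits[abs(lit) - 1] == (lit > 0)
        if all(any(assign(l) for l in clause) for clause in clauses):
            return True
    return False

# Three pairwise-conflicting states do not fit in 2 groups, but fit in 3.
conflicts = [(0, 1), (0, 2), (1, 2)]
print(brute_force_sat(*coloring_cnf(3, conflicts, 2)))  # False
print(brute_force_sat(*coloring_cnf(3, conflicts, 3)))  # True
```

The max-clique lower bound mentioned in the abstract corresponds to finding a large set of pairwise-conflicting states: its size is a lower bound on the number of colors (DFA states) any solution needs.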
Sicco Verwer
PhD Thesis, Technical University Delft, 2010

Download

Bibtex

@phdthesis{verwer10thesis,
author = {Sicco Verwer},
title = {Efficient identification of timed automata - theory and practice.},
school = {Technical University Delft},
year = {2010},
}

Abstract

Ekaterina Vasilyeva, Mykola Pechenizkiy, Aleksandra Tesanovic, Evgeny Knutov, Sicco Verwer and Paul de Bra
In EDM, .

Download

Bibtex

@inproceedings{vasilyeva10EDM,
author = {Ekaterina Vasilyeva and Mykola Pechenizkiy and Aleksandra Tesanovic and Evgeny Knutov and Sicco Verwer and Paul de Bra},
title = {Towards EDM Framework for Personalization of Information Services in RPM Systems.},
booktitle = {EDM},
year = {2010},
pages = {331--332},
}

Abstract

Remote Patient Management (RPM) systems, besides monitoring the health conditions of patients, provide them with different information services that are currently predefined and largely follow a one-size-fits-all paradigm. In this work we focus on the problem of knowledge discovery and patient modeling by mining educational data and the motivational and instructional feedback provided to patients within an RPM system.

2009

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Language and Automata Theory and Applications, vol. 5457, pp. 740-751, 2009.

Download

Bibtex

@inproceedings{verwer09LATA,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {One-Clock Deterministic Timed Automata Are Efficiently Identifiable in the Limit.},
booktitle = {Language and Automata Theory and Applications},
year = {2009},
series = {},
volume = {5457},
pages = {740--751},
}

Abstract

We study the complexity of identifying (learning) timed automata in the limit from data. In previous work, we showed that in order for timed automata to be efficiently identifiable in the limit, it is necessary that they are deterministic and that they use at most one clock. In this paper, we show this is also sufficient: we provide an algorithm that identifies one-clock deterministic timed automata efficiently in the limit.
Marijn Heule and Sicco Verwer
In BNAIC, pp. 91-98, 2009.

Download

Bibtex

@inproceedings{heuleverwer09BNAIC,
author = {Marijn Heule and Sicco Verwer},
title = {Using a satisfiability solver to identify deterministic finite state automata.},
booktitle = {BNAIC},
year = {2009},
pages = {91--98},
}

Abstract

We present an exact algorithm for identification of deterministic finite automata (DFA) which is based on satisfiability (SAT) solvers. Despite the size of the low level SAT representation, our approach seems to be competitive with alternative techniques. Our contributions are threefold: First, we propose a compact translation of DFA identification into SAT. Second, we reduce the SAT search space by adding lower bound information using a fast max-clique approximation algorithm. Third, we include many redundant clauses to provide the SAT solver with some additional knowledge about the problem. Experiments on a well-known suite of random DFA identification problems show that SAT solvers can efficiently tackle all instances. Moreover, our exact algorithm outperforms state-of-the-art techniques on several hard problems.

2008

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In International Conference on Grammatical Inference, vol. 5278, pp. 238-251, 2008.

Download

Bibtex

@inproceedings{verwer08ICGI,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Polynomial Distinguishability of Timed Automata.},
booktitle = {International Conference on Grammatical Inference},
year = {2008},
series = {},
volume = {5278},
pages = {238--251},
}

Abstract

We study the complexity of identifying (learning) timed automata in the limit from data. Timed automata are finite state models that model time explicitly, i.e., using numbers. Because timed automata use numbers to represent time, they can be exponentially more compact than models that model time implicitly, i.e., using states. We show three results that are essential in order to exactly determine when timed automata are efficiently identifiable in the limit. First, we show that polynomial distinguishability is a necessary condition for efficient identifiability in the limit. Second, we prove that deterministic timed automata with two or more clocks are not polynomially distinguishable. As a consequence, they are not efficiently identifiable. Last but not least, we prove that deterministic timed automata with one clock are polynomially distinguishable, which makes them very likely to be efficiently identifiable in the limit.
Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Induction of Process Models, at ECML, pp. 61-68, 2008.

Download

Bibtex

@inproceedings{verwer08IPM,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Efficiently learning simple timed automata.},
booktitle = {Induction of Process Models},
year = {2008},
pages = {61--68},
}

Abstract

We describe an efficient algorithm for learning deterministic real-time automata (DRTA) from positive data. This data can be obtained from observations of the process to be modeled. The DRTA model we learn from such data can be used to reason about and gain knowledge of real-time systems such as network protocols, business processes, reactive systems, etc.
Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Benelearn, pp. 75-76, 2008.

Download

Bibtex

@inproceedings{verwer08benelearn,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Efficiently learning timed models from observations.},
booktitle = {Benelearn},
year = {2008},
pages = {75--76},
}

Abstract

2004-2007

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Benelearn, pp. 128-135, 2007.

Download

Bibtex

@inproceedings{verwer07benelearn,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {An algorithm for learning real-time automata.},
booktitle = {Benelearn},
year = {2007},
pages = {128--135},
}

Abstract

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Grammatical Inference Workshop, 2006.

Download

Bibtex

@inproceedings{verwer06giw,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {On the identifiability in the limit of timed automata.},
booktitle = {Grammatical Inference Workshop},
year = {2006},
pages = {},
}

Abstract

Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In Benelearn, pp. 57-64, 2006.

Download

Bibtex

@inproceedings{verwer06benelearn,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Identifying an automaton model for timed data.},
booktitle = {Benelearn},
year = {2006},
pages = {57--64},
}

Abstract

Sicco Verwer, Mathijs de Weerdt and Jonne Zutt
In Interactive Events of AIED, pp. 33-35, 2005.

Download

Bibtex

@inproceedings{verwer05AIED,
author = {Sicco Verwer and Mathijs de Weerdt and Jonne Zutt},
title = {A tutoring system to practice theorem proving in Fitch.},
booktitle = {Interactive Events of AIED},
year = {2005},
pages = {33--35},
}

Abstract

In this interactive event we demonstrate a web-based software tool to teach theorem proving in propositional logic, called Bop. This tool is a proof editor for the Fitch proof system that can give hints, proof steps, or even complete proofs to the student.
Sicco Verwer, Mathijs de Weerdt and Cees Witteveen
In BNAIC, pp. 291-296, 2005.

Download

Bibtex

@inproceedings{verwer05BNAIC,
author = {Sicco Verwer and Mathijs de Weerdt and Cees Witteveen},
title = {Timed automata for behavioral pattern recognition.},
booktitle = {BNAIC},
year = {2005},
pages = {291--296},
}

Abstract

Arjen Vollebregt, Rich Ost, Sicco Verwer and Janneke Donker
In Aerospace Conference, vol. 6, pp. 3686-3691, 2004.

Download

Bibtex

@inproceedings{vollebregt04AERO,
author = {Arjen Vollebregt and Rich Ost and Sicco Verwer and Janneke Donker},
title = {Enhancing diagnostics through the visualization of air vehicle data.},
booktitle = {Aerospace Conference},
year = {2004},
volume = {6},
pages = {3686--3691},
}

Abstract

To achieve an affordable system, the JSF is implementing an autonomic logistics capability, which includes a diagnostics function to review flight-collected data and correlate fault events to flight maneuvers. This flight recreation capability provides a maintainer, for example, with the ability to recreate a flight based on the data stored by the air vehicle. The recreation of a flight consists of a 3D flight viewer, a graph viewer to view fused signal data, and a flight data player to navigate the data. This tool set gives the maintainer insight into the state of the air vehicle when an event occurred, supporting more efficient maintenance of air vehicles. The flight recreation module permits an engineer to replay the stored flight data using data visualization, and it can graphically display the behavior of multiple, selectable flight and health parameters in a multi-window, time-synchronized user interface. This paper discusses how the flight recreation module supports the future sustainment of the JSF while delivering an affordable fighter.

Last edit: 11 Jul 2013

dr. ir. Sicco Verwer

Mailbox number 47
P.O. Box 9010
NL-6500 GL Nijmegen
The Netherlands
E-mail: s[dot]verwer [at] cs[dot]ru[dot]nl

Institute for Computing and Information Sciences