Model Learning

Model learning algorithms allow you to automatically infer behavioral models (typically some form of state machine) for black-box systems: systems with well-defined input/output interfaces whose source code need not be accessible. We refer to the software component implementing a learning algorithm as the learner.

Model learning hinges on the ability of the learner to generate a model consistent with a set of observations of the system's behavior. An observation comprises a sequence of inputs together with output information describing how the system reacts upon receiving these inputs. Such information is generally (but not necessarily) a sequence of outputs; for a TCP implementation, for instance, the inputs SYN, ACK might produce the outputs SYN+ACK, TIMEOUT.
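As a minimal illustration of what an observation could look like in code (the names below are my own choices for this sketch, not taken from any particular learning tool):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """A single observation: the inputs sent and the outputs the system produced."""
    inputs: List[str]    # e.g. ["SYN", "ACK"] sent to a TCP implementation
    outputs: List[str]   # e.g. ["SYN+ACK", "TIMEOUT"] observed in response

trace = Observation(inputs=["SYN", "ACK"], outputs=["SYN+ACK", "TIMEOUT"])
```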

Depending on how observations are generated, learning can be done in two ways. In passive learning, a static set of observations is supplied to the learner, which then outputs a model consistent with these observations; the observations can be, for instance, logs of system behavior or network traces. In active learning, the learner generates observations on the fly by constructing queries (sequences of inputs) and running them on the system.
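The sketch below illustrates the active setting, assuming a simple system-under-learning interface with reset and step operations; the names (SUL, run_query, gather_observations) are illustrative, and the random query-selection strategy is deliberately naive, whereas real algorithms such as L* choose queries systematically:

```python
import random
from typing import List, Protocol, Tuple

class SUL(Protocol):
    """Black-box system under learning: only its input/output interface is exposed."""
    def reset(self) -> None: ...
    def step(self, input_symbol: str) -> str: ...

def run_query(sul: SUL, query: List[str]) -> List[str]:
    """Execute one query (a sequence of inputs) and collect the system's outputs."""
    sul.reset()
    return [sul.step(symbol) for symbol in query]

def gather_observations(sul: SUL, alphabet: List[str],
                        num_queries: int = 100) -> List[Tuple[List[str], List[str]]]:
    """Generate observations on the fly by querying the system (here: randomly)."""
    observations = []
    for _ in range(num_queries):
        query = [random.choice(alphabet) for _ in range(random.randint(1, 5))]
        observations.append((query, run_query(sul, query)))
    return observations  # a learner would now build a model consistent with these
```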

It is important to be aware of the limitations of the learning algorithms and tools before you start learning. Limitations generally refer to constraints on the expressiveness of the models, and hence of the systems, that can be learned. Not so long ago, most learnable systems had to be describable by Mealy machines. Unfortunately, Mealy machines are limited in their expressiveness, since their interfaces only support abstract inputs and outputs. Real systems often have parameterized inputs and outputs with large parameter domains. Such systems are more compactly described by Extended Finite State Machines (EFSMs), machines supporting parameterized interfaces. The core of my research lies in adapting learning algorithms to these systems, as well as in applying learning to real-world software such as network protocol implementations.
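To make the contrast concrete, here is a sketch of a Mealy machine over purely abstract symbols (the state names and inputs are invented for illustration). A parameterized input such as CONNECT(port) would have to be flattened into one abstract symbol per concrete port value, which is exactly what EFSMs avoid by allowing guards and assignments over parameters:

```python
# A Mealy machine as a transition table: (state, input) -> (output, next state).
# Inputs and outputs are abstract symbols; parameters cannot be expressed directly.
mealy = {
    ("closed",    "CONNECT"): ("OK",    "connected"),
    ("closed",    "SEND"):    ("ERROR", "closed"),
    ("connected", "CONNECT"): ("ERROR", "connected"),
    ("connected", "SEND"):    ("ACK",   "connected"),
}

def run(machine, initial_state, inputs):
    """Run an input sequence through the machine and return the produced outputs."""
    state, outputs = initial_state, []
    for symbol in inputs:
        output, state = machine[(state, symbol)]
        outputs.append(output)
    return outputs

print(run(mealy, "closed", ["CONNECT", "SEND"]))  # ['OK', 'ACK']
```

An EFSM would instead allow a single transition for, say, SEND(seq), guarded by a predicate such as seq == expected, rather than one transition per concrete sequence number.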
