Bachelor and Master Projects in Machine Learning and Bioinformatics
DESCRIPTION
Machine Learning and Bioinformatics are core research areas in Computer Science with
challenging real-life applications in diverse domains. We offer master
projects in computer science, information science, and artificial intelligence on both theory and real-life application problems, the latter focussed
on biology and the life sciences. These projects are related to research performed
in the Machine Learning group.
Bachelor projects are also offered and consist of sub-topics of
master projects. Master projects on these topics based on
your own ideas are also possible; just discuss them with us. Other interesting projects can be found HERE.
Elena Marchiori: elenam AT cs DOT ru DOT nl
List of current master projects
A central issue in nearest neighbor (NN) classification, and more generally in instance-based learning,
concerns storage requirements. The basic 1-NN rule stores all training instances, hence can
be slow when classifying new instances. Moreover, when the training set contains noisy
instances, generalization accuracy can be negatively affected if these instances are stored as
well. Instance selection algorithms tackle these issues by selecting a subset of the
training set in order to reduce storage and possibly also enhance accuracy of the 1-NN rule
on new instances (generalization performance).
In this project you will study and implement fast graph-based instance selection algorithms
for 1-NN classification.
[For more information about the problem and the current approach, see the paper Hit Miss Networks with Application to Instance Selection]
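To make the idea of instance selection concrete, here is a minimal sketch in Python. It does not implement the graph-based method of the paper above; it uses the classic condensed nearest neighbor idea (keep an instance only if the instances stored so far misclassify it) as a simple stand-in. The function names and the toy data are assumptions made for illustration only.

```python
import math

def nearest_label(point, instances):
    """1-NN rule: return the label of the stored instance closest to `point`."""
    closest = min(instances, key=lambda inst: math.dist(point, inst[0]))
    return closest[1]

def condense(training):
    """Condensed-nearest-neighbor style selection (illustrative stand-in).

    Starts from the first instance and repeatedly adds every training
    instance that the current subset misclassifies, until a full pass
    over the training set adds nothing.
    """
    subset = [training[0]]
    changed = True
    while changed:
        changed = False
        for features, label in training:
            already_stored = (features, label) in subset
            if not already_stored and nearest_label(features, subset) != label:
                subset.append((features, label))
                changed = True
    return subset

# Hypothetical toy data: two well-separated classes in the plane.
training = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
subset = condense(training)
print(len(training), "training instances reduced to", len(subset))
```

On data like this the selected subset is much smaller than the training set while the 1-NN rule still classifies every training instance correctly. Note that this simple scheme tends to keep noisy instances, which is one of the issues more refined selection algorithms try to address.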
Graph-based representations have recently been used successfully in machine learning,
in particular for (semi-)supervised learning and feature selection.
In this project you will focus on recent graph-based approaches for feature selection, and
will investigate and implement state-of-the-art and novel algorithms for this task.
[For more information about the problem and the current approach, see the paper
Spectral Feature Selection for Supervised and Unsupervised Learning]
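As an illustration of what a graph-based feature selection criterion can look like, the sketch below computes the Laplacian Score, a well-known unsupervised criterion, on a nearest-neighbor graph. It is not taken from the paper above and is only meant as a starting point; the function name, the parameters k and t, and the toy data are assumptions made for this example.

```python
import math

def laplacian_scores(X, k=2, t=1.0):
    """Laplacian Score of each feature (lower = more consistent with the graph).

    X is a list of equal-length feature vectors. A symmetric k-nearest-neighbor
    graph with heat-kernel weights exp(-dist^2 / t) is built on the instances.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        for j in neighbours:
            w = math.exp(-math.dist(X[i], X[j]) ** 2 / t)
            W[i][j] = W[j][i] = w
    degree = [sum(row) for row in W]
    total_degree = sum(degree)

    scores = []
    for r in range(d):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        g = [fi - mean for fi in f]
        # g^T L g = 1/2 * sum_ij W_ij (g_i - g_j)^2, with L = D - W.
        smoothness = 0.5 * sum(
            W[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n)
        )
        # g^T D g: degree-weighted variance of the feature.
        variance = sum(gi * gi * di for gi, di in zip(g, degree))
        scores.append(smoothness / variance if variance > 0 else float("inf"))
    return scores

# Hypothetical toy data: feature 0 separates two clusters, feature 1 does not.
X = [
    (0.0, 0.5), (0.1, 0.1), (0.2, 0.9),
    (5.0, 0.2), (5.1, 0.8), (5.2, 0.4),
]
scores = laplacian_scores(X)
print("feature ranking (best first):", sorted(range(len(scores)), key=scores.__getitem__))
```

Features are ranked by increasing score: a feature that varies little between neighboring instances but has large overall variance gets a low score and is preferred.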
As we move into the post genome-sequencing era, an immediate challenge is how to make best
use of the large amount of high-throughput experimental data to assign functions to
currently uncharacterized proteins. Recently, a new method for
protein function assignment based on shared interacting domain patterns
extracted from cross-species protein-protein interaction data has been introduced.
In this project you will study this method and investigate possible improvements based on
the nearest neighbor rule.
[For more information about the problem and the current approach, see the paper
Protein Function Assignment through Mining Cross-Species Protein-Protein Interactions]
Protein subcellular localization is a crucial ingredient in many important inferences about cellular
processes, including prediction of protein function and protein interactions.
Recently, a new class of protein sequence kernels has been introduced that allows
the inclusion of pairwise amino acid distances into their computation.
In this project you will study this approach and investigate and implement novel graph-based
machine learning algorithms for this task.
[For more information about the problem and the current approach, see the paper
An Automated Combination of Kernels for Predicting Protein Subcellular Localization]