Bachelor and Master Projects in Machine Learning and Bioinformatics


DESCRIPTION

Machine Learning and Bioinformatics are core research areas in Computer Science with real-life applications in diverse domains. We offer challenging master projects in computer science, information science, and artificial intelligence on both theoretical and real-life application problems, the latter focused on biology and the life sciences. These projects are related to research performed in the Machine Learning group. Bachelor projects are also offered and consist of sub-topics of master projects. Master projects on these topics based on your own ideas are also possible; just discuss them with us. Other interesting projects can be found HERE.

Contact information

Elena Marchiori: elenam AT cs DOT ru DOT nl

List of current master projects


Fast Condensed Nearest Neighbor with Hit Miss Networks

A central issue in nearest neighbor (NN) classification, and more generally in instance-based learning, concerns storage requirements. The basic 1-NN rule stores all training instances and can therefore be slow when classifying new instances. Moreover, when the training set contains noisy instances, generalization accuracy can suffer if these instances are stored as well. Instance selection algorithms tackle these issues by selecting a subset of the training set in order to reduce storage and possibly also improve the accuracy of the 1-NN rule on new instances (generalization performance). In this project you will study and implement fast graph-based instance selection algorithms for 1-NN classification. [For more information about the problem and the current approach, see the paper Hit Miss Networks with Application to Instance Selection.]
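To make the idea of instance selection concrete, here is a minimal sketch of the classic condensed nearest neighbor rule (Hart's algorithm), which the project title alludes to. It is not the hit miss network method of the paper above; the function name condensed_nn and the toy data are assumptions made purely for illustration.

```python
from math import dist


def condensed_nn(points, labels):
    """Return indices of a subset that classifies all training
    instances correctly with the 1-NN rule (Hart's condensed NN)."""
    stored = [0]                              # seed with the first instance
    remaining = list(range(1, len(points)))   # instances not yet stored
    changed = True
    while changed:
        changed = False
        for i in list(remaining):
            # 1-NN rule restricted to the instances stored so far
            nearest = min(stored, key=lambda j: dist(points[j], points[i]))
            if labels[nearest] != labels[i]:
                # misclassified: the instance must be kept
                stored.append(i)
                remaining.remove(i)
                changed = True
    return stored


# Toy data (hypothetical): two well-separated classes of three points each
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(condensed_nn(points, labels))
```

On this toy data a single instance per class is enough to classify every training point correctly. The result depends on the order in which instances are visited, which is one reason more refined selection criteria are studied.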

Graph-based Feature Selection

Graph-based representations have recently been used successfully in machine learning, in particular for (semi-)supervised learning and feature selection. In this project you will focus on recent graph-based approaches to feature selection, and will investigate and implement state-of-the-art and novel algorithms for this task. [For more information about the problem and the current approach, see the paper Spectral Feature Selection for Supervised and Unsupervised Learning.]
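As an illustration of what a graph-based feature selection criterion can look like, the sketch below computes the Laplacian score, one well-known criterion that ranks a feature by how smoothly it varies over a similarity graph of the samples (lower is better). It is offered only as an example and is not necessarily the method of the paper above; the function name laplacian_scores, the hand-made graph S, and the toy data are assumptions.

```python
def laplacian_scores(X, S):
    """Laplacian score of each feature (column of X) with respect to
    a symmetric sample-similarity matrix S. Lower scores indicate
    features that respect the graph structure better."""
    n = len(X)
    degrees = [sum(row) for row in S]         # diagonal of the degree matrix D
    total = sum(degrees)
    scores = []
    for r in range(len(X[0])):
        f = [X[i][r] for i in range(n)]
        # remove the degree-weighted mean of the feature
        mean = sum(d * v for d, v in zip(degrees, f)) / total
        f = [v - mean for v in f]
        # f^T L f with L = D - S, written as a sum over edges
        smooth = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                           for i in range(n) for j in range(n))
        # f^T D f, the degree-weighted variance
        variance = sum(d * v * v for d, v in zip(degrees, f))
        scores.append(smooth / variance if variance > 0 else float("inf"))
    return scores


# Toy data (hypothetical): samples 0-1 and 2-3 are linked in the graph
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
S = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(laplacian_scores(X, S))
```

Here the first feature is constant within each linked pair and receives the lowest score, while the second feature changes across every edge and is ranked worse. In practice the graph would be built from the data, for example as a k-nearest-neighbor graph, rather than given by hand.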

Protein function assignment based on shared interacting domain patterns

As we move into the post-genome-sequencing era, an immediate challenge is how to make the best use of the large amount of high-throughput experimental data to assign functions to currently uncharacterized proteins. Recently, a new method for protein function assignment has been introduced, based on shared interacting domain patterns extracted from cross-species protein-protein interaction data. In this project you will study this method and investigate possible improvements based on the nearest neighbor rule. [For more information about the problem and the current approach, see the paper Protein Function Assignment through Mining Cross-Species Protein-Protein Interactions.]

Predicting protein subcellular localization

Protein subcellular localization is a crucial ingredient for many important inferences about cellular processes, including the prediction of protein function and protein interactions. Recently, a new class of protein sequence kernels has been introduced that allows pairwise amino acid distances to be included in their computation. In this project you will study this approach, and investigate and implement novel graph-based machine learning algorithms for this task. [For more information about the problem and the current approach, see the paper An Automated Combination of Kernels for Predicting Protein Subcellular Localization.]