Institute for Computing and Information Sciences

      Machine Learning and Data Mining


      [Image: Charles Babbage's Difference Engine]

      Latest News

      • Lecturer: Peter Lucas
      • Lecture room: A1020 (Toernooiveld 1)
      • Lecture: Monday, 10.30; Tutorial: Friday, 10.30
      • Lectures start on Monday, 17 March, 2003
      • Practical assignment [gzipped PS; PDF]
      • 16 May, 2003: assessment


      Rationale

      In an era in which computers are widespread in society, people are collecting all sorts of data, mostly in an attempt to enhance their understanding of, and insight into, various processes. For example, companies and organisations collect data concerning the preferences and behaviour of their customers and clients, and use data-mining techniques to extract useful knowledge from these data. Data Mining has a close relationship to Statistics, Machine Learning, and Artificial Intelligence, and involves research into learning representations from data, the mathematics of learning, the process of data mining and knowledge discovery, and the construction and exploitation of software tools.

      This part of the Information Retrieval 2 course aims to convey the basic ideas underlying modern data mining and knowledge discovery from data, while at the same time introducing you to some of the software tools used for data mining.

      Lectures

      1. Introduction (Slides: [PS, gzipped]; [PDF])
      2. Classification (Slides: [PS, gzipped]; [PDF])
        • performance measures
        • classification rules
        • rule-learning algorithms
        • decision trees
      3. Bayesian models and logistic regression (Slides: [PS, gzipped]; [PDF])
        • structure and meaning of Bayesian networks
        • naive Bayes model, TANs (tree-augmented naive Bayes)
        • logistic regression and classification
        • structure learning
      4. Refinement and evaluation (Slides: [PS, gzipped]; [PDF])
        • cut-off points and ROC curves
        • holdout method
        • cross-validation
        • the bootstrap
        • boosting and bagging
      5. Clustering (Slides: [PS, gzipped]; [PDF])
        • market basket analysis
        • association rules (Apriori)
        • k-means algorithm (a small Python sketch follows this list)
        • hierarchical clustering
      6. Basics of Neural Networks (Slides: [PDF])
        • the brain
        • basic mathematics
        • McCulloch-Pitts neuron and activation function
        • perceptron
        • multilayer feedforward neural networks
        • back-propagation
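
      To give a flavour of the material above, here is a minimal, self-contained Python sketch of the k-means algorithm from lecture 5. It is an illustration only, not the implementation used in the practical; the two-dimensional example points are made up for the purpose of the sketch.

        import random

        def kmeans(points, k, iterations=20):
            """Tiny k-means for a list of (x, y) tuples."""
            centroids = random.sample(points, k)  # initialise with k random points
            for _ in range(iterations):
                # assignment step: attach each point to its nearest centroid
                clusters = [[] for _ in range(k)]
                for p in points:
                    dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
                    clusters[dists.index(min(dists))].append(p)
                # update step: move each centroid to the mean of its cluster
                for i, cluster in enumerate(clusters):
                    if cluster:
                        centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                        sum(p[1] for p in cluster) / len(cluster))
            return centroids, clusters

        # made-up example data: two well-separated groups of points
        data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
        centroids, clusters = kmeans(data, k=2)
        print(centroids)

      The sketch shows only the basic assignment/update loop discussed in the lecture; the practical work itself is done with the WEKA workbench rather than with hand-written code.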

      Practical and Tutorials

      • Assignment [gzipped PS; PDF]
      • Software: WEKA (unpack with jar -xvf weka-3-2.jar)
      • Introduction to the WEKA data-mining workbench
      • Introduction to Probability Theory [1/page-PDF; 2/page-PDF]
      • Summary of probability theory [Slides PS, gzip; Slides PDF] (a short worked example of Bayes' rule follows this list)
      • Exercises (will be distributed at the tutorials):
        • Exercises I: Revision
        • Exercises II: Practical symbolic machine learning
        • Exercises III: Bayesian networks
        • Exercises IV: Evaluation and unsupervised learning
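
      As a small companion to the probability-theory material above, the following Python fragment works through Bayes' rule, the identity underlying the naive Bayes model of lecture 3. The numbers (a condition with 1% prevalence, a test with 90% sensitivity and a 5% false-positive rate) are invented purely for illustration.

        # Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
        p_h = 0.01              # prior P(H): prevalence of the condition (made-up number)
        p_e_given_h = 0.90      # P(E | H): probability of a positive test if H holds
        p_e_given_not_h = 0.05  # P(E | not H): false-positive rate

        # total probability of the evidence, P(E)
        p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

        # posterior P(H | E): surprisingly small despite the accurate test
        p_h_given_e = p_e_given_h * p_h / p_e
        print(round(p_h_given_e, 3))  # about 0.154

      The same calculation, applied attribute by attribute under a conditional-independence assumption, is what the naive Bayes classifier does.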

      Additional Resources

      • Example logistic regression equation
      • ROC curves explained
      • Data mining for the corporate masses
      • The R Project: excellent software for statistics and data mining with its own programming language
      • XELOPES: data-mining library
      • Data Mining Cup: international student competition in data mining
      • Salford Systems: training in data mining
      • Clementine data-mining suite (sold by SPSS)
      • SAS data-mining software
      • UCI Machine Learning Repository



      Peter Lucas, Computing Science, University of Nijmegen

      Last updated: 16 March, 2003
      peterl@cs.ru.nl
