ANUPhysicsSS2008Data mining
Outline
- Survey of backgrounds, survey of Data Mining/Machine Learning, relation to complex systems
- Crash course in R
- Classification of techniques and methods (parametric and non-parametric regression, clustering, classification)
- Projects (data cleaning, pre-processing, DM or Machine Learning application)
Goal
The tutorial is not intended to replace a computer science or statistics course in machine learning and data mining. It should, however, encourage you to learn more and to look for application areas within your own research projects. Every data-intensive project, from genomics to experimental particle physics, will benefit from techniques for handling its data.
R, first aid course
Participants are welcome to do all the programming and visualisation in Matlab. I (Frank) am an R user and will prepare the exercises and solutions with the software package R, partly because it is widely used by statisticians inside and outside of academia. It is closely related to the commercial product S/S-PLUS, which is a standard in the business world. Knowing R is certainly a valuable skill!
The Documentation section on the R-project page includes some basic tutorials and a reference card, which we will provide as a hard copy during the course.
Relation to complex systems
Project ideas
Projects should require all the steps listed in brackets in the outline above (data cleaning, pre-processing, DM or Machine Learning application).
Projects should allow some satisfying result after two days!
- Magnetic fusion data (Frank)
- Space Physics data (Frank)
- Image recognition (Kirsty/Frank) (could be medical or tomographic images)
- Text Mining/Classification/Tagging (Kirsty)
Common techniques (at least in pre-processing)
- Principal Component Analysis
- Non-negative matrix factorisation
- Others
(have to be programmed beforehand in Matlab, Scilab and/or R)
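As a rough illustration of the first technique in the list, here is a minimal sketch of Principal Component Analysis via the singular value decomposition. It is written in Python with NumPy, which is an assumption of this example; the course itself uses R, Matlab or Scilab. The function name `pca` and the toy data set are hypothetical and only serve to show the idea.

```python
import numpy as np


def pca(data, n_components):
    """Project the rows of `data` onto its leading principal components.

    Hypothetical helper for illustration, not part of the course material.
    Returns (scores, components, explained_variance).
    """
    data = np.asarray(data, dtype=float)
    # Centre each column so the components describe variance, not the mean.
    centered = data - data.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of every observation in the reduced space.
    scores = centered @ components.T
    # Sample variance captured along each retained direction.
    explained_variance = singular_values[:n_components] ** 2 / (data.shape[0] - 1)
    return scores, components, explained_variance


# Toy data (made up): three columns, where the second is almost a multiple
# of the first and the third is small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
toy = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

scores, components, variance = pca(toy, n_components=2)
print("principal directions:\n", components)
print("variance along each direction:", variance)
```

In a pre-processing step one would typically keep only the leading columns of `scores` and pass them on to the clustering or classification method used in the project.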
Recommended Reading
- Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Morgan Kaufmann 2005.
- Jiawei Han, Micheline Kamber: Data Mining: Concepts and Techniques, Morgan Kaufmann 2000.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer 2001.
