Data mining is the analysis of (often large) observational datasets to find unsuspected relationships and to summarise the data in novel ways that are both understandable and useful to the data owner(s). Observational data is data that was collected for some other purpose and is later reused for analysis, e.g. banking data on loan applications and repayments that is analysed to distinguish good borrowers from risky borrowers. The analysis methods fall into two categories: computational data mining and statistical data mining. Computational methods originate from Machine Learning, a branch of Computer Science (Artificial Intelligence). Statistical methods originate from statistical pattern recognition, a branch of Statistics. The objectives of the module are to introduce the commonly used data mining methods and to enable the student to acquire practical data mining skills. The module covers computational and statistical data mining methods as well as the commonly used process models for data mining projects. The topics covered include: process models (CRISP-DM and SEMMA), exploratory data analysis (univariate and bivariate), dimensionality reduction (feature selection, principal component analysis), descriptive modelling (cluster analysis and association rules), predictive modelling (decision trees, neural networks, k-nearest neighbours, naive Bayes, ensemble models), statistical modelling (linear and logistic regression) and text mining. It is assumed that students have a basic knowledge of Statistics. It is also highly recommended that students complete COS 710 and COS 711 beforehand, as knowledge of the content of these modules is assumed.