|02250193||Faculty of Natural and Agricultural Sciences||Department: Statistics|
|Minimum duration of study: 2 years||Total credits: 180||NQF level: 09|
The curriculum for the MSc (eScience) coursework degree programme comprises 180 credits in total, made up of coursework and a research component. One of the key features of the curriculum is a capstone project that runs in parallel with the coursework modules in the first year of study. During the capstone project, students go through the entire cycle of solving a real-world data science problem: collecting and processing real-world data, designing methods to solve the problem, and implementing a solution. The capstone project and coursework prepare the student for the mini-dissertation, which is supervised by an expert.
The progress of all master’s candidates is monitored biannually by the supervisor and the postgraduate coordinator. A candidate’s study may be terminated if the progress is unsatisfactory or if the candidate is unable to finish his/her studies during the prescribed period.
Subject to exceptions approved by the Dean, on recommendation of the head of department, and where applicable, a student may not enter for the master’s examination in the same module more than twice.
Minimum credits: 90
Choose 4 modules to the value of 60 credits from the list of electives.
Scientific writing styles; layouts for assignments, projects, theses or publications; research methodologies; scientific assignments; integration of all the aforementioned content items for a capstone project in data science.
Technical processes of data collection, storage, exchange and access; ethical aspects of data management; legal and regulatory frameworks in South Africa and in relevant jurisdictions; data policies; data privacy; data ownership; legal liabilities of analytical decisions and discrimination; technical and algorithmic approaches to enhance data privacy; and relevant case studies.
Introduction: basic concepts. Supervised learning setup: least mean squares, logistic regression, perceptron, exponential family, generative learning algorithms, Gaussian discriminant analysis, naïve Bayes, support vector machines, model selection and feature selection. Learning theory: bias/variance tradeoff, union and Chernoff/Hoeffding bounds, VC dimension, worst-case (online) learning. Unsupervised learning: clustering, k-means, expectation maximisation, mixture of Gaussians, factor analysis, principal components analysis, independent components analysis. Reinforcement learning and control: Markov decision processes, Bellman equations, value iteration and policy iteration, Q-learning, value function approximation, policy search, REINFORCE, partially observable Markov decision processes.
Data and image models; visualisation attributes (colour) and design (layout); exploratory data analysis; interactive data visualisation; multidimensional data; graphical perception; visualisation software (Python & R); and types of visualisation (animation, networks and text).
Introduction to scientific computing architectures in Python, introduction to distributed systems, introduction to distributed databases, introduction to parallelism, large-data computation and storage models, introduction to well-known distributed systems architectures, and programming large-data applications on open-source infrastructures for data processing and storage systems.
High-dimensional space, best-fit subspaces and singular value decomposition, random walks and Markov chains, statistical machine learning, clustering, random graphs, topic models, matrix factorisation, hidden Markov models, graphical models, wavelets, and sparse representations.
Specialised and applied concepts and trends in data science.
An understanding of multivariate statistics, hypothesis testing and confidence intervals. The ability to model data using well-known statistical distributions, and to handle data that is both continuous and categorical. The ability to perform statistical modelling, including multivariate linear regression, and to adjust for multiple hypotheses. Forecasting, extrapolation, prediction and modelling using statistical methods. Bayesian statistics, and an understanding of bootstrapping and Monte Carlo simulation.
Introduction to convex optimisation, subgradient methods, decomposition and distributed optimisation, proximal and operator splitting methods, conjugate gradients, and nonconvex problems.
Minimum credits: 90
This is the research component of the MSc (eScience) degree and comprises a mini-dissertation which develops the research skills and bridges the gap between theory and practice.
Copyright © University of Pretoria 2023. All rights reserved.