Dr Vukosi Marivate is the ABSA Chair of Data Science at the University of Pretoria and co-founder of the Deep Learning Indaba. He spoke to Primarashni Gower about his work.
Tell us about your background.
I hold a BSc and MSc in Electrical Engineering from Wits University (MSc under Professor Tshilidzi Marwala) and a PhD in Computer Science from Rutgers University, New Jersey, USA (PhD under Prof Michael Littman). I work on developing machine learning/artificial intelligence methods to extract insights from data. A large part of this work over the last few years has been at the intersection of machine learning and natural language processing (due to the abundance of text data and the need to extract insights from it). I run a research group called Data Science for Social Impact, which uses local challenges as a springboard for research. In this area, I have worked on projects in science, education, energy, public safety and utilities.
What is your role as ABSA Chair of Data Science at UP?
The chair is a collaborative industry chair created by the University and ABSA. I work to expand data science practice, research and community, inside and outside the University. Inside the University, this means doing interdisciplinary data science research within and outside the Faculty of Engineering, Built Environment and Information Technology.
What exactly is machine learning and data science?
Machine learning is a subset of artificial intelligence that deals with developing machines that can learn patterns from data. Data science is a burgeoning field that looks at using data (small and large) to better understand our world and take on challenges across numerous fields (hence its multi-disciplinary nature). In the end, the data scientist works to get their methods adopted by people in the field, who use them to solve problems. At the University this means I get to work with many academics from different schools and faculties. Ultimately, we look at a problem through the lens of data and find ways to use appropriate modelling (machine learning, statistics and graph mining) to tackle that problem.
Why are they so important and what are they used for?
Data has become abundant in multiple ways. My research group looks at a number of problems. One problem is using text as a data source to build tools. For example, we are working on methods that can make it easier to build automated tools that can process local language data for tasks such as understanding communication on chat groups, automated labelling of local language data and discovering patterns in local language texts.
Through a number of collaborations inside and outside South Africa, we are analysing education data to better understand what factors lead to improved performance (on multiple measures) in primary and secondary school education. This is important because, although we can use machine learning models to predict performance, for policymakers we have to be able to explain how these methods actually work and how they make their decisions.
A number of students and collaborators work on developing machine learning approaches to cyber-safety challenges, such as anomaly detection to flag fraud and methods to identify threatening content online (misinformation, fake news, online harassment).
How much progress has South Africa made in terms of machine learning and data science, in comparison to Africa and the rest of the world?
We have a growing community, but it still has to grow several times over to reach the critical mass needed to solidify it. South Africa has one of the more advanced machine learning/data science communities on the African continent, but we still have to find ways to collaborate across institutions and with industry to create a solid foundation for sustainability. We face the challenge that we do not have large university departments with 50 or so full-time PhD students in computing, let alone in machine learning/artificial intelligence/data science, and this is something we must work towards. There is a lot of opportunity if we work in a distributed manner. Through the Deep Learning Indaba, we have connected many people in the field who are doing great work.
What advice would you give to prospective university students about this field?
There are so many opportunities in this area, and if you keep on learning you can grow very quickly. The University has a number of opportunities for those interested in data science, including the Master’s in IT in Big Data Science, which continues to attract many students. Those interested in our group’s research can visit our page.