Prof Vukosi Marivate, Absa Chair of Data Science in the Faculty of Engineering, Built Environment and Information Technology at the University of Pretoria, was the guest speaker at the 30th UP Expert Lecture, hosted by Vice-Chancellor and Principal, Prof Tawana Kupe, on 27 July 2022. The title of his lecture was: “Riendzo ri lehile: Tackling natural language processing (NLP) for African languages to make better sense of our world”.
The University of Pretoria’s Expert Lecture series provides a public platform for UP researchers to engage with a general audience on significant developments in their fields of expertise that are likely to have an impact in the future. Introducing the guest speaker, Prof Kupe said it was a privilege to have someone of Prof Marivate’s expertise associated with the University of Pretoria.
In addition to his position as research chair in the Department of Computer Science, Prof Marivate is co-founder of both the Deep Learning Indaba, an organisation that strives to strengthen the fields of machine learning (ML) and artificial intelligence (AI) across Africa, and the Masakhane NLP project, a grassroots organisation that encourages NLP research in African languages by Africans for Africans. He is also a recipient of the Google AI Research Scholar Award for 2022.
Prof Marivate’s research sits at the intersection of ML and NLP, where he works to extract insights from data. He is principal investigator in the Department of Computer Science’s Data Science for Social Impact Research Group, which develops tools to improve the availability of data for local or low-resource languages.
In his presentation, Prof Marivate explained that low-resource languages pose an interesting challenge for ML algorithms, representation, data collection and the accessibility of ML in general. For African languages, this challenge is even more consequential, as it coincides with the challenge of shaping the current AI revolution within the global landscape.
Focusing on the importance of language in telling stories, Prof Marivate presented the outcomes of his work over the last few years, which focuses on ensuring that African-language and local-language tasks count. This work covers new approaches in modelling, data collection and community building to create an ideal environment for creativity, innovation and archiving across South African languages and beyond.
Acknowledging that language is a rich interface for sharing information and interacting with machines, he explored two questions: how do machines process language information, and why are local languages important?
He went on to explain what NLP entails and why it is important to develop tools that make data available to speakers of a range of languages, especially those with few resources. Machine learning and AI have been responsible for a number of NLP breakthroughs that are already in everyday use, such as virtual assistants on cell phones, text-to-speech applications and chatbots.
The challenge of developing such tools for low-resource languages is that few resources exist for these languages, and those that do exist are hard to discover. Available resources also tend to lack scale and complexity. This needs to change.
His research group has embarked on several investigations to tackle this challenge. These include the creation, curation and classification of datasets for low-resource languages, such as Setswana and Sepedi in South Africa; improving short-text classification through global augmentation methods; and fine-tuning language models and embeddings in low-resource scenarios, using Setswana as a case study.
Prof Marivate believes that this research can make a real difference to society by enabling speakers of African languages to make better sense of our world. “Africans should be able to shape and own technological advances towards human dignity, wellbeing and equity through inclusive community building, open participatory research and multidisciplinarity.”
Find out more about the work of the Data Science for Social Impact Research Group at https://dsfsi.github.io/