Posted on June 21, 2022
Dr Vukosi Marivate, an associate professor in the University of Pretoria’s Department of Computer Science and the Absa Chair of Data Science in the Faculty of Engineering, Built Environment and Information Technology, is a recipient of the Google AI Research Scholar Award for 2022.
This award supports early-career researchers who are pursuing cutting-edge research in fields relevant to Google, including machine learning, data mining and machine translation, among other computer science-related fields. It will provide financial support for Dr Marivate’s research team’s investigation into consolidating learnings of language models and language tools for South African languages and beyond.
According to Dr Marivate, recent advances in natural language processing (NLP) have benefitted only well-represented languages, neglecting research into lesser-known global languages. This is, in part, due to the availability of curated data and research resources for well-represented languages, as well as NLP algorithms that can exploit this abundance of data. Languages with fewer resources face the double challenge of small amounts of data and algorithms that do not cater for this paucity of data.
“Over the last few years, there has been an increase in grassroots organisations involved in NLP in the Global South. They have brought with them renewed energy and a focus on low-resource languages,” says Dr Marivate. “We propose consolidating our work at UP, which has focused on creating NLP resources and new tools for South African languages. Our focus is on exploring approaches for efficient and effective language models and tools for South African languages.”
Questions that Dr Marivate aims to address include the following:
Over the last five years, his research team has investigated ways to improve the tools and resources available for resource-poor languages. “Given our location, we focused on South African languages as a base for our research. We have investigated augmentation methodologies for short text, developing word embeddings to assist with augmentation methods for low-resource languages, and curated new word nets for South African languages and cross-lingual models,” he says.
The research has the following goals:
A challenge faced by researchers in African languages is the lack of a full end-to-end guide on how to approach an NLP task, curate the correct data, choose the best models, and train and evaluate them. “To this end, our work aims to document and create a reusable template for tackling low-resource language tasks through an African language lens.” Dr Marivate’s team will focus on nine South African languages and three NLP tasks (news and document classification, named entity recognition and translation).