UP alumnus ranked among top 1% of data scientists in the world

Posted on July 08, 2021

Dr Hossein Masoumi Karakani, an alumnus of the University of Pretoria (UP), recently became the first person from UP to enter Kaggle’s global rankings, a live leaderboard of the best data scientists in the world.

Kaggle was founded in Melbourne, Australia, in 2010 and was acquired by Google in 2017. It is the world’s largest data science community and the best-known competition platform for predictive modelling and data analytics, with at least one million registered users across the globe.

Dr Karakani’s current rank is 509 of 162 813, and his best position in the rankings was 477, which he reached after winning his second consecutive silver medal in the Google Research Football Competition hosted by Kaggle. He finished 55th out of 1 138 teams, including single-member “teams” such as his. He also won a silver medal in a 2020 Kaggle competition (University of Liverpool: Ion Switching), finishing in 129th place out of 2 618 teams registered globally.

This artificial intelligence (AI) research football competition was hosted by Kaggle in collaboration with Google Research and Manchester City Football Club. Competitors were tasked with creating AI agents that can play football – the most popular sport in the world, with an estimated following of four billion fans – and compete in matches that proceed in steps, with each agent reacting to the current game state.

“I’ve always dreamed about combining my data science knowledge with my passion for football, and winning this prestigious award turned my dream into reality,” said Dr Karakani, who holds a PhD in Mathematical Statistics.

He now ranks among the top 1% of data scientists globally, and has officially entered the Kaggle rankings with this second silver medal. Currently, there are 6 682 competition experts on Kaggle, but only about 40 ranked Kaggle experts in Africa, around 20 of them in South Africa. Dr Karakani also ranks third among Kaggle users from Iran.

“Unfortunately, the number of data scientists is not equally distributed around the world,” he said. “Since Kaggle is well known among the data science community, it can be used as a proxy to understand the gap between countries and continents. These competitions help push the boundaries of a user’s abilities, and competitors learn and master the art of critical thinking.

“The ranked users’ dataset was obtained by scraping the location data off Kaggle’s website. North America, Asia and Europe are responsible for 75% of all ranked users, and Africa, Oceania, South America and Central America are responsible for only 5%. Also, North America and Europe have the most experienced users in machine learning (ML), and Africa has about 2% of ML veterans. My next step is to facilitate knowledge transfer to the younger generations in the data science field.”

Dr Karakani also commented on the role that ranked data scientists can play in addressing many of the continent’s challenges, and UP’s role in educating the next generation of experts in this field.

“Although access to electricity, computers and the internet varies from continent to continent, there isn’t a massive gap regarding the number and types of cloud services and ML frameworks. It seems that access to these technologies is quite democratised, at least among Kaggle users. There is a bright picture regarding Africa and, most specifically, South Africa.

“Nonetheless, Africa still faces many difficulties when looking at essential socioeconomic features like formal education, years working with ML and yearly compensation. The greatest challenge that we face is the improvement and nurturing of our communities. The Department of Statistics at UP plays a vital role in identifying and training the next generation of data scientists who will be ready for the fourth industrial revolution. The focus will be on teaching and practising statistics in interdisciplinary joint ventures with partners in industry and government. A solid educational background is usually required to develop depth of knowledge in the data science field. The core of ML is centred on statistics, and statistics is the soul of data science.”

