Ear to the ground: Using text mining to pick up all Sudanese voices for Radio Dabanga

Talk + Q&A | Friday 25th September, 10:10am – 10:30am CEST

Radio Dabanga is a radio station broadcasting to Sudan. Because of the Sudanese government's repression of the free press, Dabanga has to operate from abroad. Its journalists therefore depend largely on WhatsApp tips from their Sudanese listeners. They must then verify these tips with only limited 'on the ground' sources in Sudan. For this reason, and because verifying tips about government actions can be dangerous, it is paramount that Dabanga carefully selects which tips to follow up on and which to set aside.

The problem is that Dabanga easily receives more than 3,000 messages a day. With only three journalists, the station cannot read them all, let alone assess them. The journalists agonise over not knowing which sources are reliable and whether they might be missing valuable information.

Utrecht University of Applied Sciences has worked with Dabanga to build a solution. After extracting all conversations with a WhatsApp miner, we implemented a short-text topic modelling algorithm that clusters the messages into groups so that all of them can be evaluated quickly. This helps the journalists assess the topics and select the messages that merit follow-up. To achieve this, we had to devise a fine-grained information structure that learns from new messages and topics.

To build this, we had to overcome two kinds of problems: the idiosyncrasies of the various Arabic vernaculars (NLP methods have been tested far less extensively on Arabic than on languages such as English) and the extreme brevity of the messages. In this case study, we show how we dealt with these informational issues, including their ethical implications, and present the result: a working set of analysis tools that effectively helps Radio Dabanga support democracy.

Aletta Smits

Erik Hekman

Koen van Turnhout