Google has announced WAXAL, a large-scale open speech dataset built specifically for African languages, aimed at unlocking a new wave of AI tools that can actually listen to and speak the languages used across the continent. Developed in partnership with African universities and community organisations, the project brings together thousands of hours of real speech data to support research, innovation, and locally built technology.
At its core, WAXAL tackles a long-standing problem in artificial intelligence: African languages have been missing from the data that trains modern AI systems. What that has meant, and why this launch matters, becomes clearer when you look at how speech technology is built, who it has historically served, and what changes when African researchers finally have the infrastructure they need.
What Google Just Announced
Today, February 2, 2026, Google and a group of African universities and research organisations announced the launch of WAXAL. The goal is to make it easier for researchers, developers, and startups to build speech technology that actually understands how Africans speak.
WAXAL includes over 1,250 hours of transcribed speech data across 21 Sub-Saharan African languages, alongside more than 20 hours of studio-quality recordings meant for creating high-fidelity synthetic voices. In practical terms, this means AI systems can finally be trained to recognise, understand, and generate speech in languages that have long been ignored by mainstream technology.
The dataset is now publicly available, opening the door for anyone working on African speech technology to build without starting from scratch.
Why African Speech Data Has Been a Problem
Speech-based AI relies heavily on massive amounts of training data. Without recordings of people speaking a language, in different accents, speeds, and contexts, AI models struggle to perform basic tasks like transcription or voice recognition.
Africa is home to more than 2,000 languages, yet only a small fraction have had enough digital data to support modern AI development. This has meant that voice-enabled tools, from virtual assistants to automated customer service systems, often do not work in local languages or fail entirely.
WAXAL directly targets this problem by supplying the foundational data researchers need to build speech recognition systems, text-to-speech tools, and voice-driven applications tailored to African users.
The Languages Covered by WAXAL
The dataset spans a wide linguistic and geographic range. Languages included in the first release of WAXAL are:
Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba.
This mix includes both widely spoken languages and those that have traditionally received less technological attention, which helps reduce long-standing digital inequality.
Built in Africa, Not Just For Africa
One of the most important aspects of WAXAL is how it was created. Rather than being collected remotely, the dataset was built over three years by African academic and community organisations working directly with local speakers.
Institutions such as Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda led data collection efforts, with technical support from Google Research Africa. These institutions retain full ownership of the data, setting a precedent for more equitable AI development partnerships.
Aisha Walcott-Bryant, Head of Google Research Africa, described WAXAL as scientific infrastructure rather than just a dataset. According to her, the project gives African students, researchers, and entrepreneurs the tools to build technology “on their own terms, in their own languages,” with the potential to reach more than 100 million people.
The Implications for Education, Health, and Business
The impact of WAXAL goes far beyond academic research. With reliable speech data, developers can begin building tools that support education in local languages, improve access to healthcare information, and create voice-enabled services for people who may not be fluent readers.
At the University of Ghana, the project has already involved over 7,000 volunteers who contributed their voices. According to Professor Isaac Wiafe, the dataset has helped train a new generation of AI researchers and sparked innovation in areas such as agriculture, education, and health technology.
Similarly, Joyce Nakatumba-Nabende from Makerere University noted that WAXAL has strengthened local research capacity in Uganda, supporting student-led and faculty-led projects focused on African speech technologies that reflect real community needs.
Why This Matters Now
As AI tools become more embedded in daily life, language access is quickly becoming a new form of digital inclusion. Without support for local languages, millions risk being excluded from services that rely on voice interaction, from banking to public services.
By releasing WAXAL as an open dataset, Google and its African partners are lowering the barrier to entry for innovation. Researchers no longer need to spend years collecting speech data before they can begin building useful tools. Startups can prototype faster, and universities can focus on advancing models rather than assembling basic resources.
What Comes Next
WAXAL is available starting today through the Google Africa blog, and its open nature means it can evolve over time as more researchers build on it. While it does not solve every challenge facing African AI development, it addresses one of the most fundamental ones: access to data.
For a continent rich in linguistic diversity, that foundation could be the difference between being an afterthought in AI development and becoming a driving force in shaping how intelligent systems understand human speech.