New AI Tool Can Finally Tell African Languages Apart

May 3, 2026

A persistent and largely invisible problem in African artificial intelligence development has a new solution. Most AI systems cannot reliably identify which African language a piece of text is written in, a flaw that prevents those languages from being used to train more capable tools. A new open-source model released last week is designed to fix that foundational gap.

On April 28, AI research company Pleias and the GSM Association (GSMA) launched CommonLingua, a language identification (LID) model that covers 334 languages, including 61 African languages across eight language families. It is the first release under the GSMA’s “AI Language Models in Africa, by Africa, for Africa” initiative, a coalition focused on closing the language gap in AI development on the continent.

The problem CommonLingua addresses sits at the very start of the AI development pipeline. Before a model in Swahili, Yoruba, or Wolof can be built, text in those languages must first be correctly identified by language. Existing identification tools, including fastText, GlotLID, and OpenLID, were built primarily around European and Asian languages and routinely misclassify African-language text as English or French. Even leading frontier AI models lose roughly 30 percentage points of accuracy on African languages compared with major world languages.

CommonLingua achieves 83 percent accuracy and a macro-averaged F1 score of 0.79 on the new CommonLID benchmark, outperforming leading language identification models by more than 10 percentage points under comparable conditions while using roughly one three-hundredth of their parameters. The model ships as an 8-megabyte file and can process approximately 20 texts per second on a standard central processing unit (CPU) and up to 3,000 texts per second on a single graphics processing unit (GPU), making it practical to deploy in low-resource settings.
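Accuracy and macro-averaged F1 can diverge because macro F1 gives every language equal weight, so a rarely seen language that gets confused with English drags the average down even when overall accuracy stays high. The sketch below uses the standard per-class precision, recall, and F1 definitions on an invented ten-sample toy set; it assumes CommonLID reports macro F1 in this usual way, and the labels `eng` and `swh` and the function names are illustrative, not taken from the benchmark.

```python
"""Toy comparison of accuracy and macro-averaged F1 for language ID.

The labels and predictions are invented for illustration; they are not
CommonLID data.
"""


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of samples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-language F1 over every label in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Each language counts once, regardless of how many samples it has.
    return sum(scores) / len(scores)


# Hypothetical toy set: 8 English and 2 Swahili texts; one Swahili text
# is misread as English, the failure mode described above.
gold = ["eng"] * 8 + ["swh"] * 2
pred = ["eng"] * 8 + ["eng", "swh"]

print(f"accuracy = {accuracy(gold, pred):.3f}")  # 9 of 10 correct
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

On this toy set accuracy is 0.900 while macro F1 is about 0.804: English keeps an F1 of 16/17, but Swahili falls to 2/3 because half of its texts were missed.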

The 61 African languages covered span Bantu, Niger-Congo and West African, Afro-Asiatic and Semitic, Cushitic and Chadic, Berber, Nilo-Saharan, and pidgin and creole families. The model operates on raw byte sequences rather than language-specific tokenizers, which lets it handle Latin, Arabic, Ethiopic, N’Ko, and Tifinagh scripts in the same way.
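Working on bytes means every script maps onto the same fixed alphabet of 256 values, so no per-language tokenizer or vocabulary is needed. The sketch below assumes UTF-8 encoding and a simple byte n-gram featurizer of the kind often used for language identification; the helper names `to_byte_ids` and `byte_ngrams` are hypothetical, and this is not CommonLingua's actual preprocessing, which the announcement does not describe in detail.

```python
"""Sketch: turning text in different scripts into byte IDs and byte n-grams.

Assumes UTF-8; the helper names are hypothetical, not CommonLingua's API.
"""
from collections import Counter


def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return each byte as an integer in 0..255."""
    return list(text.encode("utf-8"))


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping byte n-grams, a script-agnostic LID feature."""
    data = text.encode("utf-8")
    return Counter(data[i : i + n] for i in range(len(data) - n + 1))


samples = {
    "Latin (Swahili)": "Habari ya asubuhi",
    "Arabic": "مرحبا",
    "Ethiopic (Amharic)": "ሰላም",
    "N'Ko": "ߒߞߏ",
    "Tifinagh": "ⵜⴰⵎⴰⵣⵉⵖⵜ",
}

for script, text in samples.items():
    ids = to_byte_ids(text)
    # Every script lands in the same 0..255 alphabet; only bytes per character differ.
    print(f"{script:20} chars={len(text):2} bytes={len(ids):2} first={ids[:6]}")
```

In UTF-8, Latin-script Swahili needs one byte per character, Arabic and N’Ko two, and Ethiopic and Tifinagh three, so a byte-level model sees longer sequences for some scripts but never meets an out-of-vocabulary symbol.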

Pierre-Carl Langlais, Co-founder and Chief Technology Officer of Pleias, described language identification as a prerequisite for everything that follows in African AI development. Louis Powell, Director of AI Initiatives at GSMA, said the release addresses a foundational infrastructure gap that has held back progress for years, and that shared tools of this kind are essential to building AI systems that reflect Africa’s linguistic reality at scale.

The model was trained exclusively on openly licensed and public-domain data, and all datasets are released under permissive licences. The GSMA and its partners plan to continue the conversation at MWC26 Kigali in June.
