Building Inclusive Speech Technology with Diverse Data
Inclusive speech recognition technology that is trained on diverse, accented speech data is the key to staying relevant in the voice recognition market.
Three New Yorkers walk into a bar: one grew up in the Midwest in a Mexican family, another is a native Spanish speaker from Colombia, and the last is a New Yorker who spoke Castilian Spanish at home until high school. There’s no punchline here: they simply sit down and have a conversation in English.
As they speak, we observe major differences in the speech of each person. Geography, socioeconomic status, and ethnicity, among other factors, all cause variations in pronunciation, vocabulary, and other speech patterns.
Given those differences, what happens when each of them goes home to their voice assistant and makes a request in English? How well is each of their accents understood? And what are the consequences for those who aren’t understood?
Those are essential questions for data scientists, developers, and other AI speech professionals as they work to create speech recognition technology that is inclusive, diverse, and free from biases caused by an accent gap.
Bridging the accent gap
An accent gap is a type of algorithmic bias that occurs when voice recognition models are not trained on diverse, representative data. One example is a model trained exclusively on English speech data sourced from a single geographic and cultural background. This gap can be frustrating to users who fall outside the narrow definition of an English speaker that such data encodes (predominantly white, upper-class male speakers), and it results in a product that doesn’t meet the needs of a diverse market.
An accent gap can affect speech technology of all kinds. For example, one Washington Post study found that Amazon’s Alexa was 30% less likely to understand non-native English accents. In the same study, voice assistants from Google and other major competitors produced similar results.
This means that to compete long-term in the voice recognition market and to create inclusive speech products, your model must understand accented speech. And when we say “models” we don’t just mean voice assistants. All models and devices that make up the Internet of Things (IoT), many of which use voice activation and recognition as part of their core offering, should be trained on diverse, representative, and bias-aware training data.
By releasing a free Spanish-accented English speech dataset, Defined.ai aims to help AI professionals test whether their models exhibit an accent gap for one specific group: non-native English speakers in the US whose native language is Spanish.
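One way to run that kind of test is to compare word error rate (WER) between a baseline group and an accented group on the same prompts. The sketch below is only an illustration under stated assumptions: the article does not prescribe a metric, the group labels and transcripts are made-up toy data (not taken from the Defined.ai dataset), and the helper names `word_errors` and `wer_by_group` are hypothetical.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Aggregate WER per group from (group, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        err, n_words = word_errors(reference, hypothesis)
        errors[group] += err
        words[group] += n_words
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy data (assumption): human reference transcript vs. model output.
samples = [
    ("baseline", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("baseline", "what is the weather today", "what is the weather to day"),
    ("spanish_accented", "turn on the kitchen lights", "turn on the chicken lights"),
    ("spanish_accented", "what is the weather today", "what is whether today"),
]

rates = wer_by_group(samples)
gap = rates["spanish_accented"] - rates["baseline"]
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.2f}")
print(f"accent gap (absolute WER difference) = {gap:.2f}")
```

A consistently higher WER for the accented group on matched prompts suggests an accent gap. In practice the comparison needs far more utterances per group than this toy sample, ideally with speakers from a range of Spanish-speaking countries.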
Within the United States, there are more than 37 million Spanish speakers, making Spanish the most spoken non-English language in the country. This number has grown by 233% since 1980, mostly due to immigration and organic population growth in certain regions of the US.
Spanish itself has many variations – there are approximately 577 million native Spanish speakers in the world, spread across 21 countries, each with its own distinct accents.
As a result, addressing the accent gap in relation to Spanish accents is extremely complex and nuanced. Models must be trained on accented English taken from Spanish speakers all around the world, from a variety of Spanish-speaking countries.
The importance of inclusive speech technology
The need to build inclusive and representative speech technology cannot be overstated. Beyond simply appealing to a broader customer base, putting inherently biased technology to everyday use has far-reaching implications. Imagine a user having to change the very way they speak simply to be understood by their home voice assistant, or on a customer service call.
These biases already exist in the real world, outside of technology, and it is our responsibility to create inclusive AI for the future that fights against these biases rather than reinforces them.
Free speech data from Defined.ai
To continue the fight against this accent gap, Defined.ai is releasing a free speech dataset made up of data from Spanish-accented English speakers from all around the world.
Language is constantly evolving and shifting, adapting to its environment and its users. As a result, voice assistants and IVR models must evolve as well, to stay inclusive, relevant, and competitive.
Let’s build AI that drops the outdated model of what American English is supposed to sound like, and instead focuses on what it does sound like.
Claim your free dataset by registering on Defined.ai’s marketplace.