Crowd Workers Are an Integral Piece of the Ethical AI Puzzle — Part 1

15 Mar 2022

Building ethical AI isn’t a one-and-done checkbox-marking exercise – it’s a continual process made up of many nuanced considerations and decisions. It’s therefore not only our responsibility to build AI models that are robust to bias, but also to train and maintain them with unbiased, representative data to begin with. That’s why we must be mindful of our data collection practices and, in particular, of the role crowdsourcing platforms play in collecting that data.

In part one of our series on crowd work and its place in the ethical AI life cycle, we explore what crowdsourcing means to AI development, how it has enabled powerful AI implementations to change our day-to-day lives while remaining largely obscure to the public, and why we must all make a concerted effort to change this.

Crowdsourcing and its place in our AI present and future

One of the promises of artificial intelligence is a future of full automation in which we’re freed from the demands of labor, whether it’s low cognitive load tasks or outright physical toil. However, the reality of developing and implementing AI to such a powerful and pervasive extent is – at least to AI and machine learning developers – still far off.

Until then, grim prognostications and alarming headlines of the coming AI apocalypse and mass unemployment are still, thankfully, the stuff of dystopian science fiction. The job market will shift with the steady introduction of automation for narrowly focused tasks where current technology can support it, but outside of disruption in a handful of key industries, there will always be a need for human labor in and around AI, because AI technologies need human experience and judgment to function in the first place.

We’re speaking here of roles beyond data scientists, machine learning engineers, and AI researchers of course, all of whom implement AI solutions and push forward development of the underlying algorithms and technologies that they’re built upon. What powers AI has been and always will be data, and as such, there will always be an urgent need for more of it – newer, more specialized, and more varied.

In general, humans already generate and collect massive amounts of data today, but not all of it is useful. As noted in previous posts about the lack of diversity in benchmark datasets and machine learning’s culture of state of the art, some (or a lot) of that data can be problematic due to poor quality or bias. That, coupled with the popularity of data-hungry deep learning models, is another reason we’ll continue to need newer, better data to drive current and future AI.

Enter crowdsourcing platforms, where large, distributed groups of workers are given “microtasks” to help collect and annotate data intended for training or evaluating AI. On paper, it’s a clever solution to building datasets quickly and easily, given crowdsourcing’s ability to generate custom data on demand and at scale. However, it’s also an often overlooked, if critical, part of the AI lifecycle.

For too long, attention has been paid to the amazing capabilities of AI and its notable implementations – Alexa, Siri, Google Assistant, for example – rather than the great human effort necessary to make those implementations possible. This “invisibility” of crowd workers has resulted in systemic issues in pay, precarious employment, and scant worker rights, which in turn result in negative consequences for AI itself, chief amongst which are AI outputs that are inadequate at best and harmful to human rights at worst.

If overlooking crowd work endangers the overall AI endeavor, then crowdsourcing belongs firmly in the ethical AI discussion. Specifically: if we aren’t sourcing our data ethically, can any of the AI we build ever be ethical?

It’s a concern that Defined.ai, since its inception, has endeavored to address by way of our own crowd platform, Neevo, both to be the change we want to see in the AI world and to set the standard with our approach to ethical AI and ethical data collection. However, it’ll take more than just us to, as Kittur et al. pose in The Future of Crowd Work, “foresee a future crowd workplace in which we would want our children to participate.”

In the days ahead, we’ll look at some of the ways in which the neglect of crowd work has harmed advancements in AI and what we can do, collectively, to fix them and realize our ethical AI future.