From the Philippines to Colombia, low-paid workers label training data for AI models used by the likes of Amazon, Facebook, Google, and Microsoft.
In 2016, Oskarina Fuentes got a tip from a friend that seemed too good to be true. Her life in Venezuela had become a struggle: Inflation had hit 800 percent under President Nicolás Maduro, and the 26-year-old Fuentes had no stable job and was balancing multiple side hustles to survive.
Her friend told her about Appen, an Australian data services company that was looking for crowdsourced workers to tag training data for artificial intelligence algorithms. Most internet users have done some form of data labeling, such as identifying images of traffic lights and buses for online captchas. But the algorithms powering new bots that can pass legal exams, create fantastical imagery in seconds, or remove harmful content on social media are trained on datasets of images, video, and text labeled by gig economy workers in some of the world’s cheapest labor markets.
Appen’s clients have included Amazon, Facebook, Google, and Microsoft, and the company’s 1 million contributors are just a part of a vast, hidden industry. The global data collection and labeling market was valued at $2.22 billion in 2022 and is expected to grow to $17.1 billion by 2030, according to consulting firm Grand View Research. As Venezuela slid into an economic catastrophe, many college-educated Venezuelans like Fuentes and her friends joined crowdsourcing platforms like Appen.