A Roomba captured a woman using the toilet. How did social media get these screenshots?
This episode tells the inside story of an MIT Technology Review investigation into sensitive photos taken by an AI-powered vacuum, which leaked and ended up online.
Reporting:
- A Roomba captured a woman using the toilet. How did Facebook get these screenshots?
- After intimate photos were posted on Facebook, Roomba testers felt misled
We meet:
- Eileen Guo, MIT Technology Review
- Albert Fox Cahn, Surveillance Technology Oversight Project
Credits:
This episode was reported by Eileen Guo and produced by Emma Cillekens and Anthony Green. It was hosted by Jennifer Strong and edited by Amanda Silverman and Mat Honan. The show was mixed by Garret Lang, with original music from Jacob Gorski and Garret Lang. Artwork by Stephanie Arnett.
Full transcript:
[TR ID]
Jennifer: Artificial intelligence is becoming more common in products. Companies need data to train these systems.
We don't always know where this data is from.
Sometimes, however, just by using a product, we give companies our consent to use our data to improve their products and services.
Imagine a device installed in a home, where one person's consent at setup stands in for every person who enters… so anyone living there, or just visiting, might be unknowingly recorded.
I'm Jennifer Strong, and this episode features a Tech Review investigation into AI training data… images captured inside homes all over the globe, which later leaked online.
[SHOW ID]
Jennifer: Last year, someone reached out to me… and flagged some disturbing photos that were circulating on the internet.
Eileen Guo: These were, in essence, photos taken inside homes, sometimes including animals or people who didn't seem to be aware that they were being photographed.
Jennifer: That's investigative reporter Eileen Guo. Based on her observations, she believed the photos were taken by an AI-powered vacuum.
Eileen Guo: These photos looked like they were taken from ground level, pointing up, so you could see entire rooms, ceilings, whatever was in them…
Jennifer: She set out to investigate. It took several months.
Eileen Guo: We first had to verify that they came from robot vacuums, as we suspected, and then figure out which robot vacuum they came from. It turned out to be the Roomba, which is made by iRobot, the largest robot vacuum manufacturer.
Jennifer: This raised questions about whether these photos were taken with consent…and how they ended up on the internet.
One of the pictures shows a woman sitting on a toilet.
Eileen investigated and found that the images were not of customers. The people in them were what the company calls "paid data collectors".
In other words, they were beta testers… though it wasn't entirely clear what that role involved.
Eileen Guo: They're not nearly as clear about the purpose of the data, who it's being shared with, or what protocols and procedures are supposed to keep it safe.
Jennifer: She doubts the people who gave permission to be recorded really knew what they were agreeing to.
Eileen Guo: They understood that the robot vacuums would be taking videos from inside their homes, but they didn't understand that those videos would be viewed and labeled by humans, or that they would be shared outside the country. No one anticipated that the images could end up on Facebook or Discord… which is how they ultimately got to us.
Jennifer: The investigation found that the images were leaked by gig workers hired to label data.
They were employed by Scale AI, a data labeling firm hired by iRobot.
Eileen Guo: These are generally very low-paid workers who are asked to label images to teach artificial intelligence how to recognize what it's seeing. The fact that these images were shared on the open internet was surprising, given how sensitive they were.
Jennifer: Tagging images with descriptive labels is known as data annotation.
It makes the content of images, audio, or video easier for machines to interpret.
It's used for everything from flagging inappropriate content on social media to helping robot vacuums identify what's around them.
Eileen Guo: The most useful datasets for training algorithms are the most realistic ones, meaning they're sourced from real-world environments. And to make machine learning work, all of that data has to be viewed or listened to by someone, who categorizes, labels, and adds context to each piece of it. With an image of a street, for example, that means marking: this is a yellow stoplight, this is a green stoplight, this is a stop sign.
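To make that concrete, here's a minimal sketch of what a single annotated training example might look like for an obstacle-recognition model. Every field name, label, and format choice below is an illustrative assumption for this explanation, not iRobot's or Scale AI's actual schema.

```python
# A minimal sketch of one annotated training example, assuming a simple
# bounding-box labeling scheme. All names here are illustrative assumptions,
# not iRobot's or Scale AI's actual data format.

annotated_frame = {
    "image_file": "frame_000123.jpg",  # raw photo captured by the device
    "annotations": [
        # Each entry is one object a human labeler identified in the frame.
        {"label": "phone_cord", "bbox": [412, 310, 96, 40]},  # x, y, w, h in pixels
        {"label": "sock",       "bbox": [120, 455, 60, 35]},
    ],
}

# A model trained on many such examples learns to map raw pixels to these
# labels, which is how a vacuum comes to recognize, and steer around,
# obstacles like cords and socks.
print(f"{annotated_frame['image_file']}: "
      f"{[a['label'] for a in annotated_frame['annotations']]}")
```

The privacy exposure discussed in this episode happens at exactly this step: a human has to look at the raw image before the labels exist.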
Jennifer: There are many ways to label data.
Eileen Guo: If iRobot had wanted to, it could have chosen labeling models with better data safeguards. The work could still have been outsourced, but with labelers required to work from offices, where the handling of data is more controlled. Or iRobot could have done the data annotation in-house. For whatever reason, it chose not to go either of those routes.
Jennifer: Tech Review reached out to iRobot, and the company confirmed that the 15 images in question did come from its devices… but from preproduction machines that were never released to the public.
Eileen Guo: They said they had started an investigation into how these images leaked, that they had terminated their contract with Scale AI, and that they would take measures to prevent anything similar from happening in the future. They didn't explain what that meant.
Jennifer: These robot vacuums can move around rooms efficiently, drawing maps of the areas they clean.
They can also recognize objects on the ground and steer around them.
That's what keeps the machines from driving through certain kinds of messes, like dog poop.
But the leaked training images are different, because the camera isn't pointed at the ground.
Eileen Guo: So why do these cameras point diagonally upwards? What's on the ceilings or walls has nothing to do with navigating around pet waste, phone cords, stray socks, or other debris. But it does fit some of the larger goals that iRobot and other robot vacuum companies have for the future: being able to identify what room the device is in, based on what's in your home. All of this data will ultimately serve those companies' bigger ambition, which is to put more robots in homes.
Jennifer: And that data collection could be used to build new products.
Eileen Guo: This isn't just about iRobot, and it isn't only about the testers. It's about the entire data supply chain, and a new point at which personal information can leak out… one that consumers don't know about or even think of. And that's alarming, because as more companies adopt artificial intelligence, more data will be needed to train it. Where is that data coming from? That's a big question.
Jennifer: In the US, companies aren't required to disclose this… and privacy policies usually include some version of a line allowing consumer data to be used to improve products and services… which can include AI training. We often opt in simply by using the product.
Eileen Guo: Whether it's robot vacuums or Zoom or any other data-collecting device, this is yet another place we should be thinking about privacy.
Jennifer: We expect to see more synthetic data in the future… that's data that isn't derived directly from real people.
She says Dyson is already using it.
Eileen Guo: There's a lot of hope that synthetic data is the future. Because it doesn't come from the real world, it's easier to protect privacy. And early research suggests it can be just as accurate, if not more so. But most of the experts I spoke to believe that future is still a long way off.
Jennifer: Links to our reporting can be found in the show notes… and you can also support our journalism at technology review dot com slash subscribe.
We'll be right back… after this.
[MIDROLL]
Albert Fox Cahn: This is another reminder that legislators and regulators are far behind in enacting privacy protections.
Albert Fox Cahn: My name is Albert Fox Cahn. I'm the executive director of the Surveillance Technology Oversight Project.
Albert Fox Cahn: Right now it's all Wild West, with companies creating their own policies for what counts as ethical research and development. And we can see where that leads: a company has its employees sign consent agreements so ridiculous and one-sided that, in my opinion, they may be unenforceable… all while the government takes a passive approach to what privacy protections should be in effect.
Jennifer: He's an anti-surveillance lawyer… and a fellow at Yale and at Harvard's Kennedy School.
He describes his work as a constant fight against the new ways data is taken from people or used against them.
Albert Fox Cahn: The terms we see here are meant to protect the privacy and intellectual property of iRobot, with no protections for the people who actually have these devices. What's frustrating to me is that people are using these devices in places where it's almost certain some third party will be recorded, and there's no consent provision for that third person. One person signs off on behalf of every person living in, or visiting, the home where images may be taken. And then you have legal fictions like "I guarantee that no minor will be recorded"… with no apparent provision preventing people from using these devices in homes with children.
Jennifer: And in the US, it's anyone's guess what happens to data like this.
Albert Fox Cahn: It's a stark contrast to the situation in Europe, where there is comprehensive privacy legislation, where active enforcement agencies and regulators constantly challenge companies' behavior, and where active trade unions would most likely stop this type of testing regime involving employees. It's like night and day.
Jennifer: Using employees as beta testers is problematic… because they may not feel they have a choice.
Albert Fox Cahn: The fact is that employees often can't consent in a meaningful way. When it's your employer, rather than a volunteer program, asking you to bring the product into your home and let it collect your data, that creates a coercive environment in which meaningful consent to this type of invasive testing isn't really possible.
Jennifer: Our devices already collect our data… from smartphones to washing machines.
And as AI gets integrated into more products and services, that will only increase.
Albert Fox Cahn: Ever more money is being spent on ever more intrusive tools that capture data from areas of our lives we used to consider sacred. And there is a growing political backlash against surveillance capitalism and this type of corporate consolidation.
Jennifer: He believes that pressure will lead to new privacy laws in the US… partly because the problem is only going to get worse.
Albert Fox Cahn: And when you consider the data labeling these recordings go through… how many human beings have to review them to produce the material machine learning needs… you see there's an army of people who could capture this information, screen it, and make it public. So no, I don't believe companies that claim they can keep all of this data safe. There's always the possibility of harm, especially with products that are in their early design and training phases.
[CREDITS]
Jennifer: This episode was reported by Eileen Guo and produced by Emma Cillekens and Anthony Green. It was edited by Amanda Silverman and Mat Honan, and mixed by Garret Lang, with original music from Jacob Gorski and Garret Lang.
Thank you for listening. I'm Jennifer Strong.
————————————————————————————————————————————————————————————
By: Anthony Green
Title: How Roomba tester’s private images ended up on Facebook
Sourced From: www.technologyreview.com/2023/01/26/1067317/podcast-roomba-irobot-robot-vacuums-artificial-intelligence-training-data-privacy-consent-agreement-misled/
Published Date: Thu, 26 Jan 2023 19:15:58 +0000