{"id":17545,"date":"2020-01-25T10:45:07","date_gmt":"2020-01-25T18:45:07","guid":{"rendered":"http:\/\/www.palada.net\/index.php\/2020\/01\/25\/news-11280\/"},"modified":"2020-01-25T10:45:07","modified_gmt":"2020-01-25T18:45:07","slug":"news-11280","status":"publish","type":"post","link":"http:\/\/www.palada.net\/index.php\/2020\/01\/25\/news-11280\/","title":{"rendered":"Scraping the Web Is a Powerful Tool. Clearview AI Abused It"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/media.wired.com\/photos\/5e2a1bb2123c60000827363b\/master\/pass\/ai-scraping-88622242.jpg\"\/><\/p>\n<p><strong>Credit to Author: Louise Matsakis| Date: Sat, 25 Jan 2020 12:00:00 +0000<\/strong><\/p>\n<p class=\"byline bylines__byline byline--author\" itemprop=\"author\" itemtype=\"http:\/\/schema.org\/Person\"><span itemprop=\"name\"><span class=\"byline__name byline--with-bg\"><a class=\"byline__name-link\" href=\"\/contributor\/louise-matsakis\">Louise Matsaki<span class=\"link__last-letter-spacing\">s<\/span><\/a><\/span> <\/span><\/p>\n<p class=\"content-header__row content-header__dek\">The facial recognition startup claims it collected billions of photos from sites like Facebook and Twitter. What does the practice mean for the open web?<\/p>\n<p>The internet was designed to make information free and easy for anyone to access. But as the amount of personal information online has grown, so too have the risks. Last weekend, a nightmare scenario for many privacy advocates arrived. 
<em><a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.nytimes.com\/2020\/01\/18\/technology\/clearview-privacy-facial-recognition.html?utm_source=Memberful&amp;utm_campaign=41977c2de4-daily_update_2020_01_21&amp;utm_medium=email&amp;utm_term=0_d4c7fece27-41977c2de4-111021813&quot;}\" href=\"https:\/\/www.nytimes.com\/2020\/01\/18\/technology\/clearview-privacy-facial-recognition.html?utm_source=Memberful&amp;utm_campaign=41977c2de4-daily_update_2020_01_21&amp;utm_medium=email&amp;utm_term=0_d4c7fece27-41977c2de4-111021813\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">The New York Times<\/a><\/em> revealed Clearview AI, a secretive surveillance company, was selling a facial recognition tool to law enforcement powered by \u201cthree billion images\u201d culled from the open web. Cops have long had access to similar technology, but what makes Clearview different is where it obtained its data. The company scraped pictures from millions of public sites including Facebook, YouTube, and Venmo, according to the <em>Times<\/em>.<\/p>\n<p>To use the tool, cops simply upload an image of a suspect, and Clearview spits back photos of them and links to where they were posted. The company has made it easy to instantly connect a person to their online footprint\u2014the very capability many people have long feared someone would possess. (Clearview\u2019s claims should be taken with a grain of salt; a <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.buzzfeednews.com\/article\/ryanmac\/clearview-ai-nypd-facial-recognition&quot;}\" href=\"https:\/\/www.buzzfeednews.com\/article\/ryanmac\/clearview-ai-nypd-facial-recognition\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">Buzzfeed News<\/a> investigation found its marketing materials appear to contain exaggerations and lies. 
The company did not immediately return a request for comment.)<\/p>\n<p>Like almost any tool, scraping can be used for noble or nefarious purposes. Without it, we wouldn\u2019t have the Internet Archive\u2019s invaluable <a href=\"https:\/\/www.wired.com\/story\/internet-archive-wikipedia-more-reliable\/\">Wayback Machine<\/a>, for instance. But it\u2019s also how Stanford researchers a few years ago built a <a href=\"https:\/\/www.wired.com\/story\/ai-research-is-in-desperate-need-of-an-ethical-watchdog\/\">widely condemned<\/a> \u201cgaydar,\u201d an algorithm they claimed could detect a person\u2019s sexuality by looking at their face. \u201cIt\u2019s a fundamental thing that we rely on every day, a lot of people without realizing, because it\u2019s going on behind the scenes,\u201d says Jamie Lee Williams, a staff attorney on the Electronic Frontier Foundation\u2019s civil liberties team. The EFF and other digital rights groups have often argued the benefits of scraping outweigh the harms.<\/p>\n<p>Automated scraping violates the policies of sites like <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.facebook.com\/apps\/site_scraping_tos_terms.php&quot;}\" href=\"https:\/\/www.facebook.com\/apps\/site_scraping_tos_terms.php\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">Facebook<\/a> and <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/twitter.com\/en\/tos&quot;}\" href=\"https:\/\/twitter.com\/en\/tos\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">Twitter<\/a>, the latter of which specifically <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/developer.twitter.com\/en\/developer-terms\/more-on-restricted-use-cases&quot;}\" 
href=\"https:\/\/developer.twitter.com\/en\/developer-terms\/more-on-restricted-use-cases\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">prohibits scraping<\/a> to build facial recognition databases. Twitter <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.nytimes.com\/2020\/01\/22\/technology\/clearview-ai-twitter-letter.html&quot;}\" href=\"https:\/\/www.nytimes.com\/2020\/01\/22\/technology\/clearview-ai-twitter-letter.html\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">sent a letter<\/a> to Clearview this week asking it to stop pilfering data from the site \u201cfor any reason,\u201d and Facebook is also reportedly examining the matter, according to the <em>Times<\/em>. But it\u2019s unclear whether they have any legal recourse in the current system.<\/p>\n<p>To fight back against scraping, companies have often used the <a href=\"https:\/\/www.wired.com\/2014\/11\/hacker-lexicon-computer-fraud-abuse-act\/\">Computer Fraud and Abuse Act<\/a>, claiming the practice amounts to accessing a computer without proper authorization. Last year, however, the Ninth Circuit Court of Appeals <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.vice.com\/en_us\/article\/9kek83\/linkedin-data-scraping-lawsuit-shot-down&quot;}\" href=\"https:\/\/www.vice.com\/en_us\/article\/9kek83\/linkedin-data-scraping-lawsuit-shot-down\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">ruled<\/a> that automated scraping doesn\u2019t violate the CFAA. In that case, LinkedIn sued and lost against a company called HiQ, which scraped public LinkedIn profiles in bulk and combined them with other information into a database for employers. 
The <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.eff.org\/deeplinks\/2019\/09\/victory-ruling-hiq-v-linkedin-protects-scraping-public-data&quot;}\" href=\"https:\/\/www.eff.org\/deeplinks\/2019\/09\/victory-ruling-hiq-v-linkedin-protects-scraping-public-data\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">EFF<\/a> and other groups heralded the ruling as a victory, because it limited the scope of the CFAA\u2014which they argue has frequently been abused by companies\u2014and helped protect researchers who break terms of service agreements in the name of freedom of information.<\/p>\n<p>The CFAA is one of few options available to companies who want to stop scrapers, which is part of the problem. \u201cIt\u2019s a 1986, pre-internet statute,\u201d says Williams. \u201cIf that\u2019s the best we can do to protect our privacy with these very complicated, very modern problems, then I think we\u2019re screwed.\u201d<\/p>\n<p>Civil liberties groups and technology companies both have been <a href=\"https:\/\/www.wired.com\/story\/congress-privacy-bill-copra\/\">calling for a federal law<\/a> that would establish Americans\u2019 right to privacy in the digital era. Clearview, and companies like it, make the matter that much more urgent. \u201cWe need a comprehensive privacy statute that covers biometric data,\u201d says Williams.<\/p>\n<p>Right now, there\u2019s only a patchwork of state regulations that potentially provide those kinds of protections. The <a href=\"https:\/\/www.wired.com\/story\/ccpa-guide-california-privacy-law-takes-effect\/\">California Consumer Privacy Act<\/a>, which went into effect this month, gives state residents the right to ask companies like Clearview to delete data it collects about them. 
Other <a href=\"https:\/\/www.wired.com\/story\/facial-recognition-laws-are-literally-all-over-the-map\/\">regulations<\/a>, like the Illinois Biometric Information Privacy Act, require corporations to obtain consent before collecting biometric data, including faces. A <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.courthousenews.com\/wp-content\/uploads\/2020\/01\/Surveillance-1.pdf&quot;}\" href=\"https:\/\/www.courthousenews.com\/wp-content\/uploads\/2020\/01\/Surveillance-1.pdf\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">class action lawsuit<\/a> filed earlier this week accuses Clearview of violating that law. Texas and Washington have similar regulations on the books, but don\u2019t allow for private lawsuits; California\u2019s law also doesn\u2019t allow for private right of action.<\/p>\n<p>Some experts argue that empowering consumers is not enough. \u201cWe just can\u2019t be expected to manage every use of our data online,\u201d says Dylan Gilbert, a privacy lawyer at the civil liberties group Public Knowledge. He argues the solution instead is to make some uses of personal data illegal. For example, some cities, including <a href=\"https:\/\/www.wired.com\/story\/san-francisco-bans-use-facial-recognition-tech\/\">San Francisco<\/a>, have banned facial recognition by city agencies altogether.<\/p>\n<p>Another option is to give some power to organizations, rather than only individuals. \u201cCompanies and platforms like LinkedIn or Facebook or Twitter should have the right to protect their users\u2019 privacy downstream,\u201d says Tiffany C. Li, a technology lawyer and visiting professor at Boston University School of Law. A federal law could allow online platforms to sue entities like Clearview on behalf of their users to protect their right to privacy. The risk, though, is that corporations will pursue litigation that mostly serves their own interests. 
A <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/scholarship.law.bu.edu\/faculty_scholarship\/465\/&quot;}\" href=\"https:\/\/scholarship.law.bu.edu\/faculty_scholarship\/465\/\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">2018 article<\/a> in Boston University\u2019s <em>Journal of Science &amp; Technology Law<\/em> found that in 20 years of scraping cases based on the CFAA, \u201ca tremendous number\u201d concerned claims brought \u201cby direct commercial competitors or companies in closely adjacent markets to each other.\u201d<\/p>\n<p>In the absence of legal recourse, one way companies have blocked people from scraping their sites is by using technical tools. Facebook has been particularly aggressive in this regard. It requires users to sign in to view almost anything on its site, and it uses a lengthy <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/support.google.com\/webmasters\/answer\/6062608?hl=en&quot;}\" href=\"https:\/\/support.google.com\/webmasters\/answer\/6062608?hl=en\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">robots.txt<\/a> file to stop Google from indexing many of its pages. That\u2019s why if you Google your name, all of your Facebook activity likely isn\u2019t in the search results. But not all of the social network\u2019s efforts have been popular. 
Last year, the company <a class=\"external-link\" data-event-click=\"{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https:\/\/www.theverge.com\/2019\/1\/28\/18201361\/facebook-political-ad-transparency-tools-blocked-user-data-privacy&quot;}\" href=\"https:\/\/www.theverge.com\/2019\/1\/28\/18201361\/facebook-political-ad-transparency-tools-blocked-user-data-privacy\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">blocked<\/a> third-party transparency tools used by nonprofits and journalists, because it said it needed to prevent malicious actors from scraping its site.<\/p>\n<p>Not all companies have the resources, or priorities, to create those kinds of barriers against any would-be scrapers. Venmo, the payment app owned by PayPal, has <a href=\"https:\/\/www.wired.com\/story\/venmo-alternatives\/\">repeatedly been criticized<\/a> for making all transactions public by default. Several researchers and artists have scraped <a href=\"https:\/\/www.wired.com\/story\/i-scraped-millions-of-venmo-payments-your-data-is-at-risk\/\">millions of payments<\/a> from Venmo to demonstrate how it puts people\u2019s privacy at risk. Clearview says it also mined the site for its database. \u201cScraping Venmo is a violation of our terms of service and we actively work to limit and block activity that violates these policies,\u201d a spokesperson said in a statement. While the app, and others like it, could do more to protect users, catching malicious scraping will always be an evolving cat-and-mouse game, and regulatory action could be more effective to stop it.<\/p>\n<p>\u201cWe don\u2019t want to limit access to information and we don\u2019t want to ban web scraping,\u201d Li says. 
\u201cBut we need to think about other ways to prevent some of the privacy harms we saw with Clearview.\u201d<\/p>\n<p><a href=\"https:\/\/www.wired.com\/story\/clearview-ai-scraping-web\" target=\"bwo\" >https:\/\/www.wired.com\/category\/security\/feed\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/media.wired.com\/photos\/5e2a1bb2123c60000827363b\/master\/pass\/ai-scraping-88622242.jpg\"\/><\/p>\n<p><strong>Credit to Author: Louise Matsakis| Date: Sat, 25 Jan 2020 12:00:00 +0000<\/strong><\/p>\n<p>The facial recognition startup claims it collected billions of photos from sites like Facebook and Twitter. What does the practice mean for the open web?<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10378,10607],"tags":[714,21382],"class_list":["post-17545","post","type-post","status-publish","format-standard","hentry","category-security","category-wired","tag-security","tag-security-privacy"],"_links":{"self":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/17545","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=17545"}],"version-history":[{"count":0,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/17545\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=17545"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\
/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=17545"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=17545"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}