Zoom goes for a blatant genAI data grab; enterprises, beware

August 11, 2023 admin analytics, Data Privacy, generative ai, videoconferencing, zoom video communications

Credit to Author: eschuman@thecontentfirm.com | Date: Fri, 11 Aug 2023 11:21:00 -0700

When Zoom amended its terms of service earlier this month — a bid to make executives comfortable that it wouldn’t use Zoom data to train generative AI models — it quickly stirred up a hornet’s nest. So the company “revised” the terms of service, and left in place ways it can still get full access to user data.

(Computerworld repeatedly reached out to Zoom without success to clarify what the changes really mean.)

Before I delve into the legalese — and Zoom’s weasel words to falsely suggest it was not doing what it obviously was doing — let me raise a more critical question: Is there anyone in the video-call business not doing this? Microsoft? Google? Those are two firms that never met a dataset that they didn’t love.

One of the big problems with generative AI training is that gen AI cannot be predicted. It’s prone to “hallucinations,” and despite the widely held belief that it will get better and more accurate via various updates over time, the opposite has happened. OpenAI’s ChatGPT accuracy has plummeted in the most recent release.

Once data goes in, there’s no telling where it will come out. Amazon learned that lesson earlier this year when it noticed ChatGPT revealing sensitive internal Amazon data in answers. Amazon engineers were testing ChatGPT by feeding it internal data and asking it to analyze that data. It analyzed it all right, then learned from it — and then felt free to share what it learned with everyone everywhere.

With that scenario in mind, consider the typical Zoom call. Enterprises use it for internal meetings where the most sensitive plans and problems are discussed in detail. Physicians use it for patient discussions.

This is what Zoom says in its revised terms of service:

“Customer Content does not include any telemetry data, product usage data, diagnostic data, and similar content or data that Zoom collects or generates in connection with your or your End Users’ use of the Services or Software. As between you and Zoom, all right, title, and interest in and to Service Generated Data, and all Proprietary Rights therein, belong to and are retained solely by Zoom. You agree that Zoom compiles and may compile Service Generated Data based on Customer Content and use of the Services and Software. You consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data for any purpose, to the extent and in the manner permitted under applicable Law, including for the purpose of product and service development, marketing, analytics, quality assurance, machine learning or artificial intelligence (including for the purposes of training and tuning of algorithms and models), training, testing, improvement of the Services, Software, or Zoom’s other products, services, and software, or any combination thereof, and as otherwise provided in this Agreement.”

Unless I missed it, the Zoom lawyers apparently forgot to include the full rights to your firstborn. (They’ll get to it.) They then added that:

“Zoom may redistribute, publish, import, access, use, store, transmit, review, disclose, preserve, extract, modify, reproduce, share, use, display, copy, distribute, translate, transcribe, create derivative works, and process Customer Content: You agree to grant and hereby grant Zoom a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license and all other rights required or necessary to redistribute, publish, import, access, use, store, transmit, review, disclose, preserve, extract, modify, reproduce, share, use, display, copy, distribute, translate, transcribe, create derivative works, and process Customer Content and to perform all acts with respect to the Customer Content as may be necessary for Zoom to provide the Services to you, including to support the Services; (ii) for the purpose of product and service development, marketing, analytics, quality assurance, machine learning, artificial intelligence, training, testing, improvement of the Services, Software, or Zoom’s other products, services, and software, or any combination thereof; and (iii) for any other purpose relating to any use or other act permitted in accordance with Section 10.3.”

OK. And then for laughs, they typed in: “Notwithstanding the above, Zoom will not use audio, video or chat Customer Content to train our artificial intelligence models without your consent.” Really? Had they deleted the earlier words, then maybe this would be legitimate.

There are two loopholes here. The first is the phrase “without your consent.” Based on all of the above, such consent is granted by merely using the product. I repeatedly asked Zoom to point out where on the site (or in the app) users could go to withdraw consent for any AI training. No answer. They do offer such consent withdrawal for a few highly limited services, such as summarizing meeting notes. But overall? Not so much.

The consent mechanism of using the product is particularly troublesome for non-customers. Let’s say an enterprise pays for and hosts a call and then invites customers, some contractors and other partners to participate in the meeting. Do those guests understand that anything they say might be fed to generative AI? Other than refusing to attend, how is a guest supposed to decline consent?

The other loophole involves the word “content.” As Zoom describes it, there is a lot of metadata and other information it gathers that it does not strictly consider content. Zoom discussed this in a blog post: “There is certain information about how our customers in the aggregate use our product — telemetry, diagnostic data, etc. This is commonly known as service generated data. We wanted to be transparent that we consider this to be our data.”

The pushback on this data grab may be pointless. Zoom isn’t backing off, and until rivals take an explicit stance on this kind of generative AI training, this will happen again and again.

Kathleen Mullin, a veteran CISO (including at Tampa International Airport and Cancer Treatment Centers of America) who now performs fractional CISO work, said she doubts Microsoft would do the same thing Zoom is trying.

Microsoft “is the originator of a lot of LLM anyway, so I don’t know that they need the data from Teams,” Mullin said. That’s a fair point, but many enterprises have historically never let “we don’t need that data” stop them from using some data.

Scott Castle, who served for four years as the chief strategy officer with AI firm Sisense before leaving that company in July, said he found the Zoom efforts discomforting. “CIOs are not paying that much attention” to how the data from partners can be used, he said. “They are just trying to get a couple of years ahead of the market.

“The problem here is that it is the user who created the underlying data and Zoom is saying, ‘If you use (our service), we want a piece of that action.’ It’s overreach in a way that tries to cut off the conversation [about] who the value creator is: ‘You still own your content but we own everything about your content.’ I think it is trying to partition stuff into yours and mine in a way that is deeply disingenuous. ‘You nominally own the valueless thing you created, but I own everything else, including the pixels and all of the intrinsic information in that image.’

“And what if Zoom later goes out of business? Where does that data go?”

Data analytics expert Pam Baker — author of Data Divination: Big Data Strategies — saw the Zoom move as potentially even more dangerous.

“Zoom’s new AI scraping policy — with no way to opt out — is a symptom of a much larger problem,” Baker said. “We are seeing the most expansive data harvesting effort ever, all in the name of training AI on every moment, every aspect, every thought and action, and every idea that people have — not to mention the harvesting of intellectual property, copyrighted works and proprietary information. This is what movements like Responsible AI are supposed to stop, but if laws aren’t enacted fast to prevent the reaping, privacy will already be dead.”
