{"id":22413,"date":"2023-07-10T07:30:20","date_gmt":"2023-07-10T15:30:20","guid":{"rendered":"https:\/\/www.palada.net\/index.php\/2023\/07\/10\/news-16143\/"},"modified":"2023-07-10T07:30:20","modified_gmt":"2023-07-10T15:30:20","slug":"news-16143","status":"publish","type":"post","link":"https:\/\/www.palada.net\/index.php\/2023\/07\/10\/news-16143\/","title":{"rendered":"Voice deepfakes: technology, prospects, scams | Kaspersky official blog"},"content":{"rendered":"<p><strong>Credit to Author: Dmitry Anikin| Date: Mon, 10 Jul 2023 14:35:54 +0000<\/strong><\/p>\n<p>Have you ever wondered how we know who we&#8217;re talking to on the phone? It&#8217;s obviously more than just the name displayed on the screen. If we hear an unfamiliar voice when being called from a saved number, we know right away something&#8217;s wrong. To determine who we&#8217;re really talking to, we unconsciously note the timbre, manner and intonation of speech. But how reliable is our own hearing in the digital age of artificial intelligence? As the latest news shows, what we hear isn&#8217;t always worth trusting \u2013 because voices can be a fake: deepfake.<\/p>\n<h2>Help, I&#8217;m in trouble<\/h2>\n<p>In spring 2023, scammers in Arizona <a href=\"https:\/\/www.independent.co.uk\/tech\/ai-voice-clone-scam-kidnapping-b2319083.html\" target=\"_blank\" rel=\"nofollow noopener\">attempted to extort money<\/a> from a woman over the phone. She heard the voice of her 15-year-old daughter begging for help before an unknown man grabbed the phone and demanded a ransom, all while her daughter&#8217;s screams could still be heard in the background. The mother was positive that the voice was really her child&#8217;s. Fortunately, she found out fast that everything was fine with her daughter, leading her to realize that she was a victim of scammers.<\/p>\n<p>It can&#8217;t be 100% proven that the attackers used a deepfake to imitate the teenager&#8217;s voice. 
Maybe the scam was of a more traditional nature, with the call quality, the unexpectedness of the situation, stress, and the mother&#8217;s imagination all playing their part to make her think she heard something she didn&#8217;t. But even if neural network technologies weren&#8217;t used in this case, deepfakes can and do occur, and as the technology develops they become increasingly convincing and more dangerous. To fight the exploitation of deepfake technology by criminals, we need to understand how it works.<\/p>\n<h2>What are deepfakes?<\/h2>\n<p>Deepfake (<em>&#8220;deep learning&#8221;<\/em> + <em>&#8220;fake&#8221;<\/em>) technology has been developing at a rapid rate over the past few years. Machine learning can be used to create compelling fakes of images, video, or audio content. For example, neural networks can be used in photos and videos to replace one person&#8217;s face with another&#8217;s while preserving facial expressions and lighting. While initially these fakes were low quality and easy to spot, as the algorithms developed the results became so convincing that it&#8217;s now difficult to distinguish them from reality. 
In 2022, the world&#8217;s first <a href=\"https:\/\/www.youtube.com\/playlist?list=PLWTwWADrHvpkgv3cKyjomdfhESt5711OZ\" target=\"_blank\" rel=\"nofollow noopener\">deepfake TV show<\/a> was released in Russia, in which deepfakes of Jason Statham, Margot Robbie, Keanu Reeves and Robert Pattinson play the main characters.<\/p>\n<div id=\"attachment_48592\" style=\"width: 2058px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10095607\/audio-deepfake-technology-01.jpg\"><img loading=\"lazy\" aria-describedby=\"caption-attachment-48592\" decoding=\"async\" class=\"size-full wp-image-48592\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10095607\/audio-deepfake-technology-01.jpg\" alt=\"Deepfake versions of Hollywood stars in the Russian TV series PMJason\" width=\"2048\" height=\"1278\" \/><\/a><\/p>\n<p id=\"caption-attachment-48592\" class=\"wp-caption-text\">Deepfake versions of Hollywood stars in the Russian TV series PMJason. (<a href=\"https:\/\/xn--h1aax.xn--p1ai\/news\/v-rossii-vyshel-pervyy-v-mire-dipfeyk-veb-serial-\/\" target=\"_blank\" rel=\"nofollow noopener\">Source<\/a>)<\/p>\n<\/div>\n<h2>Voice conversion<\/h2>\n<p>But today our focus is on the technology used for creating voice deepfakes. This is also known as voice conversion (or &#8220;voice cloning&#8221; if you&#8217;re creating a full digital copy of a voice). Voice conversion is based on autoencoders \u2013 a type of neural network that first compresses input data into a compact internal representation (the <u>en<\/u>coder part), and then learns to decompress it back from this representation (the <u>de<\/u>coder part) to restore the original data. 
This way the model learns to present data in a compressed format while highlighting the most important information.<\/p>\n<div id=\"attachment_48593\" style=\"width: 2143px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10095740\/audio-deepfake-technology-02.png\"><img loading=\"lazy\" aria-describedby=\"caption-attachment-48593\" decoding=\"async\" class=\"size-full wp-image-48593\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10095740\/audio-deepfake-technology-02.png\" alt=\"Autoencoder scheme\" width=\"2133\" height=\"1600\" \/><\/a><\/p>\n<p id=\"caption-attachment-48593\" class=\"wp-caption-text\">Autoencoder scheme. (<a href=\"https:\/\/www.compthree.com\/blog\/autoencoder\/\" target=\"_blank\" rel=\"nofollow noopener\">Source<\/a>)<\/p>\n<\/div>\n<p>To make voice deepfakes, two audio recordings are fed into the model, with the voice from the second recording transferred to the first. The content encoder is used to determine <strong>what<\/strong> was said from the first recording, and the speaker encoder is used to extract the main characteristics of the voice from the second recording \u2013 meaning <strong>how<\/strong> the second person talks. The compressed representations of <strong>what<\/strong> must be said and <strong>how<\/strong> it&#8217;s said are combined, and the result is generated using the decoder. 
Thus, what&#8217;s said in the first recording is voiced by the person from the second recording.<\/p>\n<div id=\"attachment_48594\" style=\"width: 1288px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10100014\/audio-deepfake-technology-03.jpg\"><img loading=\"lazy\" aria-describedby=\"caption-attachment-48594\" decoding=\"async\" class=\"size-full wp-image-48594\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10100014\/audio-deepfake-technology-03.jpg\" alt=\"The process of making a voice deepfake\" width=\"1278\" height=\"435\" \/><\/a><\/p>\n<p id=\"caption-attachment-48594\" class=\"wp-caption-text\">The process of making a voice deepfake. (<a href=\"http:\/\/cs230.stanford.edu\/projects_fall_2020\/reports\/55721255.pdf\" target=\"_blank\" rel=\"nofollow noopener\">Source<\/a>)<\/p>\n<\/div>\n<p>There are other approaches as well \u2013 for example, ones that use <a href=\"https:\/\/en.wikipedia.org\/wiki\/Generative_adversarial_network\" target=\"_blank\" rel=\"nofollow noopener\">generative adversarial networks (GANs)<\/a> or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diffusion_model\" target=\"_blank\" rel=\"nofollow noopener\">diffusion models<\/a>. Research into how to make deepfakes is supported in particular by the film industry. Think about it: with audio and video deepfakes, it&#8217;s possible to replace the faces of actors in movies and TV shows, and to dub movies into any language with synchronized facial expressions.<\/p>\n<h2>How it&#8217;s done<\/h2>\n<p>As we were researching deepfake technologies, we wondered how hard it might be to make one&#8217;s own voice deepfake. It turns out there are lots of free open-source tools for voice conversion, but it isn&#8217;t so easy to get a high-quality result with them. 
It takes Python programming experience and solid audio-processing skills, and even then the quality is far from ideal. In addition to open-source tools, there are also proprietary, paid solutions available.<\/p>\n<p>For example, in early 2023, Microsoft <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/01\/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio\/\" target=\"_blank\" rel=\"nofollow noopener\">announced<\/a> an algorithm that can reproduce a human voice from an audio sample only three seconds long! The model also works with multiple languages, so you can even hear yourself speaking a foreign language. All this looks promising, but so far it&#8217;s only at the research stage. The ElevenLabs platform, however, <a href=\"https:\/\/www.theverge.com\/2023\/1\/31\/23579289\/ai-voice-clone-deepfake-abuse-4chan-elevenlabs\" target=\"_blank\" rel=\"nofollow noopener\">lets users<\/a> make voice deepfakes without any effort: just upload an audio recording of the voice and the words to be spoken, and that&#8217;s it. Of course, as soon as word got out, people started playing with the technology in all sorts of ways.<\/p>\n<h2>Hermione&#8217;s battle and an overly trusting bank<\/h2>\n<p>In full accordance with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Godwin%27s_law\" target=\"_blank\" rel=\"nofollow noopener\">Godwin&#8217;s law<\/a>, Emma Watson was made to <a href=\"https:\/\/www.vice.com\/en\/article\/dy7mww\/ai-voice-firm-4chan-celebrity-voices-emma-watson-joe-rogan-elevenlabs\" target=\"_blank\" rel=\"nofollow noopener\">read &#8220;Mein Kampf&#8221;<\/a>, and another user <a href=\"https:\/\/www.vice.com\/en\/article\/dy7axa\/how-i-broke-into-a-bank-account-with-an-ai-generated-voice\" target=\"_blank\" rel=\"nofollow noopener\">used<\/a> ElevenLabs technology to &#8220;hack&#8221; his own bank account. Sounds creepy? 
It does to us \u2013 especially when you add to the mix the popular horror stories about scammers collecting voice samples over the phone by getting people to say &#8220;yes&#8221; or &#8220;confirm&#8221; while pretending to be a bank, government agency or poll service, and then stealing money using voice authorization.<\/p>\n<p>But in reality things aren&#8217;t so bad. Firstly, it takes about five minutes of audio recordings to create an artificial voice in ElevenLabs, so a simple &#8220;yes&#8221; isn&#8217;t enough. Secondly, banks also know about these scams, so a voice can only be used to initiate certain operations that aren&#8217;t related to the transfer of funds (for example, to check your account balance). So money can&#8217;t be stolen this way.<\/p>\n<p>To its credit, ElevenLabs reacted to the problem fast by rewriting its service rules, prohibiting free (i.e., anonymous) users from creating deepfakes based on their own uploaded voices, and blocking accounts with complaints about &#8220;offensive content&#8221;.<\/p>\n<p>While these measures may be useful, they still don&#8217;t solve the problem of using voice deepfakes for suspicious purposes.<\/p>\n<h2>Other ways deepfakes are used in scams<\/h2>\n<p>Deepfake technology in itself is harmless, but in the hands of scammers it can become a dangerous tool with lots of opportunities for deception, defamation or disinformation. Fortunately, there haven&#8217;t been any mass cases of scams involving voice alteration, but there have been several high-profile cases involving voice deepfakes.<\/p>\n<p>In 2019, scammers used this technology to <a href=\"https:\/\/www.wsj.com\/articles\/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402\" target=\"_blank\" rel=\"nofollow noopener\">shake down a UK-based energy firm<\/a>. 
In a telephone conversation, the scammer pretended to be the chief executive of the firm&#8217;s German parent company, and requested the urgent transfer of \u20ac220,000 ($243,000) to the account of a certain supplier company. After the payment was made, the scammer called twice more \u2013 the first time to put the UK office staff at ease and report that the parent company had already sent a refund, and the second time to request another transfer. All three times the UK CEO was absolutely positive that he was talking with his boss because he recognized both his German accent and his tone and manner of speech. The second transfer wasn&#8217;t sent only because the scammer messed up and called from an Austrian number instead of a German one, which made the UK CEO suspicious.<\/p>\n<p>A year later, in 2020, scammers used deepfakes to <a href=\"https:\/\/www.forbes.com\/sites\/thomasbrewster\/2021\/10\/14\/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions\/?sh=42bdebd47559\" target=\"_blank\" rel=\"nofollow noopener\">steal<\/a> up to $35,000,000 from an unnamed Japanese company (the name of the company and the exact amount stolen weren&#8217;t disclosed by the investigation).<\/p>\n<p>It&#8217;s unknown which solutions (open source, paid, or even their own) the scammers used to fake voices, but in both the above cases the companies clearly suffered \u2013 badly \u2013 from deepfake fraud.<\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>Opinions differ about the future of deepfakes. Currently, most of this technology is in the hands of large corporations, and its availability to the public is limited. 
But as the history of much more popular generative models like <a href=\"https:\/\/openai.com\/dall-e-2\/\" target=\"_blank\" rel=\"nofollow noopener\">DALL-E<\/a>, <a href=\"https:\/\/www.midjourney.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Midjourney<\/a> and <a href=\"https:\/\/stability.ai\/blog\/stable-diffusion-announcement\" target=\"_blank\" rel=\"nofollow noopener\">Stable Diffusion<\/a> shows \u2013 to say nothing of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\" target=\"_blank\" rel=\"nofollow noopener\">large language models<\/a> (ChatGPT, anybody?) \u2013 similar technologies may well appear in the public domain in the foreseeable future. This is confirmed by a recent <a href=\"https:\/\/www.semianalysis.com\/p\/google-we-have-no-moat-and-neither\" target=\"_blank\" rel=\"nofollow noopener\">leak<\/a> of internal Google correspondence, in which representatives of the internet giant fear they&#8217;ll lose the AI race to open solutions. This will obviously result in an increase in the use of voice deepfakes \u2013 including for fraud.<\/p>\n<p>The most promising step in the development of deepfakes is real-time generation, which would ensure explosive growth of deepfakes (and of fraud based on them). Can you imagine a <a href=\"https:\/\/github.com\/iperov\/DeepFaceLive\" target=\"_blank\" rel=\"nofollow noopener\">video call<\/a> with someone whose face and voice are completely fake? <a href=\"https:\/\/blog.metaphysic.ai\/future-autoencoder-deepfakes\/\" target=\"_blank\" rel=\"nofollow noopener\">However<\/a>, this level of data processing requires huge resources only available to large corporations, so the best technologies will remain private and fraudsters won&#8217;t be able to keep up with the pros. 
The high bar for quality will also help users learn to identify fakes more easily.<\/p>\n<h2>How to protect yourself<\/h2>\n<p>Now back to our very first question: can we trust the voices we hear (that is \u2013 if they&#8217;re not the voices in our head)? Well, being paranoid all the time and coming up with secret code words to use with friends and family is probably overdoing it; however, in more serious situations such caution might be appropriate. If everything develops according to the pessimistic scenario, deepfake technology in the hands of scammers could grow into a formidable weapon in the future, but there&#8217;s still time to get ready and build reliable methods of protection against counterfeiting: there&#8217;s already a lot of <a href=\"https:\/\/arxiv.org\/abs\/2005.08781\" target=\"_blank\" rel=\"nofollow noopener\">research<\/a> into deepfakes, and large companies are developing <a href=\"https:\/\/venturebeat.com\/ai\/intel-unveils-real-time-deepfake-detector-claims-96-accuracy-rate\/\" target=\"_blank\" rel=\"nofollow noopener\">security solutions<\/a>. In fact, we&#8217;ve already talked in detail about ways to combat video deepfakes <a href=\"https:\/\/www.kaspersky.com\/blog\/rsa2020-deepfakes-mitigation\/34006\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>For now, protection against AI fakes is only just beginning, so it&#8217;s important to keep in mind that deepfakes are just another kind of advanced social engineering. The risk of encountering fraud like this is small, but it&#8217;s still there, so it&#8217;s worth knowing about and keeping in mind. If you get a strange call, pay attention to the sound quality. Is it in an unnatural monotone, is it unintelligible, or are there strange noises? 
Always double-check information through other channels, and remember that surprise and panic are what scammers rely on most.<\/p>\n<p><a href=\"https:\/\/www.kaspersky.com\/blog\/audio-deepfake-technology\/48586\/\" target=\"bwo\" >https:\/\/blog.kaspersky.com\/feed\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2023\/07\/10095211\/audio-deepfake-technology-featured.jpg\"\/><\/p>\n<p><strong>Credit to Author: Dmitry Anikin| Date: Mon, 10 Jul 2023 14:35:54 +0000<\/strong><\/p>\n<p>How voice deepfakes are made, what scams have already been used, what&#039;s the future for deepfake technologies, and how to protect yourself against voice faking.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10425,10378],"tags":[10245,11113,17473,9751,12038,12499,5897,1331,10438],"class_list":["post-22413","post","type-post","status-publish","format-standard","hentry","category-kaspersky","category-security","tag-ai","tag-artificial-intelligence","tag-deepfakes","tag-fraud","tag-machine-learning","tag-neural-networks","tag-privacy","tag-technology","tag-threats"],"_links":{"self":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/22413","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/
comments?post=22413"}],"version-history":[{"count":0,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/22413\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=22413"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=22413"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=22413"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}