{"id":23960,"date":"2024-02-16T03:30:05","date_gmt":"2024-02-16T11:30:05","guid":{"rendered":"http:\/\/www.palada.net\/index.php\/2024\/02\/16\/news-17690\/"},"modified":"2024-02-16T03:30:05","modified_gmt":"2024-02-16T11:30:05","slug":"news-17690","status":"publish","type":"post","link":"https:\/\/www.palada.net\/index.php\/2024\/02\/16\/news-17690\/","title":{"rendered":"How to run language models and other AI tools locally on your computer | Kaspersky official blog"},"content":{"rendered":"<p><strong>Credit to Author: Stan Kaminsky| Date: Fri, 16 Feb 2024 11:08:41 +0000<\/strong><\/p>\n<p>Many people are already experimenting with generative neural networks and finding regular use for them, including at work. For example, ChatGPT and its analogs are regularly used by almost <a href=\"https:\/\/www.business.com\/technology\/chatgpt-usage-workplace-study\/\" target=\"_blank\" rel=\"nofollow noopener\">60% of Americans<\/a> (and not always with permission from management). However, all the data involved in such operations \u2014 both user prompts and model responses \u2014 are stored on servers of OpenAI, Google, and the rest. For tasks where such information leakage is unacceptable, you don&#8217;t need to abandon AI completely \u2014 you just need to invest a little effort (and perhaps money) to run the neural network locally on your own computer \u2013 even a laptop.<\/p>\n<h2>Cloud threats<\/h2>\n<p>The most popular AI assistants run on the cloud infrastructure of large companies. It&#8217;s efficient and fast, but your data processed by the model may be accessible to both the AI service provider and completely unrelated parties, <a href=\"https:\/\/www.bbc.com\/news\/technology-65047304\" target=\"_blank\" rel=\"nofollow noopener\">as happened last year with ChatGPT<\/a>.<\/p>\n<p>Such incidents present varying levels of threat depending on what these AI assistants are used for. 
If you&#8217;re generating cute illustrations for some fairy tales you&#8217;ve written, or asking ChatGPT to create an itinerary for your upcoming weekend city break, it&#8217;s unlikely that a leak will lead to serious damage. However, if your conversation with a chatbot contains confidential info \u2014 personal data, passwords, or bank card numbers \u2014 a possible leak to the cloud is no longer acceptable. Thankfully, it&#8217;s relatively easy to prevent by pre-filtering the data \u2014 we&#8217;ve written a <a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-use-chatgpt-ai-assistants-securely-2024\/50562\/\" target=\"_blank\" rel=\"noopener\">separate post<\/a> about that.<\/p>\n<p>But in cases where either all the correspondence is confidential (for example, medical or financial information), or the reliability of pre-filtering is questionable (you need to process large volumes of data that no one will preview and filter), there&#8217;s only one solution: move the processing from the cloud to a local computer. Of course, you&#8217;re unlikely to be able to run your own copy of ChatGPT or Midjourney offline, but other neural networks that run locally offer comparable quality at a lower computational cost.<\/p>\n<h2>What hardware do you need to run a neural network?<\/h2>\n<p>You&#8217;ve probably heard that working with neural networks requires super-powerful graphics cards, but in practice this isn&#8217;t always the case. Different AI models, depending on their specifics, place varying demands on components such as RAM, video memory, storage, and the CPU (for the CPU, not only processing speed matters but also its support for certain vector instructions). The ability to load the model depends on the amount of RAM, and the size of the &#8220;context window&#8221; \u2014 that is, the memory of the previous conversation \u2014 depends on the amount of video memory. 
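<\/p>
<p>As a rough illustration of these memory demands, here is a small sketch that estimates the weight and context-window (KV-cache) memory for a hypothetical 7-billion-parameter model with a Llama-style architecture. The architecture numbers (32 layers, a 4096-dimensional hidden state) are assumptions for illustration only; real models vary.<\/p>

```python
# Rough memory estimates for a hypothetical 7B-parameter model with a
# Llama-style architecture. All figures are assumptions for illustration.
N_PARAMS = 7e9      # number of weights
N_LAYERS = 32       # transformer layers
HIDDEN_DIM = 4096   # hidden-state width
FP16_BYTES = 2      # bytes per 16-bit value

def weights_gb(bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB at a given precision."""
    return N_PARAMS * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int) -> float:
    """Approximate fp16 context memory (KV cache): each token stores a key
    and a value vector (hence the factor 2) in every layer."""
    return 2 * N_LAYERS * HIDDEN_DIM * FP16_BYTES * context_tokens / 1e9

print(f"16-bit weights:        {weights_gb(16):.1f} GB")     # 14.0 GB
print(f"4-bit weights:         {weights_gb(4):.1f} GB")      # 3.5 GB
print(f"KV cache, 4096 tokens: {kv_cache_gb(4096):.1f} GB")  # ~2.1 GB
```

<p>Under these assumptions, the weights dominate memory use, while each extra token of context costs roughly half a megabyte \u2014 which is why RAM limits which model you can load, and video memory limits how long a conversation it can remember.<\/p>
<p>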
Typically, with a weak graphics card and CPU, generation occurs at a snail&#8217;s pace (one to two words per second for text models), so a computer with such a minimal setup is only appropriate for getting acquainted with a particular model and evaluating its basic suitability. For full-fledged everyday use, you&#8217;ll need to increase the RAM, upgrade the graphics card, or choose a faster AI model.<\/p>\n<p>As a starting point, you can try working with computers that were considered relatively powerful back in 2017: a Core i7 or newer processor with support for AVX2 instructions, 16GB of RAM, and a graphics card with at least 4GB of memory. For Mac enthusiasts, machines with the Apple M1 chip or later will do; the memory requirements are the same.<\/p>\n<p>When choosing an AI model, you should first familiarize yourself with its system requirements. A search query like &#8220;<em>model_name<\/em> requirements&#8221; will help you assess whether it&#8217;s worth downloading this model given your available hardware. There are detailed studies available on the impact of memory size, CPU, and GPU on the performance of different models; for example, <a href=\"https:\/\/blog.nomic.ai\/posts\/gpt4all-gpu-inference-with-vulkan\" target=\"_blank\" rel=\"nofollow noopener\">this one<\/a>.<\/p>\n<p>Good news for those who don&#8217;t have access to powerful hardware \u2014 there are simplified AI models that can perform practical tasks even on old machines. Even if your graphics card is very basic, it&#8217;s possible to run models and launch environments using the CPU alone. 
Depending on your tasks, these can even work acceptably well.<\/p>\n<div id=\"attachment_50579\" style=\"width: 1854px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16055813\/how-to-use-AI-locally-01.png\"><img fetchpriority=\"high\" decoding=\"async\" aria-describedby=\"caption-attachment-50579\" class=\"size-full wp-image-50579\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16055813\/how-to-use-AI-locally-01.png\" alt=\"GPU throughput tests\" width=\"1844\" height=\"1140\" \/><\/a><\/p>\n<p id=\"caption-attachment-50579\" class=\"wp-caption-text\">Examples of how various computer builds work with popular language models<\/p>\n<\/div>\n<h2>Choosing an AI model and the magic of quantization<\/h2>\n<p>A wide range of language models are available today, but many of them have limited practical applications. Nevertheless, there are easy-to-use and publicly available AI tools that are well-suited for specific tasks, be they generating text (for example, Mistral 7B), or creating code snippets (for example, Code Llama 13B). Therefore, when selecting a model, narrow down the choice to a few suitable candidates, and then make sure that your computer has the necessary resources to run them.<\/p>\n<p>In any neural network, most of the memory strain is courtesy of weights \u2014 numerical coefficients describing the operation of each neuron in the network. Initially, when training the model, the weights are computed and stored as high-precision fractional numbers. However, it turns out that rounding the weights in the trained model allows the AI tool to be run on regular computers while only slightly decreasing the performance. 
This rounding process is called quantization, and with its help the model&#8217;s size can be reduced considerably \u2014 instead of 16 bits, each weight might use eight, four, or even two bits.<\/p>\n<p>According to <a href=\"https:\/\/arxiv.org\/abs\/2305.17888\" target=\"_blank\" rel=\"nofollow noopener\">current research<\/a>, a larger model with more parameters and quantization can sometimes give better results than a model with precise weight storage but fewer parameters.<\/p>\n<p>Armed with this knowledge, you&#8217;re now ready to explore the treasure trove of open-source language models, namely the top <a href=\"https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard\" target=\"_blank\" rel=\"nofollow noopener\">Open LLM leaderboard<\/a>. In this list, AI tools are sorted by several generation quality metrics, and filters make it easy to exclude models that are too large, too small, or too accurate.<\/p>\n<div id=\"attachment_50581\" style=\"width: 1782px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16055911\/how-to-use-AI-locally-02.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-50581\" class=\"size-full wp-image-50581\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16055911\/how-to-use-AI-locally-02.jpg\" alt=\"List of language models sorted by filter set\" width=\"1772\" height=\"846\" \/><\/a><\/p>\n<p id=\"caption-attachment-50581\" class=\"wp-caption-text\">List of language models sorted by filter set<\/p>\n<\/div>\n<p>After reading the model description and making sure it&#8217;s potentially a fit for your needs, test its performance in the cloud using <a href=\"https:\/\/huggingface.co\/\">Hugging Face<\/a> or <a href=\"https:\/\/colab.research.google.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Google Colab<\/a> services. 
This way, you can avoid downloading models which produce unsatisfactory results, saving you time. Once you&#8217;re satisfied with the initial test of the model, it&#8217;s time to see how it works locally!<\/p>\n<h2>Required software<\/h2>\n<p>Most of the open-source models are published on <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"nofollow noopener\">Hugging Face<\/a>, but simply downloading them to your computer isn&#8217;t enough. To run them, you have to install specialized software, such as <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\" target=\"_blank\" rel=\"nofollow noopener\">LLaMA.cpp<\/a>, or \u2014 even easier \u2014 its &#8220;wrapper&#8221;, <a href=\"https:\/\/lmstudio.ai\/\" target=\"_blank\" rel=\"nofollow noopener\">LM Studio<\/a>. The latter allows you to select your desired model directly from the application, download it, and run it in a dialog box.<\/p>\n<p>Another &#8220;out-of-the-box&#8221; way to use a chatbot locally is <a href=\"https:\/\/gpt4all.io\/index.html\" target=\"_blank\" rel=\"nofollow noopener\">GPT4All<\/a>. Here, the choice is limited to about a dozen language models, but most of them will run even on a computer with just 8GB of memory and a basic graphics card.<\/p>\n<p>If generation is too slow, then you may need a model with coarser quantization (two bits instead of four). 
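<\/p>
<p>To see why stepping down in precision (or in parameter count) helps, a quick back-of-the-envelope check is enough. The sketch below is pure arithmetic over the weights alone; it ignores runtime overhead such as the context cache and buffers, so treat the results as optimistic lower bounds.<\/p>

```python
# Will a quantized model fit a given memory budget? A lower-bound
# estimate that counts the weights only, with no runtime overhead.
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Size of the weights in GB for a given parameter count and precision."""
    return n_params * bits_per_weight / 8 / 1e9

def fits(n_params: float, bits_per_weight: int, budget_gb: float) -> bool:
    """True if the weights alone fit within the memory budget."""
    return model_size_gb(n_params, bits_per_weight) <= budget_gb

BUDGET_GB = 8  # for example, a machine with 8GB of memory
for bits in (8, 4, 2):
    size = model_size_gb(13e9, bits)
    print(f"13B model at {bits}-bit: {size:.2f} GB, "
          f"fits in {BUDGET_GB} GB: {fits(13e9, bits, BUDGET_GB)}")
```

<p>By this estimate, a 13-billion-parameter model needs 4-bit quantization or coarser before it even has a chance of running in 8GB \u2014 exactly the kind of trade-off described above.<\/p>
<p>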
If generation is interrupted or execution errors occur, the problem is often insufficient memory \u2014 it&#8217;s worth looking for a model with fewer parameters or, again, with coarser quantization.<\/p>\n<p>Many models on Hugging Face have already been quantized to varying degrees of precision, but if no one has quantized the model you want with the desired precision, you can do it yourself using <a href=\"https:\/\/github.com\/IST-DASLab\/gptq\" target=\"_blank\" rel=\"nofollow noopener\">GPTQ<\/a>.<\/p>\n<p>This week, another promising tool was released to public beta: <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-on-rtx\/chat-with-rtx-generative-ai\/\" target=\"_blank\" rel=\"nofollow noopener\">Chat With RTX<\/a> from NVIDIA. The manufacturer of the most sought-after AI chips has released a local chatbot capable of summarizing the content of YouTube videos, processing sets of documents, and much more \u2014 provided the user has a Windows PC with 16GB of memory and an NVIDIA RTX 30- or 40-series graphics card with 8GB or more of video memory. &#8220;Under the hood&#8221; are the same varieties of Mistral and Llama 2 from <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"nofollow noopener\">Hugging Face<\/a>. Of course, powerful graphics cards can improve generation performance, but according to the <a href=\"https:\/\/www.theverge.com\/2024\/2\/13\/24071645\/nvidia-ai-chatbot-chat-with-rtx-tech-demo-hands-on\" target=\"_blank\" rel=\"nofollow noopener\">feedback from the first testers<\/a>, the existing beta is quite cumbersome (about 40GB) and difficult to install. 
However, NVIDIA&#8217;s Chat With RTX could become a very useful local AI assistant in the future.<\/p>\n<div id=\"attachment_50582\" style=\"width: 1369px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16060120\/how-to-use-AI-locally-03.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-50582\" class=\"size-full wp-image-50582\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16060120\/how-to-use-AI-locally-03.png\" alt=\"The code for the game &quot;Snake&quot;, written by the quantized language model TheBloke\/CodeLlama-7B-Instruct-GGUF\" width=\"1359\" height=\"865\" \/><\/a><\/p>\n<p id=\"caption-attachment-50582\" class=\"wp-caption-text\">The code for the game &#8220;Snake&#8221;, written by the quantized language model TheBloke\/CodeLlama-7B-Instruct-GGUF<\/p>\n<\/div>\n<p>The applications listed above perform all computations locally, don&#8217;t send data to servers, and can run offline so you can safely share confidential information with them. However, to fully protect yourself against leaks, you need to ensure not only the security of the language model but also that of your computer \u2013 and that&#8217;s where our <a href=\"https:\/\/www.kaspersky.com\/premium?icid=gl_bb2023-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\">comprehensive security solution<\/a>\u00a0comes in. 
As confirmed in <a href=\"https:\/\/www.kaspersky.com\/top3\" target=\"_blank\" rel=\"noopener\">independent tests<\/a>, <a href=\"https:\/\/www.kaspersky.com\/premium?icid=gl_bb2023-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\">Kaspersky Premium<\/a>\u00a0has practically no impact on your computer&#8217;s performance \u2014 an important advantage when working with local AI models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2024\/02\/16055653\/how-to-use-ai-locally-and-securely-featured.jpg\"\/><\/p>\n<p><strong>Credit to Author: Stan Kaminsky| Date: Fri, 16 Feb 2024 11:08:41 +0000<\/strong><\/p>\n<p>What hardware and applications are needed to use AI on a local computer without internet access or the risk of data 
leakage.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10425,10378],"tags":[10245,11113,13431,28405,4500,12038,10428],"class_list":["post-23960","post","type-post","status-publish","format-standard","hentry","category-kaspersky","category-security","tag-ai","tag-artificial-intelligence","tag-chatbots","tag-chatgpt","tag-cybersecurity","tag-machine-learning","tag-tips"],"_links":{"self":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/23960","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=23960"}],"version-history":[{"count":0,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/23960\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=23960"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=23960"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=23960"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}