{"id":23479,"date":"2023-11-29T04:30:46","date_gmt":"2023-11-29T12:30:46","guid":{"rendered":"https:\/\/www.palada.net\/index.php\/2023\/11\/29\/news-17209\/"},"modified":"2023-11-29T04:30:46","modified_gmt":"2023-11-29T12:30:46","slug":"news-17209","status":"publish","type":"post","link":"http:\/\/www.palada.net\/index.php\/2023\/11\/29\/news-17209\/","title":{"rendered":"GenAI is highly inaccurate for business use \u2014 and getting more opaque"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/images.idgesg.net\/images\/article\/2023\/11\/shutterstock_2328020527-100948779-small.jpg\"\/><\/p>\n<p><a href=\"https:\/\/www.computerworld.com\/article\/3697649\/what-are-large-language-models-and-how-are-they-used-in-generative-ai.html\">Large language models<\/a> (LLMs), the algorithmic platforms on which generative AI (genAI) tools like ChatGPT are built, are highly inaccurate when connected to corporate databases and becoming less transparent, according to two studies.<\/p>\n<p><a href=\"https:\/\/hai.stanford.edu\/news\/introducing-foundation-model-transparency-index\" rel=\"nofollow noopener\" target=\"_blank\">One study by Stanford University<\/a>\u00a0showed that as LLMs continue to ingest massive amounts of information and grow in size, the genesis of the data they use is becoming harder to track down. 
That, in turn, makes it difficult for businesses to know whether they can safely build applications that use commercial genAI foundation models and for academics to rely on them for research.<\/p>\n<p>It also makes it more difficult for lawmakers to design meaningful policies to rein in the powerful technology, and \u201cfor consumers to understand model limitations or seek redress for harms caused,\u201d the Stanford study said.<\/p>\n<p>LLMs (also known as foundation models) such as GPT, LLaMA, and DALL-E emerged over the past year and have transformed artificial intelligence (AI), giving many of the companies experimenting with them a boost in productivity and efficiency. But those benefits come with a heavy dollop of uncertainty.<\/p>\n<p>\u201cTransparency is an essential precondition for public accountability, scientific innovation and effective governance of digital technologies,\u201d said Rishi Bommasani, society lead at Stanford\u2019s Center for Research on Foundation Models. 
\u201cA lack of transparency has long been a problem for consumers of digital technologies.&#8221;<\/p>\n<p>For example, deceptive online ads and pricing, unclear wage practices in ride-sharing, dark patterns that trick users into unknowing purchases, and a myriad of transparency issues around content moderation created a vast ecosystem of mis- and disinformation on social media, Bommasani noted.<\/p>\n<p>&#8220;As transparency around commercial [foundation models] wanes, we face similar sorts of threats to consumer protection,&#8221; he said.<\/p>\n<p>OpenAI, for instance, which has the word &#8220;open&#8221; right in its name, has clearly stated that it will not be transparent about most aspects of its flagship model, GPT-4, the Stanford researchers noted.<\/p>\n<p>To assess transparency, Stanford brought together a team that included researchers from MIT and Princeton to design a scoring system called the\u00a0<a href=\"https:\/\/crfm.stanford.edu\/fmti\/\" rel=\"nofollow\">Foundation Model Transparency Index<\/a> (FMTI). It evaluates 100 different aspects, or indicators, of transparency, covering how a company builds a foundation model, how the model works, and how it is used downstream.<\/p>\n<p>The Stanford study evaluated 10 LLMs and found the mean transparency score was just 37%. 
LLaMA scored highest, with a transparency rating of 52%; it was followed by GPT-4 and PaLM 2, which scored 48% and 47%, respectively.<\/p>\n<p>\u201cIf you don\u2019t have transparency, regulators can\u2019t even pose the right questions, let alone take action in these areas,\u201d Bommasani said.<\/p>\n<p>Meanwhile, almost all senior bosses (95%) believe genAI tools are regularly used by employees, with more than half (53%) saying genAI is now driving certain business departments, according to a separate survey by cybersecurity and anti-virus provider Kaspersky Lab. That study found 59% of executives now expressing deep concerns about genAI-related security risks that could jeopardize sensitive company information and lead to a loss of control of core business functions.<\/p>\n<p>\u201cMuch like BYOD, genAI offers massive productivity benefits to businesses, but while our findings reveal that boardroom executives are clearly acknowledging its presence in their organizations, the extent of its use and purpose are shrouded in mystery,\u201d David Emm, Kaspersky\u2019s principal security researcher, said in a statement.<\/p>\n<p>The problem with LLMs goes deeper than just transparency; the overall accuracy of the models has been questioned almost from the moment <a href=\"https:\/\/www.computerworld.com\/article\/3710293\/openais-chatgpt-turns-one-year-old-what-it-did-and-didnt-do.html\">OpenAI released ChatGPT a year ago<\/a>.<\/p>\n<p>Juan Sequeda, head of the AI Lab at <a href=\"https:\/\/data.world\/\" rel=\"nofollow noopener\" target=\"_blank\">data.world<\/a>, a data cataloging platform provider, said his company tested LLMs connected to SQL databases and tasked with providing answers to company-specific questions. 
Using real-world insurance company data, <a href=\"https:\/\/arxiv.org\/pdf\/2311.07509.pdf\" rel=\"nofollow\">data.world\u2019s study<\/a> showed that LLMs return accurate responses to most basic business queries just 22% of the time. And for intermediate and expert-level queries, accuracy plummeted to 0%.<\/p>\n<p>The absence of suitable text-to-SQL benchmarks tailored to enterprise settings may be affecting LLMs&#8217; ability to accurately respond to user questions or \u201cprompts.\u201d<\/p>\n<p>\u201cIt\u2019s understood that LLMs lack internal business context, which is key to accuracy,\u201d Sequeda said. \u201cOur study shows a gap when it comes to using LLMs specifically with SQL databases, which is the main source of structured data in the enterprise. I would hypothesize that the gap exists for other databases as well.\u201d<\/p>\n<p>Enterprises invest millions of dollars in cloud data warehouses, business intelligence, visualization tools, and ETL and ELT systems, all so they can better leverage data, Sequeda noted. Being able to use LLMs to ask questions about that data opens up huge possibilities for improving processes such as key performance indicators, metrics, and strategic planning, or for creating entirely new applications that leverage deep domain expertise to create more value.<\/p>\n<p>The study primarily focused on question answering using GPT-4, with <a href=\"https:\/\/machinelearningmastery.com\/what-are-zero-shot-prompting-and-few-shot-prompting\/\" rel=\"nofollow noopener\" target=\"_blank\">zero-shot prompts<\/a> directly on SQL databases. The accuracy rate? Just 16%.<\/p>\n<p>The net effect of inaccurate responses based on corporate databases is an erosion of trust. \u201cWhat happens if you are presenting to the board with numbers that aren\u2019t accurate? Or the SEC? 
In each instance, the cost would be high,\u201d Sequeda said.<\/p>\n<p>The problem with LLMs is that they are statistical pattern-matching machines that predict the next word based on the words that came before. Their predictions are based on patterns observed across the entire content of the open web. Because the open web is essentially a very large dataset, the LLM will return things that seem very plausible but may also be inaccurate, according to Sequeda.<\/p>\n<p>\u201cA subsequent reason is that the models only make predictions based on the patterns they have seen. What happens if they haven\u2019t seen patterns specific to your enterprise? Well, the inaccuracy increases,\u201d he said.<\/p>\n<p>\u201cIf enterprises try to implement LLMs at any significant scale without addressing accuracy, the initiatives will fail,\u201d Sequeda continued. \u201cUsers will soon discover that they can\u2019t trust the LLMs and stop using them. We\u2019ve seen a similar pattern in data and analytics over the years.\u201d<\/p>\n<p>The accuracy of LLMs increased to 54% when questions were posed over a Knowledge Graph representation of the enterprise SQL database. \u201cTherefore, investing in a Knowledge Graph provides higher accuracy for LLM-powered question-answering systems,\u201d Sequeda said. \u201cIt\u2019s still not clear why this happens, because we don\u2019t know what\u2019s going on inside the LLM.<\/p>\n<p>\u201cWhat we do know is that if you give an LLM a prompt with the ontology mapped within a knowledge graph, which contains the critical business context, the accuracy is three times higher than if you don\u2019t,\u201d Sequeda continued. 
\u201cHowever, it\u2019s important to ask ourselves, what does \u2018accurate enough\u2019 mean?\u201d<\/p>\n<p>To increase the possibility of accurate responses from LLMs, companies need a \u201cstrong data foundation,\u201d or what Sequeda and others call AI-ready data; that means the data is mapped in a Knowledge Graph to increase the accuracy of the responses and to ensure explainability, \u201cwhich means that you can make the LLM show its work.\u201d<\/p>\n<p>Another way to boost model accuracy would be to use small language models (SLMs) or even industry-specific language models (ILMs). \u201cI could see a future where each enterprise is leveraging a number of specific LLMs, each tuned for specific types of question-answering,\u201d Sequeda said. <br \/>\u201cNevertheless, the approach continues to be the same: predicting the next word. That prediction may be high, but there will always be a chance that the prediction is wrong.\u201d<\/p>\n<p>Every company also needs to ensure oversight and governance to prevent sensitive and proprietary information from being placed at risk by models that aren\u2019t predictable, Sequeda said.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/images.idgesg.net\/images\/article\/2023\/11\/shutterstock_2328020527-100948779-small.jpg\"\/><\/p>\n<article>\n<section class=\"page\">\n<p><a href=\"https:\/\/www.computerworld.com\/article\/3697649\/what-are-large-language-models-and-how-are-they-used-in-generative-ai.html\">Large language models<\/a> (LLMs), the algorithmic platforms on which generative AI (genAI) tools like ChatGPT are built, are highly inaccurate when connected to corporate databases and becoming less 
transparent, according to two studies.<\/p>\n<p><a href=\"https:\/\/hai.stanford.edu\/news\/introducing-foundation-model-transparency-index\" rel=\"nofollow noopener\" target=\"_blank\">One study by Stanford University<\/a>\u00a0showed that as LLMs continue to ingest massive amounts of information and grow in size, the genesis of the data they use is becoming harder to track down. That, in turn, makes it difficult for businesses to know whether they can safely build applications that use commercial genAI foundation models and for academics to rely on them for research.<\/p>\n<p class=\"jumpTag\"><a href=\"\/article\/3711343\/genai-is-highly-inaccurate-for-business-use-and-getting-more-opaque.html#jump\">To read this article in full, please click here<\/a><\/p>\n<\/section>\n<\/article>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[11062,10643],"tags":[11113,13431,11070,29835,8698,714],"class_list":["post-23479","post","type-post","status-publish","format-standard","hentry","category-computerworld","category-independent","tag-artificial-intelligence","tag-chatbots","tag-emerging-technology","tag-generative-ai","tag-regulation","tag-security"],"_links":{"self":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/23479","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=23479"}],"version-history":[{"count":0,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/
v2\/posts\/23479\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=23479"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=23479"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=23479"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}