{"id":6680,"date":"2017-02-17T13:10:30","date_gmt":"2017-02-17T21:10:30","guid":{"rendered":"http:\/\/www.palada.net\/index.php\/2017\/02\/17\/news-499\/"},"modified":"2017-02-17T13:10:30","modified_gmt":"2017-02-17T21:10:30","slug":"news-499","status":"publish","type":"post","link":"http:\/\/www.palada.net\/index.php\/2017\/02\/17\/news-499\/","title":{"rendered":"Explained: Bayesian spam filtering"},"content":{"rendered":"<p><strong>Credit to Author: Pieter Arntz| Date: Fri, 17 Feb 2017 16:30:10 +0000<\/strong><\/p>\n<p>Bayesian spam filtering is based on Bayes\u2019 rule, a statistical theorem that gives you the probability of an event based on evidence related to it. In Bayesian filtering it is used to give you the probability that a certain email is spam.<\/p>\n<p><strong>The name<\/strong><\/p>\n<p>The rule is named after the statistician Rev. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Thomas_Bayes\">Thomas Bayes<\/a>, who provided an equation that allows new information to update the outcome of a probability calculation. The rule is also called the Bayes-Price rule after the mathematician <a href=\"https:\/\/en.wikipedia.org\/wiki\/Richard_Price\">Richard Price<\/a>, as he recognized the importance of the theorem, made some corrections to Bayes\u2019 work and put the rule to use.<\/p>\n<p><strong>Spam<\/strong><\/p>\n<p>When dealing with spam, the theorem is used to calculate the probability that a certain message is spam based on the words in its title and body, learning from messages that were identified as spam and messages that were identified as not being spam (sometimes called ham).<\/p>\n<p><strong>False positives<\/strong><\/p>\n<p>The objective of the learning ability is to reduce the number of false positives. As annoying as it might be to receive a spam message, it is worse to not receive a message from a customer just because they used a word that triggered the filter.<\/p>\n<p><strong>Scoring<\/strong><\/p>\n<p>Other methods often use simple scoring filters. 
If a message contains specific words, a few points are added to that message\u2019s score, and when the score exceeds a certain threshold, the message is regarded as spam. Not only is this a very arbitrary method, it\u2019s also a given that this will result in spammers changing their wording. Take for example \u201cViagra\u201d, a word that will surely raise the score. As soon as spammers found that out, they switched to variations like \u201cV!agra\u201d and so on. This is a cat-and-mouse game that will keep you busy creating new rules.<\/p>\n<p><strong>Learning<\/strong><\/p>\n<p>If the filter accepts individual input, its precision can be improved on a per-user basis. Different users may attract specific forms of spam based on their online activities. Or what is spam to one person is a \u201cmust-read\u201d newsletter to the next. Every time the user confirms or denies that a message is spam, the filtering process can calculate a more refined probability for the next occasion.<\/p>\n<p><b>Poisoning<\/b><\/p>\n<p>A downside of Bayesian filtering in cases of more or less targeted spam is that spammers will start using words or whole pieces of text that will lower the score. 
With prolonged use, these otherwise innocent words can themselves become associated with spam, an effect known as poisoning.<\/p>\n<p><strong>Bypasses<\/strong><\/p>\n<p>There are a few methods to bypass \u201cbad word\u201d filtering:<\/p>\n<ul>\n<li>Using images to replace words that are known to raise the score.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16388\" src=\"https:\/\/blog.malwarebytes.com\/wp-content\/uploads\/2017\/02\/spam.png\" alt=\"\" width=\"760\" height=\"138\" srcset=\"https:\/\/blog.malwarebytes.com\/wp-content\/uploads\/2017\/02\/spam.png 760w, https:\/\/blog.malwarebytes.com\/wp-content\/uploads\/2017\/02\/spam-300x54.png 300w, https:\/\/blog.malwarebytes.com\/wp-content\/uploads\/2017\/02\/spam-600x109.png 600w\" sizes=\"auto, (max-width: 760px) 100vw, 760px\" \/><\/p>\n<ul>\n<li>Deliberate misspelling, as mentioned earlier.<\/li>\n<li>Using homoglyphs, which are characters from other character sets that look similar to letters in the message\u2019s character set. For example, the Greek letter omicron looks exactly like an \u201cO\u201d but has a different character encoding.<\/li>\n<\/ul>\n<p><strong>Conclusion<\/strong><\/p>\n<p>Bayesian filtering is a method of spam filtering that can learn, although only to a limited extent. 
Knowing how spam filters work makes it clearer how some messages get through and how you can make your own emails less likely to get caught in a spam filter.<\/p>\n<p><strong>Links:<\/strong><\/p>\n<p><a href=\"http:\/\/www.fun.ac.jp\/~niimi\/ronbun\/TUE-3-3.pdf\">Evaluation of Bayesian Spam Filter and SVM Spam Filter<\/a><\/p>\n<p><a href=\"http:\/\/ats.cs.ut.ee\/u\/kt\/hw\/spam\/spam.pdf\">Machine Learning Techniques in Spam Filtering<\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Pieter Arntz<\/p>\n<p><a href=\"https:\/\/blog.malwarebytes.com\/security-world\/2017\/02\/explained-bayesian-spam-filtering\/\" target=\"bwo\" >https:\/\/blog.malwarebytes.com\/feed\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p><strong>Credit to Author: Pieter Arntz| Date: Fri, 17 Feb 2017 16:30:10 +0000<\/strong><\/p>\n<table cellpadding='10'>\n<tr>\n<td valign='top' align='center'><a href='https:\/\/blog.malwarebytes.com\/security-world\/2017\/02\/explained-bayesian-spam-filtering\/' title='Explained: Bayesian spam filtering'><img src='https:\/\/blog.malwarebytes.com\/wp-content\/uploads\/2015\/11\/photodune-6673197-spam-email-m-965x395.jpg' border='0'  width='300px'  \/><\/a><\/td>\n<\/tr>\n<tr>\n<td valign='top' align='left'>Bayesian spam filtering is based on Bayes\u2019 rule, a statistical theorem that gives you the probability of an event. In Bayesian filtering it is used to give you the probability that a certain email is spam. The name Named after the statistician Rev. 
Thomas Bayes who provided an equation that basically allows new information to&#8230;<\/p>\n<p>Categories: <\/p>\n<ul class=\"post-categories\">\n<li><a href=\"https:\/\/blog.malwarebytes.com\/category\/security-world\/\" rel=\"category tag\">Security world<\/a><\/li>\n<li><a href=\"https:\/\/blog.malwarebytes.com\/category\/security-world\/technology\/\" rel=\"category tag\">Technology<\/a><\/li>\n<\/ul>\n<p>Tags: <a href=\"https:\/\/blog.malwarebytes.com\/tag\/bayesian\/\" rel=\"tag\">bayesian<\/a><a href=\"https:\/\/blog.malwarebytes.com\/tag\/filter\/\" rel=\"tag\">filter<\/a><a href=\"https:\/\/blog.malwarebytes.com\/tag\/pieter-arntz\/\" rel=\"tag\">Pieter Arntz<\/a><a href=\"https:\/\/blog.malwarebytes.com\/tag\/spam\/\" rel=\"tag\">spam<\/a><a href=\"https:\/\/blog.malwarebytes.com\/tag\/the-more-you-know\/\" rel=\"tag\">the more you know<\/a><\/p>\n<table width='100%'>\n<tr>\n<td align=right>\n<p><b>(<a href='https:\/\/blog.malwarebytes.com\/security-world\/2017\/02\/explained-bayesian-spam-filtering\/' title='Explained: Bayesian spam filtering'>Read 
more&#8230;<\/a>)<\/b><\/p>\n<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10488,10378],"tags":[11398,11399,10523,10497,10518,1331,10524],"class_list":["post-6680","post","type-post","status-publish","format-standard","hentry","category-malwarebytes","category-security","tag-bayesian","tag-filter","tag-pieter-arntz","tag-security-world","tag-spam","tag-technology","tag-the-more-you-know"],"_links":{"self":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/6680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=6680"}],"version-history":[{"count":0,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/6680\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=6680"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=6680"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=6680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}