{"id":14379,"date":"2019-01-20T14:20:24","date_gmt":"2019-01-20T22:20:24","guid":{"rendered":"https:\/\/www.palada.net\/index.php\/2019\/01\/20\/news-8131\/"},"modified":"2019-01-20T14:20:24","modified_gmt":"2019-01-20T22:20:24","slug":"news-8131","status":"publish","type":"post","link":"https:\/\/www.palada.net\/index.php\/2019\/01\/20\/news-8131\/","title":{"rendered":"Using Machine Learning To Detect Anomalies"},"content":{"rendered":"<p><strong>Credit to Author: dmitryc| Date: Mon, 21 Dec 2015 22:07:07 +0000<\/strong><\/p>\n<div class=\"entry-content\">\n<div class=\"pf-content\">\n<p>I&#8217;m going to start blogging more about detection of protocol\/app anomalies, detection of lateral movement and\/or data exfiltration, and more. For many years I have been watching users and applications furrow their way across networks and I&#8217;m gonna start data-dumping that info here \ud83d\ude42 <\/p>\n<p>But&#8230;first&#8230;I manage a web server for a friend. It occurred to me that machine learning could be useful in alerting when an attack is under way. I took the following steps:<\/p>\n<p>1) Get as much data as possible for this device. For Apache, this just meant gathering all the log files.<\/p>\n<p>2) Parse the data and, for each session, look at the path taken as the user or bot perused the server (Note: outside of my initial scope, but timestamps are useful here to tell a user from a machine).<\/p>\n<p>3) So, an average session will look like R1-&gt;R2-&gt;R3-&gt;RX where each &#8220;R&#8221; is a request. So R1 could be index.html, R2 could be &#8220;Contact Us&#8221;, R3 could be &#8220;contact_form.php&#8221;, etc. I started out building a Markov model but, instead, took each consecutive pair of requests and recorded it as a transition&#8230;e.g. S={R1-&gt;R2,R2-&gt;R3,R3-&gt;RX}. For the next session I might have S={R1-&gt;R5,R5-&gt;R3,etc.}. At the end of all the parsing, I have a big set of all possible state transitions for each R.
So, given RX, there are a finite number of R states that RX can transition to.<\/p>\n<p>4) For each of the R states, I now re-parse the log file and count the transitions. This gives a matrix of the observed transitions from RN to every other R state, with each row expressed as a share of RN&#8217;s total. So, for instance, let&#8217;s say that R1 goes to 3 possible states: R4 (27% of the time), R11 (3% of the time), and R12 (70% of the time). Then the R1 row of our matrix looks like [0, 0, 0, .27, 0, 0, 0, 0, 0, 0, .03, .7] <\/p>\n<p>5) There were some special cases that I had to account for (any page transitioning to the main page, any page transitioning to itself, etc.). Once I accounted for these, I ran my program against the log files and created LOW, MEDIUM, and HIGH alerts. I didn&#8217;t use a true standard deviation and I ignored the LOW and MEDIUM stuff&#8230;I just wanted the hits where the number for that transition was extremely low or 0. From our example above, this would be a transition like R1-&gt;R2=0. I didn&#8217;t really expect great results and figured that I would have to do a lot more tweaking&#8230;well, this wasn&#8217;t the case. I actually got really, really good data on my first run. Example:<\/p>\n<p><em>732 total state transitions tracked<br \/> HIGH RISK GET \/componentes3.7\/fckeditor\/editor\/fckeditor.html-&gt;GET \/affiliate\/affiliate53\/fckeditor\/editor\/fckeditor.html<\/p>\n<p>HIGH RISK GET \/portfolio\/aui\/FCKeditor\/editor\/fckeditor.html-&gt;GET \/componentes3.7\/fckeditor\/editor\/fckeditor.html<\/p>\n<p>HIGH RISK GET \/wp-content\/uploads\/wpfouot.php-&gt;POST \/wp-content\/plugins\/Login-wall-etgFB\/login_wall.php<\/p>\n<p>etc.<\/em><\/p>\n<p>So, I can use really basic machine learning to find my attackers in my web logs. I then parse out the attackers&#8217; IP addresses and can throw them into a firewall ruleset.
In the future, I would like to automate this: detect when my server is under attack and send a message to my firewall, which drops in a route rule that sends all of the attackers&#8217; traffic to my honeynet \ud83d\ude42 <\/p>\n<p>Speaking of honeypots, you can also honeypot certain pages. For instance, I could create bogus files or directories based on what I see attackers going after (like the report from above) and drop canary tokens in there (see <a href=\"https:\/\/canary.tools\/\" target=\"_blank\">Canary Tools<\/a>). I can embed honeypot links within HTML comments and see where bots (or humans) are taking links from commented-out code and trying them out. I can put links in my robots.txt file and see who goes after them&#8230;there are so many ways to do this&#8230;and, at the end of the day, I can either run these attackers off my network or into a fake network&#8230;it&#8217;s just TONS and TONS of fun \ud83d\ude42<\/p>\n<p>!Dmitry<br \/> dmitry.chan@gmail.com<\/p>\n<\/div><\/div>\n<p><a href=\"https:\/\/blogs.securiteam.com\/index.php\/archives\/2690\" target=\"bwo\" >https:\/\/blogs.securiteam.com\/index.php\/feed<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p><strong>Credit to Author: dmitryc| Date: Mon, 21 Dec 2015 22:07:07 +0000<\/strong><\/p>\n<p>I&#8217;m going to start blogging more about detection of protocol\/app anomalies, detection of lateral movement and\/or data exfiltration, and more.
For many years I have been watching users and applications furrow their way across networks and I&#8217;m gonna start data-dumping that info here \ud83d\ude42 But&#8230;first&#8230;I manage a web server for a friend. It occurred to &#8230; <a href=\"https:\/\/blogs.securiteam.com\/index.php\/archives\/2690\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Using Machine Learning To Detect Anomalies<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10643,10754],"tags":[10755],"class_list":["post-14379","post","type-post","status-publish","format-standard","hentry","category-independent","category-securiteam","tag-commentary"],"_links":{"self":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/14379","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=14379"}],"version-history":[{"count":0,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/14379\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=14379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=14379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=14379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}