Machine Learning Approach for Advanced Threat Hunting

Credit to Author: Ajay| Date: Fri, 02 Feb 2018 11:41:12 +0000

In today’s fast-changing world, the cyber threat landscape is growing increasingly complex, and signature-based systems are falling behind in protecting endpoints. All major security solutions are built on layered security models to protect endpoints from today’s advanced threats, and machine learning-based detection is becoming an essential component of these models. In this post, we discuss how Quick Heal incorporates machine learning (ML) in its products to protect endpoints.

The security industry has already embraced ML and AI (artificial intelligence). At Quick Heal, we use ML for several use cases. One of them is the initial layer of our multi-level defense. The main objective of this layer is to determine how suspicious a given file or sample is. If the layer finds a file suspicious, the file is scrutinized by the next layers. This filtering reduces the load on the other security layers. Here, we discuss how we use ML-based static analysis of PE files to filter the traffic sent to the next security layers.

Machine Learning Overview

In machine learning (ML), we give computers the ability to learn from data instead of being explicitly programmed. For our problem, we can use ML to build a model that is trained on known clean and malicious files and can then predict the nature of an unknown sample file.

The key requirement for building ML models is training data. It must be accurately labeled; otherwise, results and predictions can be misleading. The process starts with organizing the training set by collecting a large number of clean and malicious samples and labeling them. We then extract features from these samples (for our models, we extracted features from the PE headers of all the collected samples). The model is then trained by providing these feature matrices to it, as shown in the following block diagram:

fig. 1: Machine Learning Training Cycle

The feature array of an unknown sample is then provided to this trained model. The model returns a confidence score for the sample, indicating whether the file is more similar to the clean set or to the malicious set.

fig. 2: ML Prediction stage

Feature Selection

A feature is an individual measurable property or characteristic of the things being observed. For example, if we want to classify cats and dogs, we can take features like “type of fur”, “loves company of others or loves to stay alone”, “barks or not”, “number of legs”, “very sharp claws”, etc.

Selecting good features is one of the most important steps in training any ML model. In the example above, if we select “number of legs” as a feature for training, classification won’t be possible, since dogs and cats have the same number of legs. But “barks or not” would be a good classifier, as dogs bark and cats don’t. “Sharpness of claws” would also be a good feature.

Along similar lines, to separate malicious samples from clean ones, we need features. We start by deriving PE file header attributes, which need fewer computations to extract. Some features are boolean, such as “EntryPoint present in First Section”, “Resource present”, or “Section has a special character in its name”. Others are count- or value-based, such as “NumberOfSections” and “File TimeDateStamp”.

Some of these features are strong enough that they often act as good separators between malicious and benign samples. The following two examples illustrate this. As shown in graph 1, in most cases the entry point section of a clean file is the 0th section, whereas for malicious files it may vary.
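To make the header features named above concrete, here is a minimal sketch of how a few of them could be read from a PE file using only Python's standard library. It is an illustration, not Quick Heal's extractor: the function name `extract_pe_features`, the dictionary keys, the allow-list used to decide what counts as a "special character", and the choice to treat a section with zero raw size as "empty" are all assumptions made for this example.

```python
import string
import struct

# Characters treated as "ordinary" in a section name. This allow-list is an
# assumption; the post does not define what counts as a special character.
ORDINARY_NAME_CHARS = set(string.ascii_letters + string.digits + "._$")


def extract_pe_features(data: bytes) -> dict:
    """Extract a few static features from the headers of a raw PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C of the DOS header) points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # COFF file header: 20 bytes right after the 4-byte signature.
    (_machine, number_of_sections, time_date_stamp, _symbol_table,
     _symbol_count, optional_header_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # AddressOfEntryPoint is at offset 16 of the optional header
    # (same position for PE32 and PE32+).
    optional_header = pe_offset + 24
    (entry_rva,) = struct.unpack_from("<I", data, optional_header + 16)

    section_table = optional_header + optional_header_size
    entry_section_index = -1   # -1: entry point lies outside every section
    first_empty_index = -1     # -1: no section with zero raw size (assumed meaning of "empty")
    special_char_in_name = False
    for index in range(number_of_sections):
        # Each section header is 40 bytes; only the first 20 are needed here.
        name, virtual_size, virtual_address, raw_size = struct.unpack_from(
            "<8sIII", data, section_table + 40 * index)
        span = max(virtual_size, raw_size)
        if (entry_section_index < 0
                and virtual_address <= entry_rva < virtual_address + span):
            entry_section_index = index
        if first_empty_index < 0 and raw_size == 0:
            first_empty_index = index
        text = name.rstrip(b"\0").decode("latin-1")
        if any(ch not in ORDINARY_NAME_CHARS for ch in text):
            special_char_in_name = True

    return {
        "NumberOfSections": number_of_sections,
        "TimeDateStamp": time_date_stamp,
        "EntryPointSectionIndex": entry_section_index,
        "EntryPointInFirstSection": entry_section_index == 0,
        "FirstEmptySectionIndex": first_empty_index,
        "SectionNameHasSpecialChar": special_char_in_name,
    }
```

With a file on disk, the function could be called as `extract_pe_features(open("sample.exe", "rb").read())`, where `sample.exe` is a hypothetical path. Only fixed-offset header fields are read, which matches the idea that these attributes are cheap to extract.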
graph 1: Percentage (%) of entry point section index for clean vs. malicious files

Similarly, for the index of the “First empty section” of a file, we get the following plot:

graph 2: Percentage (%) of index of the first empty section for clean vs. malicious files

Feature Scaling And Dimensionality Reduction

The data we feed to our model must be scaled properly. For example, an attribute like Address_OF_EntryPoint = 307584 has a very large value compared to an attribute like No_Of_Sections = 5. Such a huge…
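The scale mismatch between Address_OF_EntryPoint and No_Of_Sections can be illustrated with a small sketch. The post does not say here which scaling method is used, so min-max scaling is shown purely as one common option; the helper name `min_max_scale` is hypothetical, and only the first sample row uses the values quoted in the text (the other two rows are made up for illustration).

```python
def min_max_scale(rows):
    """Scale every column of `rows` to the range [0, 1].

    Columns whose values are all equal are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Columns: [Address_OF_EntryPoint, No_Of_Sections].
# Row 0 uses the values quoted in the text; rows 1 and 2 are
# made-up illustration values, not measurements.
samples = [
    [307584, 5],
    [4096, 3],
    [1052672, 9],
]

scaled = min_max_scale(samples)
for raw, row in zip(samples, scaled):
    print(raw, "->", [round(value, 3) for value in row])
```

After scaling, both columns lie in the same range, so neither attribute dominates simply because its raw numbers are larger.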
http://blogs.quickheal.com/feed/