Results from the 2023 MITRE Engenuity ATT&CK Evaluations (Round 5: Turla)

The fifth round of MITRE Engenuity ATT&CK® Evaluations has been released, assessing the ability of 30 endpoint detection and response (EDR) solutions to detect, analyze, and describe the tactics, techniques, and procedures (TTPs) leveraged by one of the most sophisticated threat groups: Turla.

We’re going to spend most of this article explaining how Sophos achieved 99% detection coverage, what contextual information Sophos Intercept X presented to the user (in this case, MITRE’s evaluation team), and how ATT&CK Evals can be used to help select an endpoint security solution that aligns with your specific needs.

This is to say that we’re not going to assess everything this round of ATT&CK Evals covers because, quite frankly, that would be impossible. Not only do ATT&CK Evals yield a ton of information, but there’s no single way to interpret their results: there are no scores, rankings, or ratings, and no vendor is declared a “winner.”

There is nuance in how each vendor’s tool works and how effectively it presents information to the analyst using it, but your needs and individual preferences play as important a role as any other factor in determining which endpoint security tool is best for you and your team. If you’ve heard gamers debate which console reigns supreme between PlayStation and Xbox, then you know what we mean (hint: the correct answer is Nintendo).

How did Sophos perform in the Round 5 MITRE Engenuity ATT&CK Evaluations?

This round of ATT&CK Evaluations focused on emulating adversary behavior associated with Russia-based threat group Turla.

Similar to previous rounds, MITRE Engenuity executed multiple attack scenarios throughout the course of the evaluation.

Attack Scenario 1: “Carbon”
The first day of testing, titled “Carbon,” consisted of a multi-layer attack campaign targeting both Windows and Linux infrastructure via the deployment of Turla-specific malware, including Epic, a backdoor commonly used during the initial stages of Turla’s attacks; Carbon, a second-stage backdoor and framework used to steal sensitive information from victims; and Penquin, a remote access trojan (RAT).

Attack Scenario 2: “Snake”
The day two scenario, titled “Snake,” emulated an attack on a hypothetical organization, focusing on kernel and Microsoft Exchange exploitation. It once again leveraged Epic, along with two other tools: Snake, which is used for long-term intelligence collection on sensitive targets and is considered one of the most sophisticated cyber espionage tools currently in use; and LightNeuron, a sophisticated backdoor used to target Microsoft Exchange servers.

Sophos Evaluation Results
With the “Carbon” attack scenario consisting of 76 substeps and “Snake” consisting of 67, the ATT&CK Evals team executed a total of 143 attack substeps during the evaluation.

Sophos Intercept X Results:

99% Total Detection Coverage (141 of 143 attack substeps)
98% Total Analytic Coverage (140 of 143 attack substeps)
99% Analytic Coverage for “Carbon” (75 of 76 substeps)
97% Analytic Coverage for “Snake” (65 of 67 substeps)
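The percentages above follow directly from the substep counts. As a minimal sketch, the arithmetic can be reproduced in Python. The counts come from this article; rounding to the nearest whole percent is an assumption about how the headline figures were derived, and the names `coverage_pct` and `summary` are illustrative only.

```python
# Substep counts as reported in this article.
SUBSTEPS = {"Carbon": 76, "Snake": 67}   # substeps executed per scenario
ANALYTIC = {"Carbon": 75, "Snake": 65}   # substeps with an analytic detection
DETECTED_TOTAL = 141                     # substeps with any detection


def coverage_pct(hits: int, total: int) -> int:
    """Return coverage as a whole-number percentage.

    Assumption: figures are rounded to the nearest whole percent.
    """
    return round(100 * hits / total)


total_substeps = sum(SUBSTEPS.values())
total_analytic = sum(ANALYTIC.values())

summary = {
    "total_detection": coverage_pct(DETECTED_TOTAL, total_substeps),
    "total_analytic": coverage_pct(total_analytic, total_substeps),
    "carbon_analytic": coverage_pct(ANALYTIC["Carbon"], SUBSTEPS["Carbon"]),
    "snake_analytic": coverage_pct(ANALYTIC["Snake"], SUBSTEPS["Snake"]),
}
print(summary)
```

Running the same calculation on any other participant’s raw counts is a quick way to check a vendor-published chart against the underlying numbers.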

You can see a complete view of our results on the MITRE Engenuity results page for Sophos.

How did Sophos’ results compare to other participants?

We will reiterate once more that there’s no single way to interpret the results of MITRE Engenuity ATT&CK Evaluations. And, over the coming days and weeks, you are going to see countless vendor-created charts, graphs, and other visualizations that each frame the results in different ways (some more credibly than others).

That said, one of the most common ways to view ATT&CK Evaluation results at a macro level is to compare Visibility (the total number of substeps that generated a detection) and Analytic Coverage (the total number of substeps whose detections provided rich detail on the adversary’s behaviors).

MITRE ATT&CK detection categories explained

This year, the ATT&CK Evals team completely overhauled how participant results are displayed in the evaluation portal, making it easier than ever to see detection categories for every attack scenario step and substep.

Detection quality is critical for giving analysts detail on the adversary’s behavior so investigations and response actions can be executed quickly and efficiently.

Detection categories include:

Not applicable – There was no visibility (typically used in situations where the participant opted out or could not complete that portion of the evaluation)
None – Nothing was detected; a “miss”
Telemetry – Something happened, but it’s unclear what; no context provided
General – An abnormal event was detected but there’s no context on why or how; the “WHAT”
Tactic – The detection includes info on the attacker’s potential intent; the “WHY”
Technique – The detection includes info on the attacker’s method for achieving a goal; the “HOW”

Detections classified as General, Tactic, or Technique are grouped under the definition of “Analytic Coverage,” which is a measure of the EDR tool’s ability to convert telemetry into actionable threat detections.
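To make the grouping concrete, here is a minimal Python sketch that tallies Visibility and Analytic Coverage from a list of per-substep detection categories. It rests on several assumptions: each substep is represented by a single (highest) category, Telemetry counts toward Visibility but not Analytic Coverage, and the sample data is hypothetical rather than taken from any participant’s actual results. The names `Detection`, `summarize`, and `sample` are illustrative and not part of any MITRE tooling.

```python
from collections import Counter
from enum import Enum


class Detection(Enum):
    """Detection categories as listed in this article."""
    NOT_APPLICABLE = "Not applicable"
    NONE = "None"
    TELEMETRY = "Telemetry"
    GENERAL = "General"
    TACTIC = "Tactic"
    TECHNIQUE = "Technique"


# General, Tactic, and Technique are grouped as "Analytic Coverage".
ANALYTIC = {Detection.GENERAL, Detection.TACTIC, Detection.TECHNIQUE}
# Assumption: any detection, including raw telemetry, counts as visibility.
VISIBLE = ANALYTIC | {Detection.TELEMETRY}


def summarize(substeps):
    """Count total, visible, and analytic substeps.

    Assumption: each list entry is the single category recorded
    for one substep.
    """
    counts = Counter(substeps)
    return {
        "total": len(substeps),
        "visibility": sum(n for cat, n in counts.items() if cat in VISIBLE),
        "analytic_coverage": sum(n for cat, n in counts.items() if cat in ANALYTIC),
    }


# Hypothetical six-substep sample, for illustration only.
sample = [
    Detection.TECHNIQUE,
    Detection.TECHNIQUE,
    Detection.TACTIC,
    Detection.GENERAL,
    Detection.TELEMETRY,
    Detection.NONE,
]
result = summarize(sample)
print(result)
```

In this sample, five of six substeps are visible but only four count toward Analytic Coverage, which is why the two figures can differ for the same tool.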

How to use MITRE Engenuity ATT&CK Evaluation Results

ATT&CK Evaluations are among the world’s most respected independent security tests, due in large part to the thoughtful construction of real-world attack scenarios, transparency of results, and richness of participant information. When considering an EDR or Extended Detection and Response (XDR) solution, ATT&CK Evaluation results should be weighed alongside other third-party proof points, including verified customer reviews and analyst evaluations.

As you comb through the data available in MITRE Engenuity’s evaluation portal, look beyond the numbers and consider the following questions as they pertain to you, your team, and your organization. And keep in mind that there are some questions that the ATT&CK Evaluation cannot help you answer.

Does the tool help you identify threats?
Does it present information to you the way you want it?
Who will be using the tool? Tier 3 analysts? IT specialists or sysadmins?
How does the tool enable you to conduct threat hunts?
Are disparate events correlated? Is that done automatically, or do you need to do that on your own?
Can the EDR/XDR tool integrate with other technology in your environment (e.g., firewall, email, cloud, identity, network)?
Are you planning to use the tool by yourself, or will you have the support of a Managed Detection and Response (MDR) partner?

Why we participate

As a closing note, we want to say how proud we are to participate in this MITRE Engenuity ATT&CK Evaluation alongside some of the best security vendors in the industry. Yes, we compete with one another on the commercial side of our business, but we are, most importantly, a community united against a common enemy. We participate in these evaluations because they make us better, individually and as a collective. And that is a win for the entire industry and the organizations we defend.