Attribution – Not for the Faint of Heart

The Stark Reality 

High-profile cyber attacks have become the new normal. Personal information for sometimes millions of users – including Social Security numbers, credit card numbers, and passwords – is exposed in these attacks. The impacts range from the minor inconvenience of changing a password to identity theft and massive decreases in company stock value. Across thousands of announced breaches, attack after attack makes one thing clear: the question is not if there will be an attack, but when it will occur.

Attribution – What Is It?  And Who Cares? 

The Office of the Director of National Intelligence commissioned a Public/Private Analytic Exchange Program (AEP) to study the challenges of cyber attribution. The resulting report is an excellent read that details some of the complexities facing the stakeholders in an attribution determination. In that report, the AEP defines attribution as “the process of building a story describing how an attacker has managed to infiltrate an organization’s infrastructure, hacked a website, or performed some other destructive / malicious act.” Highly skilled incident responders are charged with building this story when responding to a cyber attack, but it’s far from an exact science, for many reasons. The Internet can provide some abstraction between the cyber actor and the cyber attack itself, leaving the incident responder with a complex, difficult puzzle to solve.

President Trump shares a view held by some that is rooted in part in the complexity of attribution. In a FOX News interview on December 11, 2016, he said:

“Once they hack, if you don’t catch them in the act you’re not going to catch them. Intelligence agencies have no idea if it’s Russia or China or somebody.  It could be somebody sitting in a bed some place… I don’t really think it is the Russians, but who knows?  I don’t know either.  They don’t know and I don’t know.”   

It’s true – there’s rarely a “silver bullet” or “smoking gun” when it comes to attribution.  But for the technical community, incident response is a fascinating and challenging field where “we don’t know” is an invitation to discover the unknown.   

Attribution has multiple facets and involves answering numerous questions, some of which are noted by the AEP as follows: What is the attack? Who made the attack tools (aka malware)? When were the tools made or released/configured? How did we find the attack? Where can we find more attacks or tools like it? More questions could be added to the AEP’s list, such as: How does the attacker operate? What kind of communications protocols do the attack tools, or the attackers, use? What encryption algorithms are in use by the attacker’s tools? How does the malware receive tasking, what tasking is possible, and what tasking is being exercised? What do we know about the IP address(es) involved, especially if the addresses participated in previously discovered attacks? All of these questions are part of attribution, and the more of them that are answered – ideally within the context of a knowledge base informed by past and present incidents – the clearer the attribution becomes.
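As a rough illustration of the IP address question above, the sketch below checks addresses seen in a new incident against a small knowledge base of past incidents. The incident labels, the `PAST_INCIDENTS` structure, and the `match_indicators` helper are all hypothetical, and the addresses come from reserved documentation ranges rather than any real attack. An overlap is a lead to investigate, not proof of a common actor, since infrastructure can be shared or reused.

```python
from collections import defaultdict

# Hypothetical knowledge base: incident label -> IP indicators seen in it.
# Addresses are from the RFC 5737 documentation ranges, not real attackers.
PAST_INCIDENTS = {
    "incident-2015-A": {"203.0.113.7", "198.51.100.22"},
    "incident-2016-B": {"198.51.100.22", "192.0.2.45"},
}


def match_indicators(observed, knowledge_base):
    """Return {indicator: sorted list of past incidents that also saw it}."""
    hits = defaultdict(list)
    for incident, indicators in knowledge_base.items():
        # Set intersection keeps only indicators present in both places.
        for indicator in observed & indicators:
            hits[indicator].append(incident)
    return {indicator: sorted(found) for indicator, found in hits.items()}


# Addresses observed in the (made-up) current incident.
observed_ips = {"198.51.100.22", "192.0.2.99"}
overlap = match_indicators(observed_ips, PAST_INCIDENTS)
print(overlap)
```

In practice the same lookup would usually cover more than IP addresses (file hashes, domains, protocol quirks), which is what makes a well-maintained knowledge base valuable.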

This challenge and complexity of attribution, however, doesn’t diminish its importance in any way.  The AEP found that there are:   

“At least 3 distinct communities that are impacted by the issue of attribution: 

1) Law Enforcement / Operational Investigations with arrest authorities who are interested in prosecution of the attributed party as the outcome
2) US Intelligence Community (IC) (Title 50 Community) with operational aspects and investigations under the Espionage Act who are interested in Economic, Military, Political policy changes as the outcome  
3) Industry / Commercial-Private Sector Business community who are impacted by attribution from an economic perspective, with business decisions and remediation efforts to maintain business operations.” 

In short, who you are determines why attribution matters.  The community that is most publicly impacted by attribution is the Industry / Commercial-Private Sector Business community, but the three distinct communities detailed by the AEP above often work in concert when there is an attack, such as the Sony hack in 2014.  The variety of motivations for attribution that these communities bring to the table – particularly in a multi-faceted, high-profile cyber attack incident response – only underlines the importance of attribution.    

Why Is Attribution So Difficult? 

A cyber attack is a multi-faceted event that is often well thought-out and carefully executed, with layers of abstraction and complexity intended to minimize the chance of a definitive attribution determination. Numerous components of a cyber attack make attribution difficult, some of which are briefly surveyed below.

Incident Response Scope 

In “Attributing Cyber Attacks”, a detailed paper that introduces the “Q Model” for use in a cyber attack investigation, Thomas Rid and Ben Buchanan identify the broad scope involved in an incident response, stating:

“Attribution is almost always too large and too complex for any single person to handle; attribution is likely to require a division of labour, with specialities and sub-specialities throughout; and attribution proceeds incrementally on different levels, immediate technical collection of evidence, follow-up investigations and analysis, and then legal proceedings and making a case against competing evidence in front of a decision authority.”

Rabbit holes can lead to other rabbit holes as more of the puzzle unfolds. As more clues are discovered, the scope of the attack can grow tremendously, and initial estimates of incident response costs can rise dramatically. In many instances, discovering a clue only leads to more questions, rather than answers. Do we know which records were stolen? If so, what motivated the attackers to exfiltrate (steal) this specific data, instead of other data? Is this the first time they gained access, or are we seeing the latest campaign? Which accounts were compromised? How did the attackers gain initial access, and how did they gain their current broader level of access? How did they move laterally on our network? Were our systems fully patched during the intrusion? Before you know it, the initial detection of anomalous activity on an unprivileged user account has led to the discovery that the entire Active Directory server was compromised. The goal of attribution is to develop the story of how the cyber attack unfolded – as well as who carried it out and why. If that story has far more twists and turns than originally thought, attribution becomes much harder, and confidence in the attribution determination can decrease as the scope increases.

Attacker Infrastructure 

In “Untangling Attribution”, David D. Clark and Susan Landau write:

“There have been calls for a stronger form of personal identification that can be observed in the network. A technically nonsensical but nonetheless clear complaint might be: ‘Why don’t packets have license plates?’”

The Internet was never designed for attribution. Just because all your data is being exfiltrated to an IP address that’s owned by (for example) a midwestern U.S. university doesn’t mean that the university is responsible for the attack. Sophisticated cyber actors use disassociated and/or unattributable infrastructure that can include multiple VPNs, VPSs, front companies, hop points, and Tor exit nodes, any of which can change and update at the actor’s discretion. Discovery of any one of these infrastructure components requires additional investigation – and potentially elevated authorities in a “follow the money” scenario, for example – to re-trace the actor’s steps. This “chain of attribution”, as Clark and Landau put it, combined with the cyber actor’s use of steganography, obfuscation, and encryption, makes the story harder to tell and attribution murkier.
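The hop-by-hop problem described above can be sketched in a few lines. This is a toy model under stated assumptions: the `UPSTREAM_OF` table stands in for whatever connection records an investigator has managed to obtain from each hop, the `trace_back` helper is hypothetical, and the addresses are from reserved documentation ranges. The point it illustrates is that the trace stops at the first hop for which no records are available, which may be nowhere near the actual actor.

```python
# Hypothetical relay records: for each hop we obtained logs from, the
# address that connected in to it. A missing key means no visibility.
UPSTREAM_OF = {
    "198.51.100.10": "203.0.113.80",  # university host was reached from a VPS
    "203.0.113.80": "192.0.2.200",    # VPS was reached from a VPN exit
    # No records for 192.0.2.200: the trail goes cold here.
}


def trace_back(observed_ip, upstream_of, max_hops=10):
    """Follow known upstream hops from the address seen by the victim."""
    chain = [observed_ip]
    seen = {observed_ip}
    while len(chain) <= max_hops:
        previous_hop = upstream_of.get(chain[-1])
        # Stop when there are no records, or when records loop back.
        if previous_hop is None or previous_hop in seen:
            break
        chain.append(previous_hop)
        seen.add(previous_hop)
    return chain


chain = trace_back("198.51.100.10", UPSTREAM_OF)
print(" <- ".join(chain))
```

The last address in `chain` is only the furthest point the available records reach; treating it as the origin would repeat the mistake of blaming the university in the example above.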

Attacker Tradecraft 

Malware – especially nation-state malware – can be extremely complicated and can take months to reverse engineer.  Obfuscation, encryption, and false clues can sidetrack an incident response for potentially months at a time.  Raytheon recalled an incident response that highlights the complexity of the analysis at this stage in the attack lifecycle: 

“When investigators began dissecting a cyberattack that infiltrated U.S. and European power companies, they found something curious: strings of code with words in both Russian and French.  But this wasn’t some sloppy mistake by shadowy hackers — in fact, the experts concluded, the attackers probably coded in two languages on purpose.  Leaving false clues is one of the many ways hackers can conceal their identities. They also spoof IP addresses, switch toolkits and use other techniques to confuse the analysts who are trying to track their tradecraft. All that makes attribution… a complex and costly affair.”  

Analytic tools, incident response expertise, and more data (system logs, malware samples, etc.) are just some of the ways that attacker tradecraft can be more fully characterized, but attackers try to leave as few breadcrumbs as possible. A Bloomberg article, which had access to some of FireEye’s post-breach analysis of the 2016 Bangladesh Bank incident, included the following notes concerning attackers deleting or altering forensic logs:

“The report cast the unidentified hackers as a sophisticated group who sought to cover their tracks by deleting computer logs as they went. Before making transfers they sneaked through the network, inserting software that would allow re-entry.” 
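Deleted logs can sometimes be noticed by what is missing. The sketch below is one simple heuristic, not a description of how FireEye or any other responder works: it flags unusually long silences in a stream of log timestamps. The `find_log_gaps` helper, the 15-minute threshold, and the sample timestamps are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, max_silence=timedelta(minutes=15)):
    """Return (last_seen, next_seen) pairs separated by more than max_silence."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each event with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps


# Made-up event times for a host that normally logs every few minutes.
events = [
    datetime(2020, 1, 1, 20, 0),
    datetime(2020, 1, 1, 20, 5),
    datetime(2020, 1, 1, 20, 9),
    datetime(2020, 1, 1, 23, 30),  # more than three hours of silence before this
    datetime(2020, 1, 1, 23, 34),
]
gaps = find_log_gaps(events)
for start, end in gaps:
    print(f"No events between {start} and {end}")
```

A gap is only a prompt for further questions: a quiet period may be legitimate (a reboot, a maintenance window), so a reasonable threshold depends on how chatty the system normally is.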

As attacker tradecraft changes, attribution gets more difficult. But that’s part of the challenge; cyber attacks are highly dynamic events. 

In Conclusion: How Can Attribution Efforts Improve? 

Many researchers are closely studying attribution, and that is perhaps our greatest chance of improving how reliably attribution is determined. As more talented security researchers in academia, private industry, and government collaborate and share information about threat groups, incidents, and tactics, techniques, and procedures (TTPs), attribution will continue to mature. This maturity, however, must also be malleable; cyber actors will continue to adjust their TTPs, and attribution efforts must be flexible enough to account for these adjustments.

Georgia Tech is one of many universities researching attribution, and they have a promising approach that utilizes machine learning and large-scale datasets.  They are: 

“working on large-scale collections of malware samples to posit relationships between binaries.  With a sufficient sample size, machine-learning techniques can be applied. Current research builds upon the community’s state-of-the-art approach to attribution, in which code stylometry looks at stylistic features (i.e., white spaces, operators, literals, etc.) and author-created attributes (i.e., average number of characters per word, character count, use of special characters, punctuation, etc.).  Our aim is produce credible links between a binary and a given set of binaries from the same cyber threat actor in a measurable way.  We focus on the following domains to derive attribution inferences, and require multiple positive correlations between domains to produce results: string constants, implementation traits, custom features, and infrastructure” 
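To make the quoted ideas a little more concrete, here is a minimal sketch of two of them: extracting a few stylistic features from code, and requiring agreement across several domains before reporting a link. It is not Georgia Tech’s method. It operates on source text rather than binaries, and the feature set, the `style_features` and `linked` helpers, and the two-domain threshold are assumptions made for illustration.

```python
import re

# Simple patterns for the stylistic features named in the quote.
OPERATORS = re.compile(r"[-+*/%=<>!&|^~]+")
STRING_LITERALS = re.compile(r'"[^"\n]*"' r"|'[^'\n]*'")
WORDS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def style_features(source):
    """Compute a small, illustrative set of stylometric features."""
    words = WORDS.findall(source)
    total_chars = len(source) or 1  # avoid dividing by zero on empty input
    return {
        "whitespace_ratio": sum(ch.isspace() for ch in source) / total_chars,
        "operator_count": len(OPERATORS.findall(source)),
        "string_literal_count": len(STRING_LITERALS.findall(source)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "char_count": len(source),
    }


def linked(domain_matches, min_domains=2):
    """Report a link only if enough independent domains agree."""
    return sum(bool(matched) for matched in domain_matches.values()) >= min_domains


sample = 'x = a + b\nprint("hi")\n'
features = style_features(sample)
print(features)

# Hypothetical per-domain comparison results for two samples.
evidence = {
    "string_constants": True,
    "implementation_traits": False,
    "custom_features": True,
    "infrastructure": False,
}
print(linked(evidence))
```

Requiring several domains to agree trades recall for confidence: a single shared string constant could be a planted false clue, while agreement across strings, implementation traits, and infrastructure is harder to fake.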

Larger data sets, machine learning, and information sharing between government, private, and research institutions are the primary ways in which attribution methods will mature. This information can feed analytic tools that analysts and incident responders can quickly use to unmask the unknown and help customers get the conclusions they need.

But in order to truly improve attribution for an organization – whether private or public – a highly scalable, on-network solution provides unparalleled insight.

Information is, by far, the largest enabler when discovering attack attribution.  

Cyber Crucible’s patented technologies, which provide fast, unparalleled insight into attack details and attacker tradecraft, are the differentiator necessary to defend effectively against attacks. We know attackers will find ways into our networks, so the goal, beyond making the attack as difficult as possible, needs to be analyzing and then removing the intrusion as fast as possible, with as much detail as possible. Only Cyber Crucible enables targets of cyber attack to tell the attribution story, and tell it quickly enough to be meaningful to business operations.

For Further Reading 

Attribution is a very active research area.  Here are some resources that dive into attribution in far greater detail than discussed here.