Exec Blog

Avoiding incident response Groundhog Day

Ever since I began my information security career, February 2nd has held a special place in my heart. When that date arrives, I can’t help but think of the movie “Groundhog Day” with Bill Murray, and if you’ve spent any real stretch of time doing incident response, that reference has probably brought a grim smile to your face.

I started my sysadmin career in a university environment and faced my first intrusion within two weeks of starting the job. The distributed and autonomous nature of campuses ensured a steady “business” for incident response, and I split my time among patching, tuning, and responding to intrusions. But nothing could have prepared me for the Groundhog Day that was incident response during my nearly seven years at US-CERT. We responded to incidents at federal agencies and at a wide array of critical infrastructure companies spanning finance, IT, manufacturing, and oil & gas. While no two cases were exactly the same, our high-level approach to IR and the challenges most victims encountered were astoundingly similar.

At the start of every incident response engagement, one of my teams would sit down with a wide array of stakeholders from the victim organization. The initial meeting was as much about setting expectations as it was about understanding their environment and the incident; leadership at many victim organizations hoped to receive a clean bill of health at the conclusion of the engagement. Imagine their surprise when we told them there were no guarantees in incident response and remediation! We would inform their leadership that we had three broad goals:

  • Improve the ability to detect intrusion activity inside the enterprise (gain visibility),
  • Reduce the time needed to scope and respond to detected intrusion activity, and
  • Build and execute a remediation plan to eliminate the current threat, mitigate the risk of future intrusions when possible, and limit damage when preventing a future intrusion isn’t possible.

Each of these goals shares a common underlying theme: threats aren’t going away, but organizations can take discrete steps to improve their own agility and reduce their attack surface.


Gaps in network and endpoint visibility are common pain points for organizations coping with an incident, and this is especially true when bringing in a third party for assistance. Upon arriving onsite at the victim organization, my teams would typically spend up to a week getting the “lay of the land”: placing network sensors, collecting logs, and analyzing servers and workstations.

In support of the latter task, we often deployed a commercial, agent-based incident response product, but it only supported Windows systems. Complicating matters further, we could never ensure that we had complete coverage of even just the Windows systems in the environment. In at least one case, several months after the intrusion cleanup was ostensibly complete, we discovered that an organization had missed an entire segment of endpoint systems! Imagine spending weeks on an investigation and response effort, working to uncover every one of an attacker’s tracks during what might amount to an existential crisis for the victim organization, only to realize that you had gaping blind spots despite your best efforts.

Rapid Response and Scoping

Many of my past incident investigations were hampered by technology, not by people or process. The legacy endpoint IR tool we relied upon required multiple servers to support large environments. Sweeping the environment for IOCs took days or weeks. Attempts to perform at-scale analysis beyond simple IOC searching, such as frequency analysis to spot outliers, required a cumbersome data export and manual post-processing.
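Frequency analysis (often called “stacking” or least-frequency-of-occurrence analysis) is simple to express once the data is in one place: count how many hosts report each value and look hardest at the rare ones. The Python sketch below assumes endpoint data has already been collected into a dictionary mapping each hostname to the values it reported (here, running process paths); the function names, the max_hosts threshold, and the sample data are all hypothetical and not drawn from any particular product.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def stack_by_host_count(observations: Dict[str, Iterable[str]]) -> Counter:
    """Count how many distinct hosts report each value (e.g. a process path)."""
    counts: Counter = Counter()
    for values in observations.values():
        # Deduplicate per host so one noisy machine cannot inflate a value's count.
        counts.update(set(values))
    return counts


def rare_values(
    observations: Dict[str, Iterable[str]], max_hosts: int = 1
) -> List[Tuple[str, int, List[str]]]:
    """Return (value, host_count, hosts) for values seen on at most max_hosts hosts."""
    # Materialize each host's values once so generators are not consumed twice.
    per_host = {host: set(values) for host, values in observations.items()}
    counts = stack_by_host_count(per_host)
    outliers = []
    # Least frequent first; ties broken alphabetically for stable output.
    for value, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        if n > max_hosts:
            continue
        hosts = sorted(h for h, vals in per_host.items() if value in vals)
        outliers.append((value, n, hosts))
    return outliers


if __name__ == "__main__":
    # Hypothetical sample: running process paths reported by three workstations.
    sample = {
        "ws-001": [r"C:\Windows\explorer.exe", r"C:\Windows\System32\svchost.exe"],
        "ws-002": [r"C:\Windows\explorer.exe", r"C:\Windows\System32\svchost.exe"],
        "ws-003": [
            r"C:\Windows\explorer.exe",
            r"C:\Windows\System32\svchost.exe",
            r"C:\Users\Public\svch0st.exe",
        ],
    }
    for value, n, hosts in rare_values(sample):
        print(f"{value} seen on {n} host(s): {hosts}")
```

Against the sample, only the misspelled svch0st.exe in a user-writable folder is flagged. Rare does not mean malicious, so the threshold and the data source (process paths, services, scheduled tasks, and so on) are things a hunting team would tune, and the result is only as trustworthy as the fraction of hosts actually reporting.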

Thinking back on where we succeeded and struggled, several capabilities could have reduced the time and effort required to successfully respond to incidents:

  • Endpoint visibility across all major operating systems (Windows, Mac, Linux, other flavors of Unix) from a single console. This alleviates the need for one-off tools or processes for investigating certain platforms.
  • Ability to detect unmanaged systems that could be a hidden bastion for intruders. Organizations often struggle to provide this data in an accurate or efficient manner.
  • Fast, cross-platform, and resource-efficient IOC detection to enable more frequent and more comprehensive use of structured threat data. IOC searches against a narrow window of events – or that take days or weeks to complete – are doomed to miss findings.
  • Access to endpoint data that goes beyond forensic telemetry (“Who’s logged in via RDP right now?” “What systems contain this filename or hash, anywhere on disk?” “What Tomcat servers have this vulnerable setting?”) – particularly for systems where an endpoint client is installed post-breach.
  • Flexibility to run ad-hoc searches across the enterprise for any scope of data (current, at-rest, or historical) and get immediate responses.
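On any single endpoint, a question like “What systems contain this filename or hash, anywhere on disk?” reduces to a file walk plus a hash comparison. The sketch below is a minimal, single-host illustration in Python; the function names sweep_for_iocs and sha256_file, and the choice of SHA-256 as the IOC hash type, are assumptions for illustration, not how any particular product works. An enterprise tool would need to run the equivalent on every endpoint in parallel and aggregate the results centrally, which is exactly where the speed and coverage concerns above come into play.

```python
import hashlib
import os
from pathlib import Path
from typing import List, Set, Tuple


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep_for_iocs(
    root: Path, ioc_hashes: Set[str], ioc_names: Set[str]
) -> List[Tuple[str, str]]:
    """Walk everything under root; return sorted (match_type, path) pairs for IOC hits."""
    # Normalize case: hex digests and Windows filenames are commonly compared case-insensitively.
    wanted_hashes = {h.lower() for h in ioc_hashes}
    wanted_names = {n.lower() for n in ioc_names}
    hits: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if name.lower() in wanted_names:
                hits.append(("filename", str(path)))
            try:
                if sha256_file(path) in wanted_hashes:
                    hits.append(("sha256", str(path)))
            except OSError:
                # Unreadable files (permissions, locks, dangling links) are skipped, not fatal.
                continue
    return sorted(hits)
```

Hashing every file is the expensive step; a production tool would likely cache or pre-index file hashes so that repeated sweeps return quickly instead of re-reading the whole disk each time.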

The last point is particularly important. Hunting teams often need to test ideas and try out new analytic techniques to determine whether they can truly identify anomalies. The longer each query, data collection, and analysis cycle takes end-to-end, the more cumbersome this process becomes, limiting the team’s ability to innovate and forcing reliance on stale, limited sets of data. Investigations take longer to complete, and remediation gets pushed further and further out, becoming less likely to succeed if a key finding is missed. Lather, rinse, repeat. Sounds like Groundhog Day to me.

Remediation, Hygiene, and Lessons Learned

One of my frustrations as an incident responder was watching organizations struggle with the same basic security weaknesses, and make the same mistakes, over and over again. It’s easy to become cynical or blame the victim, especially when you encounter a re-compromised organization that has yet to implement the most basic cyber-hygiene controls or address any of the targeted recommendations you delivered at the conclusion of their previous incident.

Organizations undergoing a major breach response effort can take advantage of significant “inertia” (including management attention and resources) to resolve these underlying weaknesses. Why, then, do they so often fail? A major factor is how easily an operations team can implement the prescribed remediation controls and attack-surface-reduction steps. Refer, for example, to the Australian Signals Directorate’s excellent reference on mitigations[1]: they assessed the difficulty of implementing the top four mitigations as “medium” or “high”. Reducing this level of effort is critical; the traditional approach of manual incident remediation and high-maintenance security controls can quickly arrest the momentum of a breach response and the support behind it. This sets the organization up to backslide into a weaker state of cyber hygiene, leaving it susceptible to re-compromise.

Ben Tomhave[2] aptly articulated this point in his RSA Conference talk, “Automate or Die! How to Scale and Evolve to Fix Our Broken Industry”[3]. Proper automation or simplification of key workflows reduces overall cost and removes the human factors that are likely to result in deterioration or failure.

Breaking Free of Groundhog Day

Today’s seemingly endless series of high-profile breaches has made it clear that the status quo is failing: innovation in point solutions cannot solve the problem, nor can heavily siloed IT and security operations teams. While threat actors and their tools keep evolving, organizations continue to struggle with the same basic security challenges, from protection through response, that they faced a decade ago. We must set aside the “we’ve always done it that way” mentality. In “Groundhog Day”, Bill Murray’s character spent the early portion of his groundhog days repeating the same day over and over, never changing a thing. Only when he realized that he had an opportunity to try something different did he eventually break free of the seemingly endless cycle.

Chris Hallenbeck, Director

[1] http://www.asd.gov.au/infosec/top-mitigations/mitigations-2014-table.htm

[2] http://twitter.com/falconsview

[3] http://www.rsaconference.com/events/us15/agenda/sessions/2013/automate-or-die-how-to-scale-and-evolve-to-fix-our
