Rethinking the Investigation Phase of the Endpoint Security Lifecycle: A Closer Look at Tanium Trace
A frequently cited metric when examining the current state of incident detection and response is “dwell time.” Research consistently highlights a significant gap, often measured in months, between the time at which a compromise occurs and when the victim ultimately detects the intrusion.
We can all agree this gap serves as a useful barometer for measuring the success of a security program and that shrinking this timeframe is critical to reducing the potential impact of attacks on business and infrastructure. But first we must redefine the timeframe of “dwell time”, extending its boundaries beyond the point at which an incident is detected. In fact, dwell time encompasses the entire course of the subsequent investigation, ultimately lasting until a successful remediation event.
What defines a successful investigation?
An investigation must identify the actual business impact of the compromise and “tell the story” of what happened. Investigators begin by identifying all of the affected systems and devices across an environment and determining the extent of attacker activity on each. However, the importance of scoping goes beyond building a tally of unique malware samples, infected hosts or stolen credentials. The findings from an investigation are essential inputs to short- and long-term remediation plans, which ideally should be developed in parallel with the investigation from day one. Missing a single infected system, backdoor command-and-control address or compromised set of user credentials could allow an attacker to easily regain access and nullify the entire remediation effort. Likewise, failing to identify and address more fundamental vulnerabilities exploited during an incident leaves a victim with no net improvement to their security posture.
Unfortunately, many organizations struggle to succeed at this critical stage of the incident lifecycle. As a consultant, I spent six years leading investigation and remediation efforts for organizations that required external assistance, especially when expected to execute within a short timeframe. So much rides on an investigation, but the road to remediation remains a difficult path. With so much on the line, why do investigators still struggle? The answer typically lies within the classic domains of people, process, and technology:
- People: Computer forensics requires significant technical skills and training, particularly when overlaid with the need to understand enterprise networks, common technologies like Active Directory and the patterns and characteristics of common intrusion techniques. These skills have been, and will likely remain, in short supply for the foreseeable future.
- Process: Many organizations never establish defined procedures for handling a targeted intrusion until they’ve encountered one.
- Technology: The vast majority of tools used to conduct investigations, particularly endpoint “hunting” and forensics, are slow, overly complex and unable to scale in large, distributed environments (or keep pace with an active intrusion). Furthermore, many products are born from point solutions intended to address only a small subset of incident detection or response tasks, and therefore rapidly degrade when retrofitted into additional IR use cases at enterprise scale.
I’m passionate about the opportunity to equip businesses with technology that can serve as a force-multiplier, amplifying the skills and resources of existing security teams to address attacks of any scale. Therefore, I’m excited to share Tanium’s latest capability, Tanium Trace, and to empower Security and Incident Response teams to drastically reduce the time and effort to detect, investigate and respond to intrusions.
A closer look at Tanium Trace
Tanium Trace (Trace) joins a number of existing incident detection and response capabilities already included with the Tanium Endpoint Platform, including the ability to rapidly query and collect data on the “current state” of all endpoints, to search for structured Indicators of Compromise, and to acquire larger forensic artifacts, such as a memory image or dump of file system metadata, from targeted systems. Trace builds upon these capabilities with three key features: the Trace Collector, the Trace Workbench and Trace Sensors.
Trace Collector
Trace Collector is responsible for continuously recording key forensic evidence on each endpoint. It works by monitoring the Windows kernel and other low-level subsystems to capture a variety of events, including: process execution, file system activity, registry changes, network connections, driver loads and user authentication. This data can accumulate quickly — even an idle system generates constant activity — so Trace intelligently coalesces similar events and efficiently stores them in a local database. As a result of these optimizations, the default configuration can retain up to several weeks of historical data. Tanium administrators can further customize the amount of local storage consumed by Trace, as well as filter the types of recorded evidence, to fit desired use-cases.
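To make the idea of coalescing concrete, the following is a minimal sketch of how repeated events might be merged into a single record with a count and a first/last timestamp. It is an illustration only, not Trace's actual implementation: the tuple layout, the `CoalescedEvent` type, and the 60-second `window` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CoalescedEvent:
    """One stored record standing in for a run of near-identical raw events."""
    kind: str
    process: str
    target: str
    first_seen: float
    last_seen: float
    count: int = 1


def coalesce(events, window=60.0):
    """Merge events sharing (kind, process, target) when each repeat arrives
    within `window` seconds of the previous one.

    `events` is an iterable of (timestamp, kind, process, target) tuples.
    The tuple layout and the window value are assumptions for this sketch.
    """
    merged = []
    latest = {}  # key -> index in `merged` of the most recent record for that key
    for ts, kind, process, target in sorted(events):
        key = (kind, process, target)
        idx = latest.get(key)
        if idx is not None and ts - merged[idx].last_seen <= window:
            # Close enough to the previous occurrence: extend the existing record.
            merged[idx].last_seen = ts
            merged[idx].count += 1
        else:
            # First occurrence, or too long since the last one: start a new record.
            latest[key] = len(merged)
            merged.append(CoalescedEvent(kind, process, target, ts, ts))
    return merged
```

A log file that a service appends to every few seconds would collapse into one record per burst of activity, which is the kind of saving that lets a fixed-size local store cover a longer history.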
By way of comparison, traditional disk and memory forensics techniques can successfully reconstruct fragments of endpoint activity, but are limited to the evidence that is natively preserved by the underlying operating system, and that evidence can rapidly degrade as time elapses from a period of interest. Trace maintains a complete, easy-to-interpret history of events that allows analysts to rewind the clock and replay what recently occurred on a system.
Furthermore, Trace enriches each of the recorded events with additional context and metadata. For example, process execution events include the full command line, hash, parent process command line and user context. File, network and registry events are all recorded with the associated process and user context in which they occurred. Security events, such as logons and logoffs, are retained independently of event logs that might have rolled. This makes it incredibly easy to take an investigative lead and quickly ascertain the context in which it occurred on an endpoint.
Trace Workbench
Trace Workbench is a web-based user interface directly integrated into the Tanium console. It provides the ability to connect to remote systems and conduct deep-dive analysis against the rich data set captured by Trace Collector. All analysis can be conducted through a live connection within seconds, with no time-consuming data transfers or parsing required; users can also preserve snapshots of a system’s Trace database for storage and collaborative analysis on the Tanium server. It also provides the ability to easily reconfigure Trace settings, such as the maximum database size and retention history, across all endpoints.
Users can begin by searching a system’s entire Trace database based on keywords or an initial time range. Search results are presented in a traditional grid that can be re-sorted and filtered on a per-column basis to help users eliminate extraneous data.
Double-clicking any search result navigates the user to a detailed Process Overview page for the process responsible for the selected event. This essentially pivots the analysis to the scope of a single process over time, rather than the scope of all events matching the initial search criteria. The Process Overview displays all of the file, network, registry, and child process activity initiated by the currently examined process in both a visual timeline and a grid. This page also provides the full process image path and arguments, user context, hash, and parent command line. From here, users can download the executable for further analysis or navigate an interactive tree view that displays the current process’s parent, children, and peer nodes.
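The parent, children, and peer relationships shown in a process tree can be derived from nothing more than recorded process-creation events. The sketch below assumes each event is a hypothetical `(pid, parent_pid, image)` tuple; it does not reflect how Trace Workbench stores or renders its data. Because Windows reuses process IDs, a real recorder would need a unique per-process identifier rather than the bare PID used here.

```python
from collections import defaultdict


def build_tree(processes):
    """Index process-creation records by parent.

    `processes` is an iterable of (pid, parent_pid, image) tuples, a layout
    assumed for this sketch. Returns (parent_of, children) lookup tables.
    """
    parent_of = {}
    children = defaultdict(list)
    for pid, ppid, _image in processes:
        parent_of[pid] = ppid
        children[ppid].append(pid)
    return parent_of, children


def relatives(pid, parent_of, children):
    """Return the parent, children, and peers (siblings) of one process."""
    parent = parent_of.get(pid)
    kids = list(children.get(pid, []))
    if parent is None:
        peers = []
    else:
        # Peers are the other processes started by the same parent.
        peers = [p for p in children.get(parent, []) if p != pid]
    return {"parent": parent, "children": kids, "peers": peers}
```

Given records for two `svchost.exe` instances started by the same `services.exe`, one of which spawned `cmd.exe`, `relatives` on that instance would list `services.exe` as parent, the command shell as a child, and the other `svchost.exe` as a peer.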
Trace Sensors
It’s easy to get tunnel vision when analyzing a single system that’s part of a broader incident. Investigators need to be able to search across the enterprise for findings and artifacts uncovered during deep-dive forensics. Ideally, every ad-hoc search and test case shouldn’t entail the overhead of constructing and deploying IOCs. Queries for simple artifacts, such as a process, registry key or file, should be lightweight and yield immediate results. This isn’t just to search for “known-bad”, but to help recognize and evaluate “known-good.” Anyone who’s done forensic analysis knows the feeling of encountering a set of evidence that appears suspicious, but may also be an artifact of normal system behavior concurrent with an intruder’s actions. The ability to instantly answer “How common is this ‘thing’?” across an environment can radically reduce the effort required to examine a system, and help build more accurate, resilient indicators for future use.
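The “how common is this thing” question is essentially a frequency count across hosts. As a rough sketch, assuming the results of an enterprise-wide search have already been collected as hypothetical `(hostname, artifact)` pairs, prevalence can be computed like this. The function names and the 5% default threshold are illustrative choices, not part of any Tanium interface.

```python
from collections import defaultdict


def prevalence(observations):
    """Return {artifact: fraction of reporting hosts on which it was seen}.

    `observations` is an iterable of (hostname, artifact) pairs, a format
    assumed for this sketch. Only hosts that reported at least one
    observation are counted in the denominator.
    """
    hosts = set()
    seen_on = defaultdict(set)
    for host, artifact in observations:
        hosts.add(host)
        seen_on[artifact].add(host)
    total = len(hosts)
    if total == 0:
        return {}
    return {artifact: len(h) / total for artifact, h in seen_on.items()}


def rare(observations, threshold=0.05):
    """List artifacts present on at most `threshold` of reporting hosts."""
    fractions = prevalence(observations)
    return sorted(a for a, frac in fractions.items() if frac <= threshold)
```

An executable hash present on nearly every host is probably part of the standard build, while one seen on a single machine deserves a closer look. The threshold is a judgment call and depends on the size and uniformity of the environment.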
Trace enables this capability through sensors that power Tanium’s core “Ask a Question” feature. All of the data captured by Trace on every endpoint can be searched at scale with the same 15-second response time that any other Tanium question would produce. Likewise, the same Trace data can be retrieved via recurring searches or a SOAP API, and results can be saved, aggregated over time, or redirected to other security analysis or storage solutions like SIEM via Tanium Connect.
When examining process details within Trace Workbench, any row of activity can be transformed into a context-sensitive search with a single click. For example, selecting a row for a CreateProcess operation prompts to search for a matching process by path, MD5 hash, or full command-line; selecting a file or registry row prompts to search by the operation type and item path. Users can also manually craft more elaborate queries that leverage the full Trace data set and provide advanced options like time range constraints and regular expression matching.
Trace sensors — particularly when used to support recurring, saved searches — can also provide an effective mechanism for proactively monitoring an environment for anomalous activity. Some examples of Trace searches that users can easily build, customize, and integrate with other security solutions include:
- Type 3 (Network) and Type 10 (RemoteInteractive) logons by a selected set of accounts, source systems, and / or target systems
- File creation events using regular expression matching to watch for activity in specific directories (such as common attacker staging locations) or for the creation of specific file types (such as RAR archives)
- Creation or modification of registry keys and values frequently used to persist malware, such as Windows Service ServiceDLL parameters or values under “Run” keys
- Unique combinations of parent and child processes that may be indicative of command-and-control activity, such as a command shell being spawned by an atypical parent like “svchost.exe”
- Execution of native Windows commands infrequently used by ordinary end-users but often leveraged by attackers for reconnaissance and lateral movement, such as “net.exe”, “at.exe”, and “wmic.exe”
- Anomalous network flows, such as attempted outbound connections to Internet HTTP/S servers from isolated internal domain controllers, or SMB or RDP traffic to privileged administrator workstations or key servers from atypical source subnets
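Several of the searches listed above reduce to simple predicates over recorded events. The sketch below shows how a few of them might be expressed once event data has been exported for analysis. It is not Trace query syntax: the event dictionaries, their field names (`type`, `path`, `image`, `parent_image`, `logon_type`, `account`), the staging directories, and the list of expected shell parents are all assumptions chosen for illustration.

```python
import re

# Hypothetical rule data; tune every value to the environment being monitored.
STAGING_ARCHIVE = re.compile(
    r"^C:\\(?:Windows\\Temp|ProgramData|Users\\Public)\\.*\.rar$",
    re.IGNORECASE,
)
SHELLS = {"cmd.exe", "powershell.exe"}
EXPECTED_SHELL_PARENTS = {"explorer.exe", "cmd.exe", "powershell.exe"}
RECON_TOOLS = {"net.exe", "at.exe", "wmic.exe"}
WATCHED_LOGON_TYPES = {3, 10}  # Network and RemoteInteractive


def image_name(path):
    """Return the lowercase file name from a Windows-style path."""
    return path.rsplit("\\", 1)[-1].lower()


def classify(event, watched_accounts=frozenset()):
    """Return a list of labels for rules that `event` matches.

    `event` is a dict with assumed field names; `watched_accounts` is a set
    of lowercase account names to monitor for remote logons.
    """
    labels = []
    kind = event.get("type")
    if kind == "file_create":
        # RAR archive written under a common staging directory.
        if STAGING_ARCHIVE.match(event.get("path", "")):
            labels.append("archive_in_staging_dir")
    elif kind == "process_create":
        child = image_name(event.get("image", ""))
        parent = image_name(event.get("parent_image", ""))
        # Command shell started by something other than the usual parents.
        if child in SHELLS and parent not in EXPECTED_SHELL_PARENTS:
            labels.append("shell_from_atypical_parent")
        # Native commands often used for reconnaissance and lateral movement.
        if child in RECON_TOOLS:
            labels.append("recon_or_lateral_movement_tool")
    elif kind == "logon":
        account = event.get("account", "").lower()
        if event.get("logon_type") in WATCHED_LOGON_TYPES and account in watched_accounts:
            labels.append("watched_account_remote_logon")
    return labels
```

Rules like these are deliberately broad and will fire on legitimate administrator activity, so they work best as inputs to triage or to a prevalence check rather than as standalone alerts.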
With the addition of Trace, the Tanium Endpoint Platform provides a truly comprehensive solution to the entire incident response lifecycle — and one that can remain fast, reliable, and easy to manage at any scale. Trace can be deployed in minutes using the Tanium platform, and adding all of these Trace features to an existing deployment requires no additional hardware. With a few clicks, users can install Trace to a subset of endpoints or the entire environment, monitor installation status, and begin using the Trace workbench and sensors.
RYAN KAZANCIYAN, Former Chief Security Architect