Application, operating system, and device logs contain essential security information, but many organizations struggle to collect and analyze them. Smaller businesses are particularly vulnerable because they often lack dedicated security resources and perceive centralized log collection as unnecessarily expensive.
From a security perspective, log data can be used retrospectively for investigations and proactively analyzed to detect and respond to security incidents. Both use cases benefit significantly from centralized log collection.
Cybersecurity incidents are often detected only after significant damage has been done. For retailers, the first sign of a payment card breach may be notification by law enforcement or their acquiring bank. Victims may learn of incidents when they receive larger-than-expected bills for cloud computing or VoIP services, when confidential information is published to the Internet, or on receipt of a blackmail attempt. During compromises, intruders often delete system logs. While forensic recovery is sometimes possible, real-time centralized log collection is often key to understanding what happened.
Decades ago, security literature advised administrators to review logs daily, but the volume of logs created by even a single web server makes this infeasible. Logs contain valuable information that can be used to detect and respond to threats, but this requires the use of analytic techniques. In some cases, simple analytics will reveal unusual behaviour that warrants further investigation. In other cases, more sophisticated correlation is required.
Many products exist to address log collection and analysis. They can be broadly divided into two categories: Log Management (LM) and Security Information and Event Management (SIEM). LM products such as Splunk and Graylog focus more broadly on log aggregation and search capabilities. SIEM products focus more tightly on event correlation and security analytics.
Splunk is the Cadillac of LM. The product will ingest logs in virtually any format and allow free-form searching. Agents to collect logs from Windows, Linux, and other operating systems are included, which provides broad value to IT. Splunk also offers SIEM-like capabilities, generally realized using scheduled or real-time searches. For example, a Splunk query might be written to detect failed login attempts and notify security operations when a threshold is exceeded. The downside of this approach is that each real-time search runs in parallel, consuming significant CPU resources. Splunk is licensed by ingested data volume, making it an expensive product.
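The failed-login example can be illustrated outside Splunk with a short Python sketch. This is not Splunk's query language. The function names, the OpenSSH-style "Failed password" message, and the default threshold are assumptions made for illustration only.

```python
import re

# Assumed OpenSSH-style failure message; real log formats vary by system.
FAILED_LOGIN = re.compile(r"Failed password for ")


def count_failed_logins(lines):
    """Count log lines in the search window that record a failed login."""
    return sum(1 for line in lines if FAILED_LOGIN.search(line))


def should_notify(lines, threshold=5):
    """Return True when failed logins in the window exceed the threshold."""
    return count_failed_logins(lines) > threshold
```

Each scheduled run of such a check rescans its entire window of log lines, which is where the repeated-search cost comes from.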
Graylog, the leading open-source LM solution, takes a different approach. The product does not provide collection agents, but logs can be ingested in several formats, including syslog and Graylog Extended Log Format (GELF). Received data is written to the database to facilitate future searches, and is matched against various criteria for entry into streams. This approach is particularly powerful because criteria can be applied to each log entry as it arrives instead of requiring repeated searches. Streams can be used to tag relevant log entries and optionally forward them to another system for further analysis. Alerts can be configured on stream data as well, but are somewhat limited. For example, an alert can be raised when the count of stream events exceeds a threshold, but it cannot be restricted to cases where a single IP address exceeds that count. Graylog does not natively provide SIEM capabilities. However, Graylog’s extensible, open-source framework enables the creation of SIEM functionality as well as pulling in additional data to augment logs as they are received.
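The per-address alert that stream alerts lack could be built on entries forwarded from a stream. The following is a minimal Python sketch of that idea, not Graylog's API: the class name, the `observe` method, and the numeric timestamps in seconds are all hypothetical. Like a stream rule, it evaluates each entry once as it arrives rather than re-running a search.

```python
from collections import defaultdict, deque


class PerSourceThreshold:
    """Track recent events per source IP and flag any IP over a threshold."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        # One queue of event timestamps per source IP (hypothetical schema).
        self._seen = defaultdict(deque)

    def observe(self, source_ip, timestamp):
        """Record one event; return True if this IP now exceeds the threshold."""
        times = self._seen[source_ip]
        times.append(timestamp)
        # Discard timestamps that have aged out of the sliding window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) > self.threshold
```

Keeping a separate queue per address means that a burst from one host raises an alert even when the overall event count is unremarkable.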
HP ArcSight is the best-known product in the SIEM space. Unlike LM products, ArcSight requires much more structure when ingesting logs. This is accomplished with the Smart Connector approach: a lightweight module must be able to parse and understand logs as received. If a connector does not exist, it must be created so that logs can be ingested and normalized. ArcSight is designed specifically to correlate log data and provide security analysis. Using the product involves a significant learning curve, but it remains popular in security operations centers (SOCs).
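To make the idea of parsing and normalization concrete, here is a small Python sketch of what a connector-like module does conceptually. It is not ArcSight's Smart Connector API. The function name, the output field names, and the assumed sshd syslog line format are hypothetical.

```python
import re

# Assumed traditional syslog line from sshd; other sources need other patterns.
SSHD_FAILURE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\ssshd\[\d+\]:\s"
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


def normalize_sshd_failure(line):
    """Map a raw sshd failure line onto a common event schema, or None."""
    match = SSHD_FAILURE.match(line)
    if match is None:
        return None  # Unrecognized input: a new parser would be needed.
    return {
        "event_type": "auth_failure",
        "timestamp": match.group("ts"),
        "host": match.group("host"),
        "user": match.group("user"),
        "source_ip": match.group("ip"),
        "source_port": int(match.group("port")),
    }
```

Once every source is mapped onto the same field names, correlation rules can be written against the schema rather than against each raw log format.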
SIEM products can be very useful if organizations can afford the licensing, training, and employee time required to operate the system. Outsourcing to a managed security provider is another option. The debate for many companies is whether to collect data directly into a SIEM, or collect into an LM and then forward data to a SIEM. From a security operations perspective, direct collection into a SIEM eliminates the cost of the LM solution while still providing the data SOCs desire. However, this approach does not provide the greater value to IT that can be realized with an LM solution.
Small organizations without a dedicated security team may find that the cost and complexity of SIEM solutions make deploying an organization-wide LM solution a better choice. SIEM capabilities can subsequently be outsourced, purchased, or built using API capabilities of the LM product.