Intrusion Detection for Cyber-Physical Systems
Even if the known threats, risk factors and other security metrics are well understood and effectively mitigated, a determined adversary will have a non-negligible probability of successfully penetrating or intruding into a CPS. Here we use the term “intrusion detection” to refer to a broad range of processes and effects associated with the presence of malicious software and with malicious actions against a CPS. Once an intrusion has occurred, the first and necessary step toward defeat and remediation is to detect that the intrusion exists.
In Colbert & Hutchinson (2016), we describe the history of intrusion detection in IT systems and CPSs and discuss various methods and Intrusion Detection Systems (IDSs). That work also addresses the difficult question of whether insights and approaches intended for information and communications technology (IT) systems can be adapted for CPSs. To answer this question, it explores modern intrusion detection techniques in IT, such as host-based and network-based techniques, and the differences and relative advantages of signature-based and non-signature methods.
After approximately 2010, CPS intrusion detection techniques began to focus on knowledge about the processes controlled by the CPSs rather than on direct detection or inference of the malware on the network. A CPS is designed to (1) establish appropriate process values to produce the desired output and (2) allow operators to observe aspects of the plant to assure proper operation, safety, and quality. The sole purpose and only capability of CPS network control messages is to support the synchronization of the PLC registers and to provide a local, HMI-side copy of these registers in order to effect control of the plant processes. IT network traffic has a much wider variety of uses, but is not generally used for process control. While both CPS and IT computers have registers, only a CPS network can change and read register values. Register values directly affect process parameters and hence the process. Since CPS security ultimately aims to safeguard the process variables rather than the network traffic itself, process-oriented designs for monitoring and intrusion detection attracted interest.
For example, Hadziosmanovic et al. (2013) attempted to model process variable excursions beyond their appropriate ranges using machine-learning techniques. These authors describe a novel network monitoring approach that utilizes process semantics by (1) extracting the value of process variables from network traffic, (2) characterizing types of variables based on the behavior of their time series, and (3) modeling and monitoring the regularity of variable values over time. Approximately 98% of the process control variables used in real-world plants are reliably monitored by their process-oriented method. The remaining 2% of the variables remain challenging to model with this approach. This novel approach demonstrates that process variables can be modeled successfully for intrusion detection. However, as the authors mention, additional work is needed if all of the process variables are to be monitored reliably. Semantic modeling of plant control variables in the control system process became a favored and presumably effective intrusion detection method for CPSs.
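The three steps above can be illustrated with a minimal sketch. It assumes the variable values have already been extracted from network traffic into per-variable lists. The labels ("constant", "enumerated", "continuous"), the threshold max_enum_values, the k-sigma band, and all function names are assumptions made for illustration; they are not the models used by Hadziosmanovic et al. (2013).

```python
from statistics import mean, stdev


def characterize(series, max_enum_values=5):
    """Label a training series by how its values behave over time."""
    distinct = set(series)
    if len(distinct) == 1:
        return "constant"
    if len(distinct) <= max_enum_values:
        return "enumerated"
    return "continuous"


def build_model(series, k=3.0):
    """Learn what 'regular' looks like for one variable (assumed scheme)."""
    kind = characterize(series)
    if kind == "continuous":
        mu, sigma = mean(series), stdev(series)
        # Assumed rule: regular values stay within k standard deviations.
        return {"kind": kind, "low": mu - k * sigma, "high": mu + k * sigma}
    # Constant and enumerated variables: only previously seen values are regular.
    return {"kind": kind, "allowed": set(series)}


def is_regular(model, value):
    """Check one newly observed value against the learned model."""
    if model["kind"] == "continuous":
        return model["low"] <= value <= model["high"]
    return value in model["allowed"]


# Hypothetical training data, one list of observed values per variable.
training = {
    "pump_state": [0, 1, 1, 0, 1, 0],
    "setpoint": [50, 50, 50, 50],
    "tank_level": [40.1, 41.3, 39.8, 40.6, 42.0, 40.9, 41.7],
}
models = {name: build_model(values) for name, values in training.items()}
```

In this sketch, a later observation of tank_level far outside the learned band, or a setpoint value never seen during training, would be flagged as irregular by is_regular.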
Semantic Security Monitoring (SSM) by Hadziosmanovic et al. (2013) used analysis of control-bus traffic messages to construct a third copy of the plant-PLC registers for a new purpose: to detect events that suggest that plant operations may be out of specification, out of compliance, or out of a desired safety range. An important caveat of using network data to construct a security model is that the network control messages were never intended for security monitoring. The rates and precision of the information in the control messages are designed to be sufficient to accomplish control and maintain quality output, but they may not be appropriate or sufficient for security and safety monitoring operations.
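The idea of a passively maintained third copy of the registers can be sketched as follows. The message layout (a dictionary with "time" and "registers" keys), the register addresses, and the class name RegisterMirror are hypothetical and chosen for illustration; a real deployment would decode these fields from the control protocol in use.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterMirror:
    """Passive copy of PLC register values, rebuilt from observed messages."""
    safe_ranges: dict                       # register address -> (low, high)
    values: dict = field(default_factory=dict)

    def observe(self, message):
        """Update the mirror from one decoded message.

        Returns a list of (time, address, value) tuples for every register
        whose new value falls outside its configured safe range.
        """
        events = []
        for address, value in message["registers"].items():
            self.values[address] = value
            limits = self.safe_ranges.get(address)
            if limits is not None and not (limits[0] <= value <= limits[1]):
                events.append((message["time"], address, value))
        return events


# Hypothetical decoded control messages.
mirror = RegisterMirror(safe_ranges={40001: (0, 100)})
first = mirror.observe({"time": 1.0, "registers": {40001: 72, 40002: 5}})
second = mirror.observe({"time": 2.0, "registers": {40001: 130}})
```

Because the mirror only sees values at the rate and precision of the control messages, it inherits the caveat noted above: an excursion between two messages would go unobserved.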
Figure 2: Three layers of a Cyber-Physical System
A second method of semantic modeling, developed at the US Army Research Laboratory, was proposed by Colbert et al. (2016). This method relies on input from plant personnel to define the limits of critical process variables, rather than inferring the inputs to the security model from network control traffic.
One can view the CPS as a three-layer system to better understand our process-oriented intrusion detection method. As mentioned, CPSs inherently have a physical layer, in which the physical machinery operates, and a cyber layer, in which attackers and defenders operate (see Figure 2). In our model, intrusion detection occurs on a third layer (the “process” layer), in which the system operator and system owner operate. A process diagram, plant policies and procedures, and continuous system monitoring by the system operator determine the critical elements and requirements needed to keep the system operational.
Our CPS intrusion detection research at the Army Research Laboratory (ARL) is based on the assumption that not all of the process variables need to be monitored for alerting. Rather, there are critical process variables (or, more generically, critical elements of the process) that need to be monitored for alerting. Abnormal values of the remaining variables are not significant enough to harm the underlying plant process. We argue that identifying the critical values and determining their allowed ranges is extremely difficult if only network traffic data is used. We use a collaborative approach to constructing the security model, which requires input from the plant operator or plant subject-matter expert (SME) and potentially out-of-band (OOB) sensor data in addition to data from network packets.
Our model recognizes that, just as in IT intrusion detection, reference information from plant sensors, configurations, semantics, and policies (acceptable security/safety value ranges) must be captured, maintained, shared, and made available to the security/safety monitoring analysts in a timely, orderly, and priority-relevant manner to enhance decision-making. However, it also recognizes that CPS process sampling methods and process control methods (e.g. MODBUS) were never intended to feed security/safety analyses. Thus, as stated earlier, many process parameters seen in network traffic may not be relevant, or may not be sampled at a sufficient rate or fidelity. Moreover, there may be other process variables that are indeed critical but are not represented in network traffic, i.e. they are out of band. In this case, independent sensing of these parameters would be needed to create sufficient uplift in timeliness, accuracy, and relevance to the security/safety monitoring mission. In the ARL model, the SME defines the critical security model variables based on knowledge and analysis of the plant processes, and the IDS security engineer implements the appropriate security model. We refer to this model as “collaborative” since the security engineer utilizes human input from the plant operator/SME for constructing the IDS security model.
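As a rough illustration of this collaborative step, the sketch below takes an SME-supplied list of critical variables and decides, for each one, whether network traffic is an adequate data source or whether independent out-of-band sensing would be needed. The variable names, limits, sampling periods, and the function plan_data_sources are all hypothetical; the sketch is not the ARL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalVariable:
    """One SME-defined entry of the security model (assumed fields)."""
    name: str
    low: float                    # lowest acceptable value
    high: float                   # highest acceptable value
    max_sample_period_s: float    # slowest sampling the SME considers adequate


def plan_data_sources(critical_variables, observed_periods_s):
    """Map each critical variable to 'network' or 'out-of-band'.

    observed_periods_s maps a variable name to the sampling period (seconds)
    at which it appears in network traffic; absent names are not in band.
    """
    plan = {}
    for var in critical_variables:
        period = observed_periods_s.get(var.name)
        if period is not None and period <= var.max_sample_period_s:
            plan[var.name] = "network"
        else:
            # Either missing from traffic or sampled too slowly.
            plan[var.name] = "out-of-band"
    return plan


# Hypothetical SME input and hypothetical traffic observations.
sme_model = [
    CriticalVariable("boiler_pressure_kpa", 200.0, 900.0, 1.0),
    CriticalVariable("coolant_flow_lpm", 30.0, 120.0, 0.5),
    CriticalVariable("bearing_temp_c", 10.0, 85.0, 5.0),
]
observed = {"boiler_pressure_kpa": 0.5, "coolant_flow_lpm": 2.0}
plan = plan_data_sources(sme_model, observed)
```

Here the boiler pressure is sampled fast enough in traffic, the coolant flow is present but sampled too slowly, and the bearing temperature does not appear in traffic at all, so the last two would call for independent sensing.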
Our ARL intrusion detection development platform (e.g. see Long 2004) defines ‘alerting’ as automatic information generation to be sent to a human analyst for further consideration. The analyst then examines the alerts and other relevant information and determines when to send an ‘alarm’ to the system owner. An alarm is a notice of a possible compromise or other insecure situation, as determined by the human analyst, whereas an alert is automatically generated information from a sensor or algorithm.
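The distinction between an alert and an alarm can be made concrete with a small sketch. The class names, fields, and the is_credible callback are assumptions introduced for illustration; in particular, the callback only stands in for the human analyst's judgment and does not automate it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Alert:
    """Automatically generated information from a sensor or algorithm."""
    source: str
    detail: str


@dataclass
class Alarm:
    """Notice of a possible compromise, issued by the human analyst."""
    summary: str
    supporting_alerts: List[Alert]


def analyst_review(alerts: List[Alert],
                   is_credible: Callable[[Alert], bool]) -> Optional[Alarm]:
    """Return an Alarm for the system owner only if the analyst finds cause."""
    credible = [a for a in alerts if is_credible(a)]
    if not credible:
        return None
    return Alarm(
        summary=f"{len(credible)} credible alert(s) require owner attention",
        supporting_alerts=credible,
    )


# Hypothetical alerts; the lambda is a placeholder for the analyst's decision.
alerts = [
    Alert("sensor-1", "tank_level above upper limit"),
    Alert("sensor-2", "single malformed packet"),
]
alarm = analyst_review(alerts, lambda a: "limit" in a.detail)
```

The point of the design is that alerts flow automatically, while an alarm exists only after a human decision.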
Our collaborative intrusion detection model was implemented in the ARL intrusion detection development platform in a live testbed at ARL. General findings from our testbed experiments are described in Sullivan & Colbert (2016) and Sullivan, Colbert & Kott (2016). Figure 3 shows the IDS architecture implemented in our testbed. A network tap (e.g. a SPAN port on a switch) provides network capture data to one or more Sensor Nodes. Some of the data are pre-processed on the Sensor Nodes into ‘detects’ (detect/alert information) and index data. The Ingest Node then forwards that data to a Master Node, which stores raw data and provides indexed information for analyst web tools. More complicated analytics are executed by the Analysis Node, which places its results back on the Master Node for the Web Interface to display. The Web Interface contains an HTTP web server with web analytics and web links for execution of additional analysis tools. The Human Analyst then examines alerting information that resides in the system using various analytical tools.
Figure 3: Generic Intrusion Detection Architecture
In our testbed implementation, IDS alerting by the Sensor Node is generated from anomalies on the process layer by monitoring critical process values. As mentioned, critical process variables are those that have been collaboratively defined to signify whether or not the control system is operating successfully. Sensor Nodes are modified specifically to monitor the values of all critical process variables. For example, the nominal value and the upper and lower limits of each critical variable, together with the criticality of the resulting alert, are programmed into the Sensor Node. This process-oriented intrusion detection method is meant to be used in parallel with anomaly-based and signature-based intrusion detection methods that are available for CPSs (see Colbert & Hutchinson 2016).
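A minimal sketch of such a sensor-side check is given below. It assumes each critical variable is described by a nominal value, lower and upper limits, and a criticality label; the rule fields, the label "high", and the function check_sample are illustrative assumptions and do not describe the actual ARL sensor code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalValueRule:
    """Limits programmed into the sensor for one critical variable."""
    variable: str
    nominal: float
    lower: float
    upper: float
    criticality: str    # assumed labels, e.g. "high" or "medium"


def check_sample(rule, value):
    """Return an alert record if value lies outside [lower, upper], else None."""
    if rule.lower <= value <= rule.upper:
        return None
    return {
        "variable": rule.variable,
        "value": value,
        "deviation_from_nominal": value - rule.nominal,
        "criticality": rule.criticality,
    }


# Hypothetical rule and samples.
rule = CriticalValueRule("tank_level_pct", nominal=60.0, lower=40.0,
                         upper=80.0, criticality="high")
in_range = check_sample(rule, 65.0)
out_of_range = check_sample(rule, 92.5)
```

The alert record carries the criticality so that downstream tools can prioritize it for the analyst; whether it becomes an alarm remains a human decision.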