The foundational level of situation awareness is perception of the surrounding environment. In cyberspace, this means being able to enumerate and identify elements of the cyber terrain, particularly the network-connected devices that are employed to accomplish a user’s goals. These devices emit a plethora of signals in network traffic and server logs as they negotiate for services, but those signals do not share consistent features that would make it straightforward to uniquely identify which hosts are active over a period of interest.
This podcast presents a cyber Entity Resolution approach and blocking technique designed to bridge this gap. The technique is based on constructing and comparing periodic snapshots of collections of “host segments” that are built up from multiple log files.
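To make the idea of snapshots and blocking concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes, purely for illustration, that a host segment is the set of identifiers (MAC address, hostname) observed with one IP address during a snapshot window, and that blocking means comparing only those segments that share at least one identifier. The record layout and the function names `build_segments` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict

def build_segments(records, start, end):
    """Group records seen in [start, end) by IP into hypothetical host segments.

    Each segment is a set of (field, value) identifiers. This structure is an
    assumption made for illustration, not the definition used in the podcast.
    """
    segments = defaultdict(set)
    for rec in records:
        if start <= rec["ts"] < end:
            for field in ("mac", "hostname"):
                value = rec.get(field)
                if value:
                    segments[rec["ip"]].add((field, value))
    return dict(segments)

def candidate_pairs(snap_a, snap_b):
    """Blocking step: pair only segments that share at least one identifier."""
    index = defaultdict(set)
    for seg_id, idents in snap_b.items():
        for ident in idents:
            index[ident].add(seg_id)
    pairs = set()
    for seg_id, idents in snap_a.items():
        for ident in idents:
            for other in index.get(ident, ()):
                pairs.add((seg_id, other))
    return pairs

# Invented records standing in for entries parsed from several log files.
records = [
    {"ts": 10, "ip": "10.0.0.5", "mac": "aa:bb:cc:00:00:01"},
    {"ts": 20, "ip": "10.0.0.5", "hostname": "laptop-1"},
    {"ts": 30, "ip": "10.0.0.9", "mac": "aa:bb:cc:00:00:02"},
    {"ts": 910, "ip": "10.0.0.7", "mac": "aa:bb:cc:00:00:01"},  # same MAC, new IP
    {"ts": 920, "ip": "10.0.0.9", "hostname": "printer-3"},
]

snap_a = build_segments(records, 0, 900)      # first 15-minute window
snap_b = build_segments(records, 900, 1800)   # next 15-minute window
pairs = candidate_pairs(snap_a, snap_b)
print(pairs)
```

In this toy data, only the two segments that share a MAC address become a candidate pair, so the later and more expensive comparison step never has to consider every segment against every other segment.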
Watch the podcast to see results from an open dataset in a cyber situation awareness prototype, BitBook, that visualizes the host segments generated from parsed network traffic over an observation window. BitBook’s efficacy at discovering hosts in that experiment is briefly discussed.
This podcast also describes a supervised machine learning approach to assist in Entity Resolution of hosts between two snapshot collections of host segments. The authors of the associated report performed an experiment using server logs from a larger enterprise network to resolve hosts between snapshots taken 15 minutes apart and one day apart; in the worst case, there were only 4 false positives and 24 false negatives against nearly 12K correctly labeled hosts.
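To put the worst-case error counts in perspective, the short calculation below converts them into precision and recall. It rests on two assumptions that the description does not confirm: that the correctly labeled hosts can be treated as true positives, and that "nearly 12K" can be rounded to 12,000. The results are therefore a rough indication only.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# 12_000 is an assumed round stand-in for "nearly 12K";
# 4 and 24 are the worst-case counts quoted above.
precision, recall = precision_recall(tp=12_000, fp=4, fn=24)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

Under those assumptions, precision is about 0.9997 and recall about 0.998.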