# Threat Modeling using Neo4j Graph Database

Posted by Riino Site on August 8, 2022 neo4j threat-modeling cyber-security

THIS VERSION IS FOR RIINOSITE USAGE

Author : Riino Shiqi Zhang, Cyber White Hat Team

## Abstract

In this guide, we introduce a threat modeling technique that was used in a real project in 2021, as part of the assessment of an experimental L4 autopilot system from an automobile maker. We converted the real-world network topology into a digital twin in a graph database following a set of principles, and quantified it with CVSS 3.1 to formalize a Threat Modeling Graph (TMG). On this graph, a conditional traversal that searches every possible attack chain can be used to simulate, assess, and quantify risk under the STRIDE model. We would like to share some benefits that explain why we use a graph database and why we combine threat modeling with graph techniques:

• Time-saving: As the final output, the traversal algorithm generated over 120k possible attack chains, each with its STRIDE type and a calculated multi-step attack CVSS score. We can therefore quickly identify the riskiest scenarios by sorting in descending order, which saves a lot of time.
• Operation-saving: As for the input, our team only needs to maintain the list of entry points, devices, and servers. During the analysis, we can extract every relationship generated in the database into an Excel file and assign a CVSS score to each relationship, following rules that we designed.
• Visualization: The graph structure lets us visually inspect the entire database for mistakes, and makes it easy to maintain a complex network topology and share it with others.
• Agile: The entire system can be updated very quickly by updating the input Excel files and running the scripts again. In other words, by changing a few rows we can regenerate the entire output of over 120k records.

Additionally, the core motivation for using a graph is that we can create a digital twin of the entire environment and use the graph as a powerful medium from the defenders’ perspective, gaining advantages in coverage, efficiency, and intuition during the threat modeling procedure.

## Preface

Please keep in mind that, due to NDAs and other company policies, this guide does not disclose any confidential information, personal information, or any information that could reveal details of the project or the client, including the client's name or basic information about the system.

The main intention of this writing is to present a brief guide on how to implement a simple TMG and use it for threat modeling. In this guide, we use Python and the Neo4j Cypher language for demo scripts. However, most concepts are presented as mathematical formulas and diagrams, so you can choose any alternative to implement the same concepts, including the TMG.

## 1. Introduction

### 1.1 Basic Concept of TMG

The threat modeling procedure works from the defenders’ perspective. The main reason for using a graph is that we would like to present a visualized and intuitive overall status of a system, which in most cases is a network topology.

The TMG shares the same idea as a knowledge graph: it uses a graph database to store the components within the scope of analysis, to clearly show which components can be attacked. Here is a knowledge graph from Neo4j:

Fig. 1: A sample cybersecurity knowledge graph from neo4j

Compared with a knowledge graph, the TMG puts less emphasis on depicting the real access network and more on isolating potential entry points as individual nodes in the graph. In addition, relationships (i.e. the connections between nodes) carry the CVSS rating of an attack in a certain direction as a property. Depending on the types of the linked nodes, an attack can be bidirectional or unidirectional. We will explain this in section 2.4.

Fig. 2: The basic concept of TMG with a simple C/S structure

### 1.2 Node Types

You may notice that there are three types of nodes in a TMG, shown in different colors: interface, component, and asset:

• interface: represents an entry point that an attacker can use to attack the component it belongs to, such as a USB port, a LAN port, or an available terminal where the attacker can type commands. In the demo above, the green nodes are interface nodes.
• component: represents a device (physical or virtual) that can contain assets, such as a computer, a cloud VM, an edge-computing board, or any device that can be hacked (i.e. partially controlled by attackers).
• asset: can be a digital asset, such as a running application with data or a running database, or a physical asset, such as a hard drive.

Using the Cypher language, the structure of these nodes can be represented as:

```cypher
(:Component)-[:HAS_ASSET]->(:Asset)
(:Component)-[:HAS_INTERFACE]->(:Interface)
```

And the corresponding graph is:

```mermaid
flowchart LR
c1([:Component]) --> r1[:HAS_ASSET]-->i([:Asset])
c2([:Component]) --> t[:HAS_INTERFACE]-->2([:Interface])
```


In addition, you may notice that the network node above has a purple outline, which means it is also a communication component. It can also have an asset, depending on whether you want to protect this communication channel and regard it as part of your assets.

A communication node must be a component. We can use the multi-label feature in Neo4j to represent this easily:

```cypher
(:Component:Communication)-[:HAS_ASSET]->(:Asset)
```

```mermaid
flowchart LR
c1([:Component:Communication]) --> r1[:HAS_ASSET]-->i([:Asset])
```


Moreover, if the system is complex, we recommend grouping components into categories by using facility nodes:

```cypher
(:Facility)-[:HAS_COMPONENT]->(:Component)-[:HAS_ASSET]->(:Asset)
```

```mermaid
flowchart LR
f([:Facility]) --> r2[:HAS_COMPONENT]--> c1([:Component])-->r1[:HAS_ASSET]-->i([:Asset])
```


Therefore you can easily query all components that belong to certain facilities. Here is a demo that returns the interfaces grouped by facility, which we will use later to connect interfaces.

```cypher
MATCH (f:Facility)
WITH f
MATCH (f)-[:HAS_COMPONENT]->(:Component)-[:HAS_INTERFACE]->(i:Interface)
RETURN i AS interface, f AS facility
```

Script 1 : Query all interface nodes and facility nodes
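Once the rows of Script 1 are back in Python, they still need to be grouped by facility. The following is a minimal sketch of that grouping, assuming each record has already been reduced to an (interface name, facility name) pair; the function name `group_interfaces_by_facility` and the sample rows are illustrative and not part of the original project.

```python
from collections import defaultdict


def group_interfaces_by_facility(rows):
    """Group interface names under the facility they belong to.

    `rows` is assumed to be an iterable of (interface_name, facility_name)
    pairs, for example extracted from the records returned by Script 1.
    """
    grouped = defaultdict(list)
    for interface_name, facility_name in rows:
        # Skip duplicates so each interface is listed once per facility.
        if interface_name not in grouped[facility_name]:
            grouped[facility_name].append(interface_name)
    return dict(grouped)


# Illustrative rows based on the C/S example used in this guide.
rows = [
    ("Gateway1", "Data Center"),
    ("Terminal", "Client"),
    ("LAN2", "Client"),
    ("Gateway1", "Data Center"),
]
grouped = group_interfaces_by_facility(rows)
print(grouped)
```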

For how to generate these nodes, please refer to section 2.2 and section 2.3.

### 1.3 Attack Chain

The final output will be a connected graph, and every path between two given nodes in this graph is a sequence of alternating interface nodes and component nodes. Therefore the node list of any attack chain is: $$S=\{i_1,r_1,c_1,i_2,r_2,c_2,\dots,i_n,r_n,c_n\ |\ i_k\in Interface,\ c_k\in Component,\ r_k\in HAS\_INTERFACE,\ 1\le k\le n \} \tag{1}$$

Eq. 1 Definition of TMG Attack Chain

The equation above will be used to calculate the final risk rating for a given path. We will expand on this concept in section 3.1.
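To make the structure in Eq. 1 concrete, here is a small sketch that checks whether a path follows the interface, HAS_INTERFACE, component pattern. It reads Eq. 1 literally; the tuple representation of a path and the name `is_attack_chain` are assumptions made for illustration only.

```python
# Expected element kinds inside each (i_k, r_k, c_k) triple of Eq. 1.
EXPECTED_KINDS = ("Interface", "HAS_INTERFACE", "Component")


def is_attack_chain(path):
    """Return True if `path` matches the pattern of Eq. 1.

    `path` is assumed to be a list of (kind, name) tuples, where kind is
    "Interface", "HAS_INTERFACE" or "Component".
    """
    # A chain holds at least one full triple and only full triples.
    if not path or len(path) % 3 != 0:
        return False
    return all(
        kind == EXPECTED_KINDS[position % 3]
        for position, (kind, _name) in enumerate(path)
    )


# Illustrative chain with n = 2, using names from the C/S example.
chain = [
    ("Interface", "Terminal"),
    ("HAS_INTERFACE", "r1"),
    ("Component", "Client Computer"),
    ("Interface", "LAN2"),
    ("HAS_INTERFACE", "r2"),
    ("Component", "Virtual-Router-Client-Computer"),
]
print(is_attack_chain(chain))
```

A check like this can act as a sanity filter on traversal output before any rating is calculated.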

## 2. Graph Initiation

### 2.1 Overview

In this section, we introduce how to build a TMG from scratch. There are three major steps, each requiring a different input:

1. Create Component-Asset Trees

In this step, a list of assets is the input, and the goal is to create a tree structure showing the relationships between facilities, components, and assets.

2. Create Interface Nodes

In this step, we need an interface mapping table to create all interface nodes. If the input is correct, we will get a connected graph (i.e. there is a path from any point to any other point in the graph; see Wolfram).

3. CVSS Rating

After the topology structure is complete, we need to assign a CVSS rating to every possible attack in a given direction, following specific rules we defined. This will eventually let us calculate the single-step and multi-step attack ratings.

After the TMG is initialized, we can use traversal and filter techniques with a selected STRIDE pattern to obtain the threat modeling result. We will introduce the traversal and filter methods in the next section.
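The connectivity condition in step 2 can be checked on the interface mapping before anything is loaded into Neo4j. The sketch below is one possible way to do it with a breadth-first search, assuming interfaces with the same name are the same node and that only component and interface nodes are considered; `is_connected` is a hypothetical helper, and the sample mapping is the one used in section 2.3.

```python
from collections import deque


def is_connected(interface_mapping):
    """Check that a component -> [interfaces] mapping forms one connected graph."""
    # Build an undirected adjacency map; keys are tagged so that a component
    # and an interface sharing a name are still treated as different nodes.
    adjacency = {}
    for component, interfaces in interface_mapping.items():
        component_key = ("component", component)
        adjacency.setdefault(component_key, set())
        for interface in interfaces:
            interface_key = ("interface", interface)
            adjacency[component_key].add(interface_key)
            adjacency.setdefault(interface_key, set()).add(component_key)
    if not adjacency:
        return True
    # Breadth-first search from an arbitrary node.
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)


mapping = {
    "network": ["Gateway1", "Gateway2", "Internet"],
    "Server": ["Gateway1"],
    "Router": ["Gateway2", "LAN1"],
    "Client Computer": ["Terminal", "LAN2"],
    "Virtual-Router-Client-Computer": ["LAN1", "LAN2"],
}
print(is_connected(mapping))
```

If the check fails, the system splits into smaller parts that can be analyzed separately.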

### 2.2 Create Component-Asset Trees

Based on the C/S model above, from the defender’s view, the possible assets can be presented in this table:

| Facility | Component | Asset | Rating: Security | Rating: Operation | Rating: Tech | Rating: Compliance |
| --- | --- | --- | --- | --- | --- | --- |
| Data Center | Server | DB | | | | |
| Data Center | Server | Backend | | | | |
| Client | Client Computer | Frontend | | | | |

Table 1 : Input - Asset List
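Each row of an asset list like Table 1 has to become a property mapping before it can be written to the database. Below is a minimal sketch of that conversion, assuming the table is exported as CSV with the column headers of Table 1. The mapping keys follow the parameters used by Script 2, while the row-number `id`, the optional `Description` column, and the name `read_asset_props` are assumptions for illustration.

```python
import csv
import io

# Table 1 column headers mapped to the keys that Script 2 reads.
COLUMN_TO_KEY = {
    "Facility": "facility_name",
    "Component": "component_name",
    "Asset": "asset_name",
    "Rating: Security": "security",
    "Rating: Operation": "operation",
    "Rating: Tech": "tech",
    "Rating: Compliance": "compliance",
}


def read_asset_props(csv_text):
    """Turn CSV text shaped like Table 1 into a list of asset_prop mappings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    asset_props = []
    for row_number, row in enumerate(reader, start=1):
        asset_prop = {
            key: (row.get(column) or "").strip()
            for column, key in COLUMN_TO_KEY.items()
        }
        # Assumption: the row number serves as the asset id.
        asset_prop["id"] = row_number
        # Assumption: an optional "Description" column; empty if absent.
        asset_prop["description"] = (row.get("Description") or "").strip()
        asset_props.append(asset_prop)
    return asset_props


# The rows of Table 1; the rating cells are left empty as in the table.
SAMPLE_CSV = "\n".join([
    "Facility,Component,Asset,Rating: Security,Rating: Operation,"
    "Rating: Tech,Rating: Compliance",
    "Data Center,Server,DB,,,,",
    "Data Center,Server,Backend,,,,",
    "Client,Client Computer,Frontend,,,,",
])
asset_props = read_asset_props(SAMPLE_CSV)
print(asset_props[0])
```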

Taking all facilities as the roots of the trees we will create, the final output of this step is a collection of trees:

Fig. 3: Three-layer forest structure from asset list input

To accomplish this, we read the table line by line, save the data of each line into a mapping called asset_prop, and call the function below:

```python
def update_asset_into_db(session, asset_prop):
    """Merge one asset-list row into the Facility-Component-Asset tree."""

    def _cypher(tx, asset_prop):
        # MERGE reuses a matching node or relationship, or creates it if missing.
        return list(tx.run(
            '''
            MERGE (facility:Facility {name: $facility_name})
            MERGE (component:Component {name: $component_name})
            MERGE (asset:Asset {name: $asset_name, asset_id: $id,
                                security: $security, operation: $operation,
                                tech: $tech, compliance: $compliance, des: $des})
            MERGE (facility)-[has_component:HAS_COMPONENT]->(component)
            MERGE (component)-[has_asset:HAS_ASSET]->(asset)
            RETURN has_component, has_asset
            ''',
            {
                'facility_name': asset_prop['facility_name'],
                'component_name': asset_prop['component_name'],
                'asset_name': asset_prop['asset_name'],
                'id': asset_prop['id'],
                'security': asset_prop['security'],
                'operation': asset_prop['operation'],
                'tech': asset_prop['tech'],
                'compliance': asset_prop['compliance'],
                'des': asset_prop['description'],
            }
        ))

    # In neo4j Python driver 5.x this method is named execute_write.
    result = session.write_transaction(_cypher, asset_prop)
    return result
```

Script 2 : Build Facility-Component-Asset Tree

If you are not familiar with Python and Cypher, the script above performs these main steps:

1. Create a facility node with the given name, or reuse it if it already exists.
2. Create a component node with the given name, or reuse it if it already exists.
3. Create an asset node with the given name and properties, or reuse it if it already exists.
4. Create a unidirectional relationship from the facility node to the component node.
5. Create a unidirectional relationship from the component node to the asset node.

Notes:

• Make sure your input is correct and compatible with the script.
• It’s important to regard communication facilities as components and identify their corresponding assets.

### 2.3 Create Interface Nodes

In this step, we create the corresponding interface nodes and set the correct HAS_INTERFACE relationships. Once this step is done, you should get a connected graph; otherwise, your system can be divided into smaller systems that can be analyzed separately.

Before doing that, let’s hide the facility nodes and asset nodes for now and focus on the component nodes. This step has two parts:

1. Create interface nodes for each existing component node.
2. Create virtual component nodes to connect two interface nodes, depending on the situation in the real world.

The basic idea is to create such relationships in the graph database:

```cypher
(:Component)-[:HAS_INTERFACE]->(:Interface)
```

```mermaid
flowchart LR
c1([:Component]) --> r1[:HAS_INTERFACE]-->i([:Interface])
```

or, in the case of a virtual component, an extra label is used:

```cypher
(:Virtual:Component)-[:HAS_INTERFACE]->(:Interface)
```

```mermaid
flowchart LR
c1([:Virtual:Component]) --> r1[:HAS_INTERFACE]-->i([:Interface])
```

Firstly, we need to prepare the mapping table below. The component column can be generated by querying all component nodes in the database:

```cypher
MATCH (n:Component) RETURN n
```

Script 3 : Query all Component Nodes

We then have to fill in the corresponding interface column to let the script know which interfaces to create. Neo4j identifies nodes by internal IDs, not by the name property, so nodes with the same name can still be created as separate nodes. Keep in mind that you have to design a naming principle for interface nodes and be deliberate about which Cypher clause you use (MATCH or MERGE).

| Component | Interface |
| --- | --- |
| network:Communication | Gateway1, Gateway2, Internet |
| Server | Gateway1 |
| Router | Gateway2, LAN1 |
| Client Computer | Terminal, LAN2 |

Table 2 : Input - Interface Mapping (1/2)

You may notice that we would like to connect the two interfaces ‘LAN1’ and ‘LAN2’ here, since they are physically connected in the real world. In this case we have to create a virtual component with two labels, “Component” and “Virtual”. The latter is used to skip the node during traversal later, since it is not a real component with assets. Therefore, we have to add extra information to the table above:

| Component | Interface |
| --- | --- |
| Virtual-Router-Client-Computer | LAN1, LAN2 |

Table 3 : Input - Interface Mapping (2/2)

After completing the Component-Interface mapping table, we read it line by line and create the interface nodes or virtual component nodes for each component and interface. Here is a sample Python script showing how to create the corresponding nodes and relationships.
def create_interface(session, component_name, interface_name):
    def _cypher_virtual_component(tx, component_name, interface_name):
        return list(tx.run(
            '''
            MERGE (i:Interface {name: $interface_name})
            WITH i
            MERGE (c:Component:Virtual {name: $component_name})
            WITH c, i
            MERGE (c)-[r:HAS_INTERFACE]->(i)
            RETURN r
            ''',
            {'interface_name': interface_name, 'component_name': component_name}
        ))

    def _cypher_create_interface(tx, component_name, interface_name):
        return list(tx.run(
            '''
            MERGE (i:Interface {name: $interface_name})
WITH i