This algorithm was meant to be bulletproof, but we found at least 2 ways to bypass it: study case – how to attack the P2DPI algorithm
Intrusion detection systems (IDS) and intrusion prevention systems (IPS) are widely implemented on modern systems to provide visibility on network attacks and possible threats. However, a tradeoff on this approach is user privacy. Recent research proposes deep-packet inspection (DPI) algorithms and technologies while preserving both user privacy and also rule secrecy.
In the following article, we will present the fine details of a new algorithm that implements privacy preserving DPI. Afterwards, we will challenge the cryptographic constructs present in the whitepaper of the algorithm. In the end, you will have solid proof that nothing is indestructible: we have found and analyzed not one, but two attacks to the algorithm that directly affect user privacy. Of course, we’re proud to have a member of our team be the first to do so and also publish his findings in one of the most distinguished cyber security journals in Romania – Romanian Cyber Security Journal, National Institute for Research & Development in Informatics – ICI Bucharest.
P2DPI: what it is and how it came to be
Privacy-preserving DPI proposes deep packet inspection while still maintaining both user privacy and rule confidentiality by leveraging cryptographic constructs that allow searching inside of encrypted packets.
One of the recently proposed and improved algorithms is Practical and Privacy-Preserving Deep Packet Inspection, also known as P2DPI (Kim et al al, 2021). The authors proposed an algorithm that would fix certain pitfalls regarding a similar algorithm, PrivDPI (Ning et al., 2019), identified in the same publication as being vulnerable to exhaustive message search.
They said the P2DPI algorithm was bulletproof.
But, truth be told, nothing in cybersecurity can be ruled out as completely bulletproof. We found that the controls proposed by the article can be bypassed to obtain information about the users traffic, thus breaking the promise of confidentiality.
The following analysis is based on the scientific and comprehensive article Cryptographic analysis of P2DPI (Bogatu, 2022), which is the first to prove that attacks on the P2DPI algorithm could be implemented in a realistic scenario. Formulas and certain paragraphs are adapted by the author from his initial work.
First, let’s analyze the architecture of IDS and IPS in as far as privacy-preserving deep packet inspection is concerned
The Rule Generator (RG) is the system responsible for generating IDS/IPS rules. The generated rules must be kept secret from other parties. The RG should not be able to decrypt traffic or otherwise guess plain-texts from S/R.
MiddleBox (MB) is the actual system that must implement the IDS/IPS detection phase. It works as a man-in-the-middle between the Sender and Receiver. It obtains rules from RG that will be converted to session rules by S/R then MB will compare them against obfuscated traffic tokens from the hosts. If a match is found, MB should take actions or alert other entities. MB should not be able to see the plain-text traffic (or otherwise decrypt the traffic) from S to R. The Middlebox also should not be able to create its own rules.
The Sender (S) and Receiver (R) are parties interested in communicating with each-other over a private channel. S/R should not be able to see the rules in a readable version, as it might help malicious entities bypass through obfuscation. The RG entity may also want to preserve its intellectual property, hiding the rules from S and R.
Learn how the P2DPI algorithm works
To achieve sharing of obfuscated data, the algorithm uses the concept of key-homomorphic encryption, that is encryption able to perform modifications of encrypted data without decrypting it.
The first step of the algorithm is the setup phase.
First the Rule Generator will choose two parameters, g and h that are elements within a cyclic group and share them alongside a key sent through a secure channel to the MiddleBox.
For each rule, RG will then obfuscate it using the following formula: . RG will then send the obfuscated rules with their signature.
The following notations apply: H1 is a random oracle function and ri represents the plain-text rule.
After the initialization on the MiddleBox, if 2 hosts communicate with each other through the MB, they will receive the obfuscated rules alongside their signature, signed by RG. The hosts must get the public certificate of RG to verify that the data is valid and comes from RG. If the validation passes, both hosts will then send to MB the obfuscated session rules.
The rules and the traffic must be broken down in tokens, for the algorithm to work. One of the proposed methods in P2DPI (Kim et al, 2021) is a sliding-window. The tokens will be assigned as in the formula below:
Each traffic token will be obfuscated by the sender based on a randomly chosen counter with a random oracle. The middlebox in turn will take each session rule and apply the same random oracle with each counter. If a match is found then malicious traffic is detected.
The algorithm failed to mention signatures from S, R and MB, resulting in possible integrity issues on the communication line. This may allow an attacker positioned between each side of MB to modify the packets so that the algorithm would not be able to detect malicious traffic.
With this in mind…
…what vulnerabilities does the P2DPI algorithm have?
When it comes to P2DPI, the vulnerabilities affecting this algorithm are mainly related to the Rule Generator.
Setup attack
The authors of P2DPI consider that the rule generator might become a threat to user confidentiality, and they proved that PrivDPI is vulnerable to exhaustive message search in the case of a malicious RG that also captures traffic.
This article will prove that P2DPI is also vulnerable to the same type of attack.
Considering that RG choses g and h, by having access to a man-in-the-middle or control over the middlebox, it can generate any obfuscated traffic token.
If we note f, the generator of the cyclic group of prime order , the values g and h have the property such that where alpha and beta are integers.
With this consideration, the obfuscated rules become a function that could be broken down as follows:
After installing an arbitrary rule and finishing the setup phase, a malicious RG can compute the generator function encrypted with the key of S and R by utilizing the key-homomorphic properties.
Having this equation, a malicious actor can then compute a function that generates obfuscated traffic tokens with the key of S and R without actually knowing the key:
Having access to a token or guessing the plaintext, an attacker can then compromise the entire communication session by brute-forcing all possible values of the unknown byte in the adject token.
Malicious rules attack
An additional point of failure was found in the algorithm. If an attacker could generate a basis in the session rules signed by S/R, it can also compromise the algorithm, without knowing the values that generated g and h.
This attack requires more interactions from the client, as they must encrypt the malicious rules with its key. However, this is a trivial task as the client can not check what a rule contains, but only can decide how many rules it wants to accept. A malicious RG in collaboration with MB could generate as little as 2 rules to compromise the security of the algorithm as follows:
MB requests the signing of the 2 following rules (provided they are signed by RG):
S/R will reply with the following obfuscated rules:
MB can then compute the session rules that will be used as a basis to generate obfuscated tokens with S/R key:
By using group operations we can write
As such, MB can compute any obfuscated token, gaining access to exhaustive search vulnerability to all captured obfuscated tokens encrypted with the key of S/R.
Implementation
As stated in (Bogatu, 2022), the vulnerability was implemented in a scenario when the rule generator did not have direct access to the parameters responsible for the cyclic group base for the algorithm. The implementation used Elliptic Curve Cryptography (ECC) for cyclic groups and Python as the programming language. Instead of using AES for the random oracle, the proof used Blake3 hashing function with a salt. The architecture is operating on a client-server design. Tests were performed on an Intel Core i5-4590 @ 4x 3.7GHz. By knowing four adjacent plain-text bytes from traffic, the attack recovered 411 bytes from the original message in under 5 minutes, using a parallel brute-force technique.
The implementation of the server was also used in a 54 hours long Capture the Flag competition as part of Bit Sentinel’s contribution to DefCamp CTF 21-22, one of the oldest and largest cybersecurity CTF competitions in Central and Eastern Europe. Out of 307 teams with positive scores, 8 teams solved the challenge, out of which 7 finished in the top 10 teams.
Conclusion
Although the recently mentioned algorithms achieved great performance, currently, only Blindbox (Justine, et al. 2015) seems to be the only one that preserves user privacy while still allowing a secure deep packet inspection architecture. At Bit Sentinel, we believe that a healthy mindset for cybersecurity practices is based on the fundamental idea that nothing is ever bulletproof or 100% secure. With this in mind, we should always be prepared to improve our tools, systems or tactics – and this goes for cybersecurity practitioners as well.
About the Author
Mihai-Alexandru Bogatu (Ephvuln)
Passionate about operational cyber security but also about recent research on the domain, Alex currently works as a Penetration Tester at Bit Sentinel.
In the past, Alex has been a player in the UNbreakable Romania competition and later became one of the challenge authors for UNbreakable Romania 2021 & 2022. He also proposed challenges for DefCamp Capture the Flag 2021.