Solving the LPN Problem with Machine Learning: An Introduction
Ever since the series finale of HBO's Silicon Valley premiered, a lot of people (including me) started asking the question:
"Is it actually possible for an AI to break an encrypted message?"
There is a tremendous amount of research going on in this domain, and there has even been some reported success in attacking AES encryption. As an introduction to the field, we will take a look at the Learning Parity with Noise (LPN) problem to help us understand how an AI can break encryption.
Basics of Encryption and Encryption Keys:

So, What is the LPN Problem?
To understand the LPN problem, we must first understand what parity learning is. (Parity: a parity bit, or check bit, is a bit added to a string of binary code; parity bits are the simplest form of error-detecting code.)
Parity learning is a problem in machine learning in which an algorithm must guess a function ƒ, given samples (x, ƒ(x)), where ƒ computes the parity of the bits at some fixed locations. The samples are drawn from some distribution over the inputs. The problem is easy to solve using Gaussian elimination, provided a sufficient number of samples (from a distribution that is not too skewed) is given to the algorithm, which is enough to recover the secret key:
$$As = b$$ where A is the matrix of samples, s is the secret key and b is the vector of responses.
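In the noiseless setting, recovering s really is just linear algebra over GF(2). Here is a minimal sketch of that attack (my own illustration, assuming the samples contain enough linearly independent rows):
import numpy as np

def solve_gf2(A, b):
    # Solve A s = b over GF(2) by Gaussian elimination (assumes a solution exists).
    A = A.copy() % 2
    b = b.copy() % 2
    n_rows, n_cols = A.shape
    pivot_row = 0
    pivot_cols = []
    for col in range(n_cols):
        # Find a row with a 1 in this column, at or below the current pivot row.
        candidates = np.nonzero(A[pivot_row:, col])[0]
        if len(candidates) == 0:
            continue
        r = pivot_row + candidates[0]
        A[[pivot_row, r]] = A[[r, pivot_row]]
        b[[pivot_row, r]] = b[[r, pivot_row]]
        # XOR the pivot row into every other row that has a 1 in this column.
        mask = A[:, col] == 1
        mask[pivot_row] = False
        A[mask] = (A[mask] + A[pivot_row]) % 2
        b[mask] = (b[mask] + b[pivot_row]) % 2
        pivot_cols.append(col)
        pivot_row += 1
        if pivot_row == n_rows:
            break
    s = np.zeros(n_cols, dtype=int)
    for i, col in enumerate(pivot_cols):
        s[col] = b[i]
    return s

# Example: 200 noiseless samples of a 16-bit secret are almost always enough.
s_true = np.random.randint(0, 2, 16)
A = np.random.randint(0, 2, size=(200, 16))
b = (A @ s_true) % 2
print(np.array_equal(solve_gf2(A, b), s_true))  # True with overwhelming probability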
Now, in practice, this attack is thwarted by adding "noise", which makes things interesting, as the problem transforms into:
$$As \approx b$$
This is where the LPN problem comes in. Instead of sending the response As as it is, the sender adds a noise bit e with some probability p (p < 0.5). Simply put, with probability p it flips the response bit from 0 to 1 or from 1 to 0, and with probability 1 − p it sends the response without any noise. Systems that make use of this are far more resistant to man-in-the-middle attacks. The underlying computations are simple, yet the problem is hard enough that LPN is believed to be resistant to quantum computers. This makes it a good alternative to number-theoretic problems (e.g. factorization and the discrete logarithm), which can be solved efficiently with quantum algorithms. Also, due to its simplicity, it is a nice candidate for lightweight devices (door locks, hotel room keys, etc.).
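Written per sample, this is:
$$b_i = \langle a_i, s \rangle \oplus e_i, \qquad e_i \sim \mathrm{Bernoulli}(p)$$
where a_i is the i-th row of A and e_i is the noise bit that gets flipped in with probability p.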
Now, let's have a look at a code snippet that generates samples according to a random secret key. (We will be using scikit-learn's DecisionTreeClassifier, RandomForestClassifier and ExtraTreesClassifier for the scope of this tutorial.)
import numpy as np

class LPN:
    def __init__(self, secret_key, error_rate):
        self.secret_key = secret_key
        self.dimension = len(secret_key)
        self.error_rate = error_rate

    def sample(self, n_amount):
        # Create a random binary challenge matrix.
        A = np.random.randint(0, 2, size=(n_amount, self.dimension))
        # Add Bernoulli errors.
        e = np.random.binomial(1, self.error_rate, n_amount)
        # Compute the (noisy) labels.
        b = np.mod(A @ self.secret_key + e, 2)
        return A, b
Let's generate some samples (we will use a very small error probability p for the scope of this tutorial):
p = 0.125
dim = 16
s_original = np.random.randint(0, 2, dim)
lpn = LPN(s_original, p)
A, b = lpn.sample(100000)
We will now fit these samples with an ExtraTreesClassifier:
from sklearn.ensemble import ExtraTreesClassifier

et = ExtraTreesClassifier()
et.fit(A, b)

# Query the fitted model on the unit vectors to read off a candidate secret.
s_candidate_et = et.predict(np.eye(dim))
# Hamming weight of A @ s_candidate + b (mod 2): small only if the candidate is correct.
et_ef = np.mod(A @ s_candidate_et + b, 2).sum()
print(et_ef)

if et_ef < 14000:
    print(s_candidate_et, s_original)
else:
    print('Wrong candidate. Try again!')
The check for a valid candidate works as follows:
The learning algorithm might fail to capture the ground truth and learn some other function instead. In that case, the Hamming weight of the vector is quite high (around 50000 for our vector of length 100000); this corresponds to a wrong key, which answers only about half of the challenges correctly. If s_candidate = s_original, the Hamming weight will be around 0.125 * 100000 = 12500, i.e. roughly the number of flipped response bits. A threshold of 14000 in this example is a good tradeoff between recognizing the correct secret and not outputting a wrong candidate as the solution.
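For completeness, the other two classifiers mentioned at the start can be dropped into exactly the same procedure. Here is a quick sketch; the recover_candidate helper is just an illustrative wrapper around the steps performed above, not part of the original recipe:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def recover_candidate(clf, A, b, dim, threshold=14000):
    # Fit the model, read off a candidate secret from the unit vectors,
    # and validate it with the same Hamming-weight check as above.
    clf.fit(A, b)
    s_candidate = clf.predict(np.eye(dim))
    weight = np.mod(A @ s_candidate + b, 2).sum()
    return s_candidate, weight, weight < threshold

for clf in (DecisionTreeClassifier(), RandomForestClassifier()):
    s_candidate, weight, ok = recover_candidate(clf, A, b, dim)
    print(type(clf).__name__, weight, 'valid candidate' if ok else 'wrong candidate')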
[Extra]
If we tweak the dimensions, increase the number of samples by 10x, and hope that most of the randomly generated samples carry the original (unflipped) responses (the reason why we kept the probability so small: 0.125), we can sometimes actually recover the secret, which can be checked here:

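A rough sketch of that experiment could look like the following; the exact parameters and the retry loop are illustrative, not a fixed recipe:
# Illustrative retry loop: same small error rate, 10x the samples,
# repeat until the Hamming-weight check accepts a candidate (or we give up).
p = 0.125
dim = 16  # the dimension could also be tweaked here
s_original = np.random.randint(0, 2, dim)
lpn = LPN(s_original, p)

for attempt in range(10):
    A, b = lpn.sample(1_000_000)
    et = ExtraTreesClassifier()
    et.fit(A, b)
    s_candidate = et.predict(np.eye(dim))
    weight = np.mod(A @ s_candidate + b, 2).sum()
    if weight < 0.14 * len(b):  # the 14000 threshold from above, scaled to the sample count
        print('Recovered secret:', s_candidate)
        print('Original secret: ', s_original)
        break
else:
    print('No valid candidate found. Try again!')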
Now, before anyone goes ahead and sniffs the traffic between hotel room keys and locks to collect enough samples, keep in mind that those keys change at regular intervals, use a higher value of p, and most probably use variable dimensions for the secret.
[Bonus]
This blog is a boiled down version of an awesome blog by Dr. Robert Kübler:

Cheers!
P.S. - Happy New Year!