Here, we have two given parameters:

- en_embeddings: English words and their corresponding embeddings.
- fr_embeddings: French words and their corresponding embeddings.

Now, we have to create an English embedding matrix X and a French embedding matrix Y:
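One way to build these matrices is sketched below. Note the list of (English, French) translation pairs (`en_fr_pairs`) is an assumed extra input, not one of the dictionaries given above; row i of X and row i of Y then hold the embeddings of a matching word pair.

```python
import numpy as np

def get_matrices(en_fr_pairs, en_embeddings, fr_embeddings):
    """Stack embeddings of translation pairs into aligned matrices X and Y.

    en_fr_pairs: list of (english_word, french_word) tuples (assumed input).
    Row i of X and row i of Y correspond to the same translation pair.
    """
    X_rows, Y_rows = [], []
    for en_word, fr_word in en_fr_pairs:
        # keep only pairs where both embeddings are available
        if en_word in en_embeddings and fr_word in fr_embeddings:
            X_rows.append(en_embeddings[en_word])
            Y_rows.append(fr_embeddings[fr_word])
    return np.vstack(X_rows), np.vstack(Y_rows)
```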

Given dictionaries of English and French word embeddings, we will create a transformation matrix R. In other words, given an English word embedding e, we multiply e by R (i.e., eR) to generate a new word embedding f.
We can describe our translation problem as finding a matrix R that minimizes the following expression:

$$\arg\min_{R} \| XR - Y \|_F$$

where X is the matrix of English embeddings (one word per row) and Y is the matrix of the corresponding French embeddings.
For this, we calculate the loss by modifying the original Frobenius norm:

Original Frobenius norm:

$$\| A \|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$$

Modified Frobenius norm (we square the norm and average over the m rows, which simplifies the gradient):

$$\frac{1}{m} \| XR - Y \|_F^2$$

Finally, our loss function will look something like this:

$$L(X, Y, R) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij})^2$$

where $a_{ij}$ is the entry in the i-th row and j-th column of the matrix $XR - Y$.
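As a minimal sketch, assuming X and Y are NumPy matrices with one embedding per row, the loss can be computed as:

```python
import numpy as np

def compute_loss(X, Y, R):
    """L(X, Y, R) = (1/m) * ||XR - Y||_F^2.

    X: (m, n) English embeddings, one word per row.
    Y: (m, n) corresponding French embeddings.
    R: (n, n) transformation matrix.
    """
    m = X.shape[0]
    diff = X @ R - Y              # A = XR - Y
    return np.sum(diff ** 2) / m  # sum of squared entries, averaged over rows
```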
Now, in order to optimize the value of R (from a random square matrix to an optimal one), we need to calculate the gradient of the loss from Step 2 with respect to the transformation matrix R. The formula for the gradient of the loss function L(X, Y, R) is:

$$\frac{dL(X, Y, R)}{dR} = \frac{2}{m} X^{T} (XR - Y)$$
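This gradient translates directly into code (a sketch with the same X, Y, R conventions as above):

```python
import numpy as np

def compute_gradient(X, Y, R):
    """Gradient of L(X, Y, R) with respect to R: (2/m) * X^T (XR - Y)."""
    m = X.shape[0]
    return (2 / m) * X.T @ (X @ R - Y)
```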
Gradient descent is an iterative algorithm used to search for the optimum of a function. Here, we calculate the gradient g of the loss with respect to the matrix R (Step 3). Next, we update R with the formula:

$$R := R - \alpha g$$

where $\alpha$ is the learning rate (a scalar quantity).
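The full training loop can be sketched as follows; the step count, learning rate, and seed here are illustrative defaults, not values from the original experiment:

```python
import numpy as np

def align_embeddings(X, Y, train_steps=400, learning_rate=0.01, seed=0):
    """Learn the transformation matrix R by gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    R = rng.random((n, n))                      # start from a random square matrix
    for _ in range(train_steps):
        gradient = (2 / m) * X.T @ (X @ R - Y)  # dL/dR from Step 3
        R -= learning_rate * gradient           # update rule R := R - alpha * g
    return R
```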
k-NN (K-Nearest Neighbours) is a method which takes a vector as input and finds the other vectors in the dataset that are closest to it. The 'k' is the number of "nearest neighbors" to find (e.g. k=2 finds the closest two neighbors).
Since we're approximating the translation function from English to French embeddings by a linear transformation matrix R, most of the time we won't get the exact embedding of a French word when we transform embedding e of some particular English word into the French embedding space.
By using 1-NN with eR as input, we can search for an embedding f (as a row) in the matrix Y that is closest to the transformed vector eR. In order to find the similarity between two vectors u and v, we calculate the cosine of the angle between them using the formula:

$$\cos(\theta) = \frac{u \cdot v}{\| u \| \, \| v \|}$$
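A minimal sketch of cosine similarity and the k-NN search over the rows of a candidate matrix:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbor(v, candidates, k=1):
    """Indices of the k rows of `candidates` most similar to v (1-NN when k=1)."""
    similarities = [cosine_similarity(v, row) for row in candidates]
    return np.argsort(similarities)[-k:][::-1]  # most similar first
```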
Finally, we calculate the accuracy of our translation model using the following formula:

$$\text{accuracy} = \frac{\#(\text{correct predictions})}{\#(\text{total predictions})}$$
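Putting the pieces together, accuracy can be evaluated as sketched below, assuming row i of Y is the reference translation of row i of X (the function name is illustrative):

```python
import numpy as np

def translation_accuracy(X, Y, R):
    """Fraction of English embeddings whose transformed vector's nearest
    French neighbour (by cosine similarity) is the matching row of Y."""
    pred = X @ R
    correct = 0
    for i in range(pred.shape[0]):
        # cosine similarity of pred[i] against every row of Y
        sims = (Y @ pred[i]) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(pred[i]))
        if np.argmax(sims) == i:
            correct += 1
    return correct / pred.shape[0]
```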
Using the formula above, our model achieved an accuracy of 55.7% on unseen data. This translation was achieved using some basic linear algebra and by learning a mapping of words from one language to another!
You can find the entire code (Python) for Machine Translation (English-to-French) over here.
