Train your first Siamese Neural Network for detecting Face Similarity

Computer Vision Aug 30, 2021

So far on this blog, my tutorials have covered sequential neural networks. As part of my studies I have started learning some more advanced architectures, so in this post we are going to try Siamese networks. Specifically, we will implement a facial similarity system using a Siamese network and one-shot learning.


The term Siamese Networks originally comes from the conjoined twin brothers Chang and Eng Bunker (May 11, 1811 – January 17, 1874), the first such pair to become internationally known. The term is used for twins who are physically connected to each other at the chest, abdomen, or pelvis. The neural network we are going to see in this tutorial likewise consists of a pair of networks that are actually the same, hence the name.

Siamese networks are a special type of neural network architecture in which, instead of learning to classify its inputs, the network learns to differentiate between two inputs: it learns how similar they are.

What is One-Shot learning?

It is an object categorisation problem, found mostly in computer vision. Whereas most machine learning based object categorisation algorithms require training on hundreds or thousands of samples/images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training samples/images.

Simply put, each of the two twin networks receives one image, and the pair is trained with a contrastive loss function so that the distance between the two output embeddings acts as a similarity score. The two networks share their weights, so identical inputs always produce identical outputs.
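To make the idea concrete before any PyTorch code, here is a toy sketch in plain Python. The `embed` function and its weights are made-up stand-ins for the shared CNN; the point is only that both images pass through identical weights, so the Euclidean distance between their embeddings can serve as the similarity score.

```python
import math

def embed(image, weights):
    # Stand-in for the shared CNN: a fixed linear map (hypothetical).
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def similarity_score(img_a, img_b, weights):
    # Both inputs go through the SAME weights -- the "Siamese" property.
    ea = embed(img_a, weights)
    eb = embed(img_b, weights)
    return math.dist(ea, eb)  # small distance => similar

weights = [[0.5, -0.2], [0.1, 0.9]]
print(similarity_score([1.0, 2.0], [1.0, 2.0], weights))  # identical inputs -> 0.0
print(similarity_score([1.0, 2.0], [3.0, 0.0], weights))  # different inputs -> > 0
```

The real network replaces the toy `embed` with a CNN, but the shared-weights-plus-distance structure is the same.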

We will be using the ORL face database which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. We will be training the network on Kaggle using GPU acceleration.

Some sample images from the dataset

Let's get into the tutorial!

Step 1: Importing & Preprocessing the Dataset

import random
import numpy as np
import PIL.ImageOps
from PIL import Image
import torch
from import Dataset, DataLoader
import torchvision
import torchvision.datasets as dset
import torchvision.transforms as transforms

class Config():
    training_dir = "../input/orl-data-split/ORL/training"
    testing_dir = "../input/orl-data-split/ORL/testing"
    train_batch_size = 64
    train_number_epochs = 100

class SiameseNetworkDataset(Dataset):
    def __init__(self, imageFolderDataset, transform=None, should_invert=True):
        self.imageFolderDataset = imageFolderDataset
        self.transform = transform
        self.should_invert = should_invert

    def __getitem__(self, index):
        img0_tuple = random.choice(self.imageFolderDataset.imgs)
        # we need to make sure approx 50% of pairs are from the same class
        should_get_same_class = random.randint(0, 1)
        if should_get_same_class:
            while True:
                # keep looping till an image of the same class is found
                img1_tuple = random.choice(self.imageFolderDataset.imgs)
                if img0_tuple[1] == img1_tuple[1]:
                    break
        else:
            while True:
                # keep looping till an image of a different class is found
                img1_tuple = random.choice(self.imageFolderDataset.imgs)
                if img0_tuple[1] != img1_tuple[1]:
                    break

        img0 =[0])
        img1 =[0])
        img0 = img0.convert("L")
        img1 = img1.convert("L")
        if self.should_invert:
            img0 = PIL.ImageOps.invert(img0)
            img1 = PIL.ImageOps.invert(img1)

        if self.transform is not None:
            img0 = self.transform(img0)
            img1 = self.transform(img1)
        # label is 0 for a same-class pair, 1 for a different-class pair
        return img0, img1, torch.from_numpy(np.array([int(img1_tuple[1] != img0_tuple[1])], dtype=np.float32))

    def __len__(self):
        return len(self.imageFolderDataset.imgs)

folder_dataset = dset.ImageFolder(root=Config.training_dir)
siamese_dataset = SiameseNetworkDataset(imageFolderDataset=folder_dataset,
                                        transform=transforms.Compose([transforms.Resize((100, 100)),
                                                                      transforms.ToTensor()]),
                                        should_invert=False)

vis_dataloader = DataLoader(siamese_dataset, shuffle=True, batch_size=8)
dataiter = iter(vis_dataloader)

example_batch = next(dataiter)
concatenated =[0], example_batch[1]), 0)
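As a quick sanity check of the pair-sampling logic in `__getitem__`, the snippet below replays the same branching on a toy stand-in for `imageFolderDataset.imgs` (made-up paths and class indices, not the actual ORL files) and confirms that roughly half the sampled pairs come from the same class:

```python
import random

random.seed(0)
# Toy stand-in for imageFolderDataset.imgs: (path, class_index) tuples.
imgs = [(f"person{c}/img{i}.pgm", c) for c in range(5) for i in range(10)]

def sample_pair():
    img0 = random.choice(imgs)
    if random.randint(0, 1):               # same-class branch
        while True:
            img1 = random.choice(imgs)
            if img0[1] == img1[1]:
                break
    else:                                  # different-class branch
        while True:
            img1 = random.choice(imgs)
            if img0[1] != img1[1]:
                break
    return int(img0[1] != img1[1])         # label: 0 = same, 1 = different

labels = [sample_pair() for _ in range(10_000)]
print(sum(labels) / len(labels))  # ~0.5 by construction
```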

Step 2: Helper Functions

import matplotlib.pyplot as plt

def imshow(img, text=None, should_save=False):
    npimg = img.numpy()
    if text:
        plt.text(75, 8, text, style='italic', fontweight='bold',
                 bbox={'facecolor': 'white', 'alpha': 0.8, 'pad': 10})
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

def show_plot(iteration, loss):
    plt.plot(iteration, loss)

Step 3: The Model

import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        # padding=1 keeps the 100x100 spatial size through every conv layer,
        # which is what the first Linear layer (8*100*100 inputs) expects
        self.cnn1 = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(4, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(8, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

        self.fc1 = nn.Sequential(
            nn.Linear(8 * 100 * 100, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, 5))

    def forward_once(self, x):
        output = self.cnn1(x)
        output = output.view(output.size()[0], -1)
        output = self.fc1(output)
        return output

    def forward(self, input1, input2):
        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)
        return output1, output2
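Why does the first fully connected layer expect 8*100*100 inputs? With kernel size 3, each convolution only keeps the 100x100 feature-map size if it pads by one pixel; the standard output-size formula for a 2D convolution makes this easy to verify:

```python
def conv2d_out(size, kernel=3, padding=0, stride=1):
    # Standard output-size formula for a 2D convolution.
    return (size + 2 * padding - kernel) // stride + 1

size = 100
for _ in range(3):            # three conv layers with padding=1
    size = conv2d_out(size, padding=1)
print(size)                   # 100 -> flattened size is 8*100*100

size = 100
for _ in range(3):            # without padding, the map shrinks each layer
    size = conv2d_out(size)
print(size)                   # 94 -> would NOT match Linear(8*100*100, ...)
```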

Step 4: Loss Function

class ContrastiveLoss(torch.nn.Module):
    """
    Contrastive loss function, based on the margin-based loss of
    Hadsell, Chopra & LeCun (2006), "Dimensionality Reduction by
    Learning an Invariant Mapping".
    """
    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        euclidean_distance = F.pairwise_distance(output1, output2, keepdim=True)
        # label 0 (same pair): pull the embeddings together;
        # label 1 (different pair): push them at least `margin` apart
        loss_contrastive = torch.mean((1 - label) * torch.pow(euclidean_distance, 2) +
                                      (label) * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2))

        return loss_contrastive
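To see how this loss behaves, here is the same formula applied to single scalar distances (a hand-computed sketch, not the batched PyTorch version):

```python
def contrastive_loss(distance, label, margin=2.0):
    # label 0 = same identity, 1 = different identity
    same_term = (1 - label) * distance ** 2
    diff_term = label * max(margin - distance, 0.0) ** 2
    return same_term + diff_term

print(contrastive_loss(0.5, 0))  # same pair, close together   -> 0.25
print(contrastive_loss(0.5, 1))  # different pair, too close   -> 2.25
print(contrastive_loss(3.0, 1))  # different, beyond the margin -> 0.0
```

Same-class pairs are penalised for any distance, while different-class pairs are penalised only when they fall inside the margin.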

Step 5: Training

from torch import optim

train_dataloader = DataLoader(siamese_dataset,
                              shuffle=True,
                              batch_size=Config.train_batch_size)
net = SiameseNetwork().cuda()
criterion = ContrastiveLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0005)

counter = []
loss_history = []
iteration_number = 0

for epoch in range(0, Config.train_number_epochs):
    for i, data in enumerate(train_dataloader, 0):
        img0, img1, label = data
        img0, img1, label = img0.cuda(), img1.cuda(), label.cuda()
        optimizer.zero_grad()
        output1, output2 = net(img0, img1)
        loss_contrastive = criterion(output1, output2, label)
        loss_contrastive.backward()
        optimizer.step()
        if i % 10 == 0:
            print("Epoch number {}\n Current loss {}\n".format(epoch, loss_contrastive.item()))
            iteration_number += 10
            counter.append(iteration_number)
            loss_history.append(loss_contrastive.item())

show_plot(counter, loss_history)

Step 6: Evaluation

folder_dataset_test = dset.ImageFolder(root=Config.testing_dir)
siamese_dataset = SiameseNetworkDataset(imageFolderDataset=folder_dataset_test,
                                        transform=transforms.Compose([transforms.Resize((100, 100)),
                                                                      transforms.ToTensor()]),
                                        should_invert=False)

test_dataloader = DataLoader(siamese_dataset, num_workers=6, batch_size=1, shuffle=True)
dataiter = iter(test_dataloader)
x0, _, _ = next(dataiter)

for i in range(10):
    _, x1, label2 = next(dataiter)
    concatenated =, x1), 0)
    output1, output2 = net(x0.cuda(), x1.cuda())
    euclidean_distance = F.pairwise_distance(output1, output2)
    imshow(torchvision.utils.make_grid(concatenated),
           'Dissimilarity: {:.2f}'.format(euclidean_distance.item()))
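The loop above prints a raw dissimilarity score; to turn that into a same/different decision you still need a threshold. One simple heuristic (the distance values below are made-up illustrations, not results from this model) is to place the threshold halfway between the average same-pair distance and the average different-pair distance measured on held-out pairs:

```python
# Hypothetical distances collected from an evaluation loop like the one above.
same_dists = [0.21, 0.35, 0.48, 0.30]   # pairs of the same person
diff_dists = [1.62, 2.10, 1.45, 1.88]   # pairs of different people

# Midpoint between the two cluster means serves as the decision threshold.
threshold = (sum(same_dists) / len(same_dists) +
             sum(diff_dists) / len(diff_dists)) / 2

def is_same_person(distance):
    return distance < threshold

print(threshold)                          # roughly halfway between clusters
print(is_same_person(0.4), is_same_person(1.9))
```

In practice you would tune the threshold on a validation set (e.g. to maximise accuracy or balance false accepts against false rejects).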
Results from the trained model


Find the public notebook with the full implementation below:


This tutorial covers face similarity detection; check out the one on facial recognition below:

Face Recognition using Siamese Networks
A Facial Recognition System is a technology that can capture a human face anywhere in an image or a video and also can find out its identity. A Face Recognition system has proven to be very…