Prokura Innovations

Facial Recognition: What and How

AI, the buzzword of the 21st century, a symbol of technological advancement and a source of fear for many working people, has been quietly integrating itself into our lives. Whether at work or in our day-to-day routines, it has become a part of us, constantly evolving to be better. One such AI technology that has been growing with us is Facial Recognition.

Remember when you used to upload a picture of yourself with your family and friends on Facebook, and it asked you to tag your friends by locating them in the picture? We thought it was a great feature: we could tell Facebook where our friends’ faces were. We used to be really amazed by it. What we didn’t realize was that Facebook was using its massive number of users to train its Facial Recognition System. Today, Facebook can automatically tag people in a picture in a second. The accuracy and efficiency of the system have grown without our really noticing it.

The whole Facial Recognition System can be complex, but at its core it is about measuring the similarity between two faces. What differs from one system to another is how accurately and efficiently it does so. Most images of human faces have two eyes, a nose, lips, a forehead, a chin, ears, hair… That rarely changes. Yet faces are different from each other. What makes them different? At the same time, the face of the same person changes with emotion, expression, age… In fact, just a change in orientation creates a different image. How do we identify a person in spite of all that?

Today we will dive into the basic elements of the system. A Facial Recognition System is a pipeline of sequential tasks, and its final accuracy depends upon the accuracy and efficiency of each individual task. In simple words, we need to locate and extract the faces from a picture, identify facial features like the nose and eyes, align the faces to match a pose template, encode the faces, and finally compare the encodings to other faces in the database. I describe each of these steps in the following paragraphs.

  1. Face Detection: A picture may contain many different objects, with people being just one of them. We need to locate where these people, and especially their faces, are in the picture. Locating where a face is in the picture is harder than simply detecting whether the picture contains a person. To localize faces, we use a detector called a Sliding Window Classifier, which scans the picture one window at a time and decides whether each window contains a face. This detector can be implemented in a variety of ways using techniques like Viola-Jones, Histogram of Oriented Gradients, and Convolutional Neural Networks.

a.   Viola-Jones: This is a classical image processing technique that uses a cascade of simple decision-tree classifiers to determine the location of the face based on its light and dark areas. It is very fast and mostly useful for low-end devices. However, it is not very accurate and gives a lot of false positives.

b.   Histogram of Oriented Gradients (HOG): This technique looks for shifts from light to dark areas in an image. First, the image is converted to grayscale, where the different shades of grey show up as light and dark. Then it looks for transitions from light to dark areas based on intensity. After that, it keeps only a count of the gradients (the difference in intensity between a pixel and its neighboring pixels) in each direction. Lastly, a face classifier is trained on the HOG representations of faces. This technique is simple but retains the important information, and it is not affected by small changes in lighting or shape, giving acceptable accuracy even on small training sets.

c.   Convolutional Neural Network (CNN): This is a deep learning technique that requires expensive computing. While it may be the most accurate of the three, it needs a lot of training data.
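
To make the HOG description in (b) more concrete, here is a minimal Python sketch of the histogram step for one small cell of a grayscale image. The function name `cell_histogram`, the 9-bin layout, and the choice to weight each vote by gradient strength are assumptions made for illustration; a full detector would also normalize the histograms across neighboring cells and pass them to a trained classifier.

```python
import math


def cell_histogram(cell, bins=9):
    """Histogram of gradient directions for one grayscale cell.

    `cell` is a list of rows of pixel intensities (0-255).
    Directions are unsigned (0 to 180 degrees) and split into `bins`
    equal ranges. Each interior pixel votes for the bin of its gradient
    direction, weighted by the gradient strength (an assumed convention).
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because they lack a neighbor on one side.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal change
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical change
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A tiny patch with a dark left half and a light right half.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
edge_hist = cell_histogram(vertical_edge)
```

For the patch above, every interior pixel sees the same left-to-right change in brightness, so all of the votes fall into the first bin and the other bins stay empty. That is the sense in which HOG keeps the direction of the light-to-dark transitions and discards the rest.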

  2. Face Landmarks Detection: Face landmarks are special points of interest on the face, like the eyes, nose, and lips. A facial recognition system normally uses 68 landmarks. A simple trick behind a face landmark detector is to assume that all faces are similar (all faces have eyes, a nose, lips, a forehead, cheeks, etc.) and that each landmark point can only move a limited distance away from its neighboring points. These facial landmarks can be detected using an open-source pre-trained deep learning model.
  3. Face Alignment: People pose in photos in different ways, with their heads turned in different directions. Face alignment moves all facial features into a predetermined position, which helps the Facial Recognition System work even if people don’t face the camera directly. It consists of head angle correction and rotation, which makes our system more accurate. Face alignment can be achieved with affine transformations of the face landmarks, such as movement, rotation, and stretching, but not twisting or warping.
  4. Face Encoding: This is the process of taking an image of a face and turning it into a set of measurements. Altogether, 128 measurements are generated for each face. This is done using a face encoding model inspired by deep metric learning. These measurements are saved into a database, with each face having its own unique set of measurements.
  5. Face Recognition: This is the final part, where the magic happens. A new image of a person goes through the Facial Recognition pipeline, which generates its measurements through face encoding. The measurements of each known face are fetched from the database, and the Euclidean distance between them and the new measurements is calculated. The Euclidean distance is the straight-line distance between two points in space. A Euclidean distance of less than 0.6 means the faces match, so we can now recognize the identity of the person.
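
As a rough illustration of the Face Alignment step, the sketch below computes a transform that moves two detected eye positions onto fixed target positions, using only movement, rotation, and uniform stretching. The function names, the use of just the two eye centres, and the coordinates in the example are assumptions for illustration; a real system would typically fit the transform to more of the 68 landmarks and then apply it to the whole image.

```python
import math


def eye_alignment_transform(left_eye, right_eye, target_left, target_right):
    """Return (a, b, tx, ty) such that
        x' = a * x - b * y + tx
        y' = b * x + a * y + ty
    maps the detected eye centres onto the target eye positions.
    """
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tdx, tdy = target_right[0] - target_left[0], target_right[1] - target_left[1]
    # Stretch factor: ratio of the target eye distance to the detected one.
    scale = math.hypot(tdx, tdy) / math.hypot(dx, dy)
    # Rotation needed to turn the detected eye line onto the target eye line.
    angle = math.atan2(tdy, tdx) - math.atan2(dy, dx)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    # Movement that places the left eye exactly on its target.
    tx = target_left[0] - (a * left_eye[0] - b * left_eye[1])
    ty = target_left[1] - (b * left_eye[0] + a * left_eye[1])
    return a, b, tx, ty


def apply_transform(transform, point):
    """Apply the transform to one (x, y) landmark."""
    a, b, tx, ty = transform
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


# A tilted head: the right eye sits lower in the image than the left eye.
transform = eye_alignment_transform((30, 50), (70, 90), (40, 60), (80, 60))
aligned_right_eye = apply_transform(transform, (70, 90))
```

After the transform, both eyes sit on the same horizontal line at the target positions, and any other landmark can be passed through `apply_transform` in the same way.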
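
The comparison in the Face Recognition step can be sketched in a few lines of Python. This assumes the encodings are already available as plain lists of numbers; the `find_match` function, the dictionary used as a "database", and the short three-number encodings are hypothetical stand-ins for the 128 measurements described above. Only the Euclidean distance and the 0.6 threshold come from the text.

```python
import math

MATCH_THRESHOLD = 0.6  # distances below this count as a match


def find_match(new_encoding, known_encodings, threshold=MATCH_THRESHOLD):
    """Return (name, distance) for the closest known face.

    `known_encodings` maps a person's name to their stored encoding.
    The name is None when no stored face is closer than `threshold`.
    """
    best_name, best_distance = None, float("inf")
    for name, encoding in known_encodings.items():
        # Euclidean distance between the two sets of measurements.
        distance = math.dist(new_encoding, encoding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance < threshold:
        return best_name, best_distance
    return None, best_distance


# Toy database with three measurements per face instead of 128.
database = {
    "alice": [0.10, 0.20, 0.30],
    "bob": [0.90, 0.80, 0.70],
}
name, distance = find_match([0.10, 0.25, 0.30], database)
```

Here the new encoding is much closer to the stored encoding for "alice" than to the one for "bob", and the distance is under the threshold, so the face is reported as a match. An encoding far from every stored face returns `None` instead of forcing a wrong identity.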

While the Facial Recognition System can be a boon to some, it can be a curse as well. Facial Recognition Systems have come under much scrutiny these days due to their use in the surveillance of people. This can undermine people’s privacy, so governments and institutions are wary of using this technology. We can hope that one day this system will be used in a regulated manner that safeguards people’s rights and, at the same time, contributes to the cause of humanity.

Description of Author

  • Swain Shrestha, whose childhood fantasy was to build humanoids and rockets, hails from Dharan, a beautiful city in the eastern part of Nepal. Currently, he is the Drone Project Manager at Prokura Innovations and Co-Founder of Atharva Technology. He is a graduate in Electronics and Communications Engineering from the Institute of Engineering, Pulchowk Campus. Working on new ideas and innovations that can have a social impact in the world gives him the adrenaline rush to work more. He likes researching and innovating in the fields of robotics, drones, cloud computing, and artificial intelligence.
