Introduction

This notebook introduces OpenCV and other Python libraries that will be useful for this task. We recommend spending some time looking through our GitHub site before working through it.

Jupyter Notebooks

To set up a similar environment locally, follow the steps on our GitHub site under the Tutorials section. A local setup is useful for downloading our notebooks and gives a more interactive environment for working through them.

What is OpenCV

OpenCV is a popular open-source computer vision library originally developed in C/C++, with a stable Python release introduced in 2009. Computer vision is the development of algorithms that help computers analyze and understand the content of an image or video.

Imports

In any Python project we will first need to import relevant libraries that will be used. For the purpose of this example we are going to import three libraries: 1) cv2, 2) common, 3) numpy.

  1. cv2 - the module name under which the OpenCV library discussed above is imported

  2. common - contains a few useful OpenCV functions

  3. numpy - often abbreviated as np for simplicity, this is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on them. In computer vision, images and video are simply collections of pixels that can be represented as arrays, so this library greatly simplifies operations on these structures.
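
To make the array representation concrete, here is a minimal sketch using only numpy. The tiny 2x3 image and its pixel values are made up for illustration; an image loaded with OpenCV uses the same (height, width, channels) layout.

```python
import numpy as np

# A hypothetical 2x3 color image: 2 rows (height), 3 columns (width), 3 channels
tiny = np.zeros((2, 3, 3), dtype=np.uint8)

# Set the pixel at row 0, column 1 to full intensity in the first channel
tiny[0, 1] = [255, 0, 0]

print(tiny.shape)   # (height, width, channels)
print(tiny.dtype)   # 8-bit unsigned integers, values 0-255
print(tiny[0, 1])   # the three channel values of one pixel
```

Indexing follows the order row first, then column, which is why later slices in this notebook are written with y before x.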

import cv2
import common
import numpy as np

# These imports are intended to improve the notebook experience
%matplotlib inline
from matplotlib import pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (10.0, 8.0)

With these imports we can now load a basic image using the OpenCV imread function.

cv2.imread('driver_face.jpg')

Notice how the output of this function is a multi-dimensional array encoding the pixel values contained in the image. Below we assign the image to a variable img and display it. The image we are using in the example below comes directly from the dataset for this task.

NOTE - When we display images in this notebook we use plt.imshow, but when working outside a Jupyter notebook, cv2.imshow will open a window containing the image.

img = cv2.imread('driver_face.jpg')
# Outside a notebook, cv2.imshow('image', img) would open a window instead
plt.imshow(img) # cmap is ignored for 3-channel images, so it is omitted

Notice how the colors in the above image do not look quite right. This is because OpenCV doesn’t store images in RGB format but rather in BGR format. To convert to the standard RGB color space we can use the following code, which splits our image into its channels and then merges those channels in the proper order.

b, g, r = cv2.split(img) #Split into various channels
merged = cv2.merge([r, g, b]) #Merge in RGB order
plt.imshow(merged)

The shorthand for doing this color conversion in OpenCV is shown below.

opencv_merged = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(opencv_merged)
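
Because the image is a numpy array, the same channel reordering can also be sketched as a reversal of the last axis. The small array below is a stand-in with made-up values, not the dataset image.

```python
import numpy as np

# Stand-in for a BGR image: a single pixel that is pure blue in BGR order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reverse the channel axis: (B, G, R) becomes (R, G, B)
rgb = bgr[:, :, ::-1]

print(bgr[0, 0])  # blue value stored first
print(rgb[0, 0])  # blue value stored last
```

The slice returns a view rather than a copy, so if a later function complains about the array layout, wrapping the result in np.ascontiguousarray may help.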

Next, we will get an image from a video and use our RetinaFace detection coordinates (provided in our data folder on GitHub) to crop out the face contained in a single frame of that video. For this example we pick frame 30 of the video, which roughly corresponds to the image used above.

First, we will open the video and capture the image corresponding to the exact frame we need. We use the VideoCapture class to open the video, then its set function to choose the next frame to read. After that we call the read function to capture the image from that frame.

cap = cv2.VideoCapture('T002_ActionsShorter_mini_3239_3347_Use-Radio-or-Gadget.mp4')
total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT) # Gets the total frames from our video (property id 7)
cap.set(cv2.CAP_PROP_POS_FRAMES, 30) # Sets next frame to read (property id 1)
ret, frame = cap.read() # Reads next frame
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # Convert frame to RGB
plt.imshow(frame_rgb)

Now that we have the desired frame we can use our RetinaFace coordinates to crop out the part of the frame containing the face. We are interested in the x, y, w, h values for each detection, as these can be used to create a bounding box or capture a smaller part of the image. The detection coordinates we have gathered are roughly the values below:

x = 174
y = 65
w = 137
h = 194

Notice that image slicing can be done using brackets, and we can combine this with our x, y, w, h values above in the form image[y:y+h, x:x+w]. Rows (y) come first and columns (x) second.

x, y, w, h = 174, 65, 137, 194
cropped = frame_rgb[y:y+h, x:x+w]
plt.imshow(cropped)
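
Detection boxes can sometimes extend past the frame edges, and a negative start index would silently slice from the other end of the array. The sketch below clamps the box to the image bounds before slicing. Note that crop_box is a hypothetical helper name, not part of OpenCV or the common module, and the 480x640 frame size is an assumption.

```python
import numpy as np

def crop_box(image, x, y, w, h):
    """Crop image[y:y+h, x:x+w] after clamping the box to the image bounds."""
    height, width = image.shape[:2]
    # Clamp the start to [0, size] and the end to [start, size]
    x0 = int(np.clip(x, 0, width))
    y0 = int(np.clip(y, 0, height))
    x1 = int(np.clip(x + w, x0, width))
    y1 = int(np.clip(y + h, y0, height))
    return image[y0:y1, x0:x1]

# Hypothetical blank frame standing in for a real video frame
frame = np.zeros((480, 640, 3), dtype=np.uint8)

face = crop_box(frame, 174, 65, 137, 194)
print(face.shape)  # rows = h, columns = w

edge = crop_box(frame, -20, 400, 100, 200)
print(edge.shape)  # box cut off at the left and bottom edges
```

A box that lies entirely outside the frame yields an empty array, which the caller can detect by checking whether the result's size is zero.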

Video Loop

Now that we have seen the above example, we will look at a loop that reads a single video and goes through each frame, capturing the corresponding face from each. This basic structure can be reused throughout the task for analyzing video.

import pandas as pd # This library can be useful for reading in the coordinates for RetinaFace csv files

Pandas is a Python library for data manipulation and analysis that offers data structures and operations for manipulating numerical tables and time series. It may be useful elsewhere in the task, but here we use it only to read the csv file containing our RetinaFace detection coordinates.

The code below reads in our csv file containing the detection coordinates and displays the first 5 rows of the table.

face_detections = pd.read_csv('T002_ActionsShorter_mini_3239_3347_Use-Radio-or-Gadget.csv')
face_detections.head()

Notice how each row corresponds to a separate frame in the video. We are interested in gathering the x, y, w, h column values for each frame so that we are able to use that when analyzing the video. Here is how we can get these values for quick reference.

coordinates = face_detections[['x', 'y', 'w', 'h']].round(0).astype(int)
coordinates.head()
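
As a small self-contained illustration of the rounding step, the sketch below builds a table from an in-memory CSV string. The column names x, y, w, h follow the text above. The frame column and all of the values are invented for the example and are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical detections; real files may contain other columns as well
csv_text = """frame,x,y,w,h
0,173.6,64.8,137.2,193.9
1,174.4,65.1,136.7,194.3
"""

detections = pd.read_csv(io.StringIO(csv_text))

# Keep only the box columns, round to whole pixels, and convert to integers
boxes = detections[['x', 'y', 'w', 'h']].round(0).astype(int)

# Unpack one row into plain Python integers for slicing
x, y, w, h = (int(v) for v in boxes.iloc[0])
print(x, y, w, h)
```

The conversion to integers matters because array slices do not accept floating-point indices.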

This is the basic structure of a loop in OpenCV that iterates through the video and captures the face at each frame.

face_detections = pd.read_csv('T002_ActionsShorter_mini_3239_3347_Use-Radio-or-Gadget.csv')
coordinates = face_detections[['x', 'y', 'w', 'h']]
coordinates = coordinates.round(0).astype(int) # Rounds the detection coordinates and converts to integers

cap = cv2.VideoCapture('T002_ActionsShorter_mini_3239_3347_Use-Radio-or-Gadget.mp4')
success, img = cap.read()
frame_number = 0
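while success and frame_number < len(coordinates): # Stop when the video or the detections run out
    x, y, w, h = coordinates.iloc[frame_number] # Gets the coordinates for this frame and assigns values to x, y, w, h
    face = img[y:y+h, x:x+w]
    # Do something with the face image
    success, img = cap.read() # Iterate to next frame
    frame_number += 1
cap.release() # Close the video file once we are done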
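
One way to keep frames and detections aligned is to pair them explicitly and stop when either runs out. The sketch below assumes row i of the table describes frame i, as noted above, and uses synthetic blank frames in place of a real video so that it runs on its own. The name iter_faces is a hypothetical helper, not part of OpenCV or the common module.

```python
import numpy as np

def iter_faces(frames, boxes):
    """Yield (frame_number, face) pairs, pairing frame i with box i."""
    # zip stops at the shorter input, so a missing row or frame ends the loop
    for frame_number, (frame, (x, y, w, h)) in enumerate(zip(frames, boxes)):
        yield frame_number, frame[y:y+h, x:x+w]

# Three synthetic 480x640 frames and two hypothetical boxes
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
boxes = [(174, 65, 137, 194), (176, 66, 135, 192)]

results = list(iter_faces(frames, boxes))
for frame_number, face in results:
    print(frame_number, face.shape)
```

With a real video, frames could be a generator that calls cap.read() until it reports failure, and boxes could come from coordinates.itertuples(index=False).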

More Materials

To learn more about OpenCV usage this link will be helpful. It contains a collection of 4 notebooks (the material in the first one has been covered here) that are useful for learning the basics of image processing, features in computer vision, and cascade classification.

More OpenCV tutorials can be found here. These tutorials cover a broader range of topics and can be a good reference when learning what OpenCV can do.