Identification Measurement Notebook

Introduction

The goal of this notebook is to explore a potential judging method that computes errors on a video-by-video basis using the following methodology:

  1. Find principal components of original video

  2. Project the original video and de-identified video onto the subspace spanned by the principal components

  3. Compute the distance between the (projected) original and de-identified videos

If the above concepts are confusing, don't worry: they will be explored in more depth later in this notebook. This method is not intended for de-identifying video; rather, it explores a way in which de-identified videos can be compared against the original video. A minimal sketch of the three steps appears below.
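For readers who prefer to see the idea in code first, here is a minimal sketch of the three steps above on synthetic data (random matrices stand in for the real video matrices built later in this notebook):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
original = rng.rand(30, 100)                  # 30 "frames" of 100 "pixels" each
de_identified = original + 0.1 * rng.rand(30, 100)

pca = PCA(n_components=10).fit(original)      # 1. principal components of the original
proj_original = pca.transform(original)       # 2. project both videos onto that subspace
proj_de_identified = pca.transform(de_identified)
error = np.linalg.norm(proj_original - proj_de_identified)  # 3. distance between projections
print(error)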

NOTE: This method will not necessarily be used in the actual judging process for this competition.

Libraries to import

For this notebook the following libraries will be used: numpy, pandas, opencv, matplotlib, and scikit-learn. If any of these libraries sound unfamiliar, it is recommended to review the first Jupyter Notebook tutorial (Getting Started), which can be found here.

import numpy as np
import pandas as pd
import cv2

These two global variables reference a short driver video from the “Level 1” folder containing the Dance/Sing action, along with its corresponding csv file. The csv file provides the face detection coordinates for each frame of the video, computed using RetinaFace, a cutting-edge deep-learning-based facial detector for Python that also produces facial landmarks.

# Global variables

video_path = "Video/Easy/T002_ActionsShorter_mini_11552_12166_Dance-Sing.mp4"
retina_csv_path = "RetinaFaceDetections/Easy/T002_ActionsShorter_mini_11552_12166_Dance-Sing.csv"

The following code block simply reads the x, y, w, h coordinates from the RetinaFace csv file and stores them in a pandas dataframe, with each row corresponding to a separate frame of the video.

def returnRetinaDf(retina_csv_path):
    """
    Read RetinaFace detections csv file into a dataframe
    
    Input: file path to csv
    Output: dataframe with one (x, y, w, h) detection per frame
    """
    retina_df = pd.read_csv(retina_csv_path)
    retina_df = retina_df[["x", "y", "w", "h"]]
    return retina_df
retina_df = returnRetinaDf(retina_csv_path)
retina_df.head(5)

Example Submissions

As mentioned above, this particular notebook tutorial requires some basic example submissions in order to test against the original video. The following methods are used: 1) pixelate, 2) sharpen_avg, 3) median_avg. Each of these methods is explored in more depth below.

# example submission 
def pixelate(img, size):
    height, width = img.shape[:2]
    w, h = (size, size)
    # shrink the image down to size x size...
    temp = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)
    # ...then blow it back up to the original size, producing the pixelated look
    output = cv2.resize(temp, (width, height), interpolation=cv2.INTER_NEAREST)
    return output
# example submission
def sharpen_avg(img):
    # sharpening filter
    img = cv2.addWeighted(img, 4, cv2.blur(img, (30, 30)), -4, 128)
    # averaging kernel; apply vertical kernel, then horizontal kernel
    img = cv2.blur(img, (20, 10))
    img = cv2.blur(img, (10, 20))
    return img
# example submission
def median_avg(img):
    # median blur filter
    return cv2.medianBlur(img, 27)

Now that the 3 example submission masking techniques have been defined, we will test each method on a single frame of the video as a sanity check, both to make sure everything works as intended and to help visualize each of the above masking methods.

As mentioned in the Getting Started notebook, we need to use plt.imshow to display images inside a notebook; when working outside of a Jupyter notebook, the opencv library provides a function (cv2.imshow) that opens a window containing the image. Because OpenCV stores images in BGR format, the image must also be converted to the standard RGB color space so that matplotlib displays it with the proper colors.

# Get first frame from sample video
from matplotlib import pyplot as plt

cap = cv2.VideoCapture(video_path)
success, image = cap.read()

if success:
    # swap the B and R channels so matplotlib displays the correct colors
    b, g, r = cv2.split(image)
    image = cv2.merge([r, g, b])
    plt.imshow(image)
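As a side note, the same BGR-to-RGB conversion can be done in a single call with cv2.cvtColor instead of splitting and merging the channels manually:

# equivalent single-call conversion
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)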

Pixelate Image

For the pixelate function, we will walk through the code step by step, since it does not use an opencv-defined pixelation function but rather a clever combination of two resize operations.

  1. First, we get the input image size, which in this case is 480x480 (computed using image.shape)

  2. Next, we decide on a desired “pixelated size”

  3. The image is then resized to the desired pixelated size using bilinear interpolation

  4. Finally, the image is resized back to the original size using nearest neighbor interpolation. It is recommended to read the linked article if you are unfamiliar with this interpolation method, to understand how the pixelated look is retained even though the image is resized back to the original size.

After the above steps have been performed, we have an input image that has been pixelated while retaining the original image size.

# Get image height and width
height, width = image.shape[:2]

# Set w and h values to our desired resize value (this is entered in the pixelate method as parameter "size")
w, h = (16, 16)

# Resize the image down to the desired pixelated size
temp = cv2.resize(image, (w, h), interpolation=cv2.INTER_LINEAR)

# Convert the pixelated image back to the original size
output = cv2.resize(temp, (width, height), interpolation=cv2.INTER_NEAREST)

Below, we display the temp image, which represents the 16x16 pixelated image.

plt.imshow(temp)
temp.shape

Here, the final output image is displayed. Notice how this image looks identical to the 16x16 pixelated image but has the original input image shape.

plt.imshow(output)
output.shape

Sharpening and Applying Averaging Kernel

Here, we first apply a sharpening filter to the image and then apply both a vertical and a horizontal averaging kernel. The cv2.addWeighted call computes 4*img - 4*blur(img) + 128, which amplifies fine detail relative to a heavily blurred copy of the image. The code block below shows the image at various points in this process, but it is recommended to read this article to get a better understanding of how this process actually masks the image.

# sharpening filter
sharpen = cv2.addWeighted(image, 4, cv2.blur(image, (30, 30)), -4, 128)

# averaging kernel; apply vertical kernel, then horizontal kernel
output = cv2.blur(sharpen, (20, 10))
output = cv2.blur(output, (10, 20))
plt.imshow(sharpen)
sharpen.shape
plt.imshow(output)
output.shape

Median Blurring

This method replaces each pixel with the median of all the pixels under the kernel area, and can be implemented in Python simply by using the cv2.medianBlur function. This article should help clarify any questions about how median blur and other similar smoothing filters actually work, and it also shows how to implement them in Python.

Notice that this method produces a blurred version of the original image and is a reasonable masking technique to use as a third example submission.

output = cv2.medianBlur(image, 27)
plt.imshow(output)
output.shape

Putting it all together

Now that we have defined our 3 masking techniques and tested them on a single video frame, we need a function that iterates through all the frames of the input video, applying the masking techniques to each frame. The function below then appends each transformed frame to an output matrix (one per masking technique), so that each matrix contains all the frames of the entire video.

def returnDataMatrix(video_path):
    
    """
    Takes in video path and returns data matrix
    """
    
    cap = cv2.VideoCapture(video_path)
    frame_counter = 0

    data_matrix = []
    
    ##### Tutorial-only: matrices for the three masked videos (remove in actual use) #####
    pixelated_matrix = []
    sharpen_blur_matrix = []
    median_matrix = []
    ######################################################################################

    while cap.isOpened():

        ret, frame = cap.read()
        if frame is None:
            break


        # crop a square face region around the RetinaFace detection for this frame
        x = int(retina_df["x"][frame_counter])
        y = int(retina_df["y"][frame_counter])
        l = max(int(retina_df["w"][frame_counter]), int(retina_df["h"][frame_counter]))

        frame = frame[y:y + l, x:x + l]
        frame = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_AREA)
        
        ##### Tutorial-only: apply each masking technique to the cropped frame (remove in actual use) #####
        pixelated_frame = pixelate(frame, 65)
        sharpen_blur_frame = sharpen_avg(frame)
        median_frame = median_avg(frame)
        #cv2.imwrite("pixelated.jpg", pixelated_frame)
        #cv2.imwrite("sharpen_blur.jpg", sharpen_blur_frame)
        #cv2.imwrite("median.jpg", median_frame)
        ###################################################################################################
        
        
        # convert to grayscale, then flatten the 224x224 frame into one row vector
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = frame.flatten()
        
        ##### Tutorial-only: same grayscale-and-flatten step for the masked frames (remove in actual use) #####
        pixelated_frame = cv2.cvtColor(pixelated_frame, cv2.COLOR_BGR2GRAY)
        #cv2.imwrite("bw_pixelated.jpg", pixelated_frame)
        pixelated_frame = pixelated_frame.flatten()
        
        sharpen_blur_frame = cv2.cvtColor(sharpen_blur_frame, cv2.COLOR_BGR2GRAY)
        #cv2.imwrite("bw_sharpen_blur.jpg", sharpen_blur_frame)
        sharpen_blur_frame = sharpen_blur_frame.flatten()
        
        median_frame = cv2.cvtColor(median_frame, cv2.COLOR_BGR2GRAY)
        #cv2.imwrite("bw_median.jpg", median_frame)
        median_frame = median_frame.flatten()
        ######################################################################################################

        data_matrix.append(frame)
        
        ##### Tutorial-only: append the masked frames to their matrices (remove in actual use) #####
        pixelated_matrix.append(pixelated_frame)
        sharpen_blur_matrix.append(sharpen_blur_frame)
        median_matrix.append(median_frame)
        ############################################################################################

        frame_counter += 1


    cap.release()

    data_matrix = np.array(data_matrix)
    #### Tutorial-only: convert the masked matrices as well (remove in actual use) ####
    pixelated_matrix = np.array(pixelated_matrix)
    sharpen_blur_matrix = np.array(sharpen_blur_matrix)
    median_matrix = np.array(median_matrix)
    ###################################################################################
    return data_matrix, pixelated_matrix, sharpen_blur_matrix, median_matrix # in actual use: return data_matrix
data_matrix, pixelated_matrix, sharpen_blur_matrix, median_matrix = returnDataMatrix(video_path)

# each matrix has shape (num_frames, 224 * 224): one flattened grayscale frame per row
print(data_matrix.shape)
print(pixelated_matrix.shape)
print(sharpen_blur_matrix.shape)
print(median_matrix.shape)
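As a quick optional sanity check, since each row of these matrices is one flattened 224x224 grayscale frame, any row can be reshaped back into a viewable image:

# reshape the first flattened frame back into a 224x224 image
plt.imshow(data_matrix[0].reshape(224, 224), cmap="gray")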

Scikit Learn Library

Scikit-learn is a machine learning library for Python that can be used for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

from sklearn.decomposition import PCA
from sklearn.decomposition import SparsePCA
from sklearn.decomposition import TruncatedSVD

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that extracts information from a high-dimensional space by projecting it onto a lower-dimensional subspace. It tries to preserve the directions along which the data varies the most and discard those along which it varies the least.
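To make the "preserve the most variation" idea concrete, here is a minimal sketch on synthetic data: projecting strongly correlated 2-D points onto their first principal component retains nearly all of the variance.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
x = rng.randn(200)
X = np.column_stack([x, 3 * x + 0.1 * rng.randn(200)])  # two highly correlated variables

pca = PCA(n_components=1).fit(X)
print(pca.explained_variance_ratio_[0])  # close to 1.0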

The function below uses the sklearn library to perform Principal Component Analysis (PCA) and then computes the error between the original and de-identified videos as the Frobenius norm of the difference between the two projected matrices. This function needs to be called for each of our de-identified matrices (i.e. the pixelated, sharpen-blur, and median-blur matrices), and the resulting errors are printed below.

def pca_error(matrix, de_identified_matrix):
    
    """
    matrix: corresponds to original video
    de_identified_matrix: corresponds to de-identified video
    """
    
    pca = PCA(n_components=min(matrix.shape[0], matrix.shape[1]), svd_solver='full')
    pca.fit(matrix)
    
    dim_data_matrix = pca.transform(matrix)
    dim_de_identified_matrix = pca.transform(de_identified_matrix)
    
    return np.linalg.norm(dim_de_identified_matrix - dim_data_matrix)  
print("pixelated error: " + str(pca_error(data_matrix, pixelated_matrix)))
print("sharpen blur error: " + str(pca_error(data_matrix, sharpen_blur_matrix)))
print("median error: " + str(pca_error(data_matrix, median_matrix)))

Sparse PCA

One of the main disadvantages of ordinary PCA is that the principal components are usually linear combinations of all the input variables. Sparse PCA can help overcome this disadvantage by finding linear combinations that contain just a few input variables.
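A minimal sketch on synthetic data illustrates the difference: the components found by SparsePCA contain many exactly-zero loadings, whereas ordinary PCA components generally involve every input variable.

import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(50, 10)

pca = PCA(n_components=3).fit(X)
sparse_pca = SparsePCA(n_components=3, alpha=1.5, random_state=0).fit(X)

print("zero loadings (PCA):      ", np.sum(pca.components_ == 0))
print("zero loadings (SparsePCA):", np.sum(sparse_pca.components_ == 0))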

def sparse_pca_error(matrix, matrix1, matrix2, matrix3):
    
    """
    matrix: corresponds to original video
    matrix1, matrix2, matrix3, ...: correspond to de-identified videos
    """
    
    # first reduce dimensionality with TruncatedSVD so that SparsePCA is tractable
    svd = TruncatedSVD(n_components = int(min(matrix.shape[0], matrix.shape[1]) * 0.75))
    svd.fit(matrix)
    matrix = svd.transform(matrix)
    
    sparse_pca = SparsePCA(n_components = 5, alpha=1.5)
    sparse_pca.fit(matrix)
    dim_data_matrix = sparse_pca.transform(matrix)
    
    def compute_error(de_identified_matrix):
        
        # each de-identified matrix gets its own SVD reduction before being
        # projected onto the sparse components fitted on the original video
        svd = TruncatedSVD(n_components = int(min(de_identified_matrix.shape[0], de_identified_matrix.shape[1]) * 0.75))
        svd.fit(de_identified_matrix)
        de_identified_matrix = svd.transform(de_identified_matrix)
        
        dim_de_identified_matrix = sparse_pca.transform(de_identified_matrix)

        return np.linalg.norm(dim_de_identified_matrix-dim_data_matrix)
    
    return compute_error(matrix1), compute_error(matrix2), compute_error(matrix3)
pixelated_error, sharpen_blur_error, median_error = sparse_pca_error(data_matrix, pixelated_matrix, sharpen_blur_matrix, median_matrix)
print("pixelated error: " + str(pixelated_error))
print("sharpen blur error: " + str(sharpen_blur_error))
print("median error: " + str(median_error))

Conclusion

Hopefully the above example of using PCA to compare de-identified videos with the original video provides some better intuition about which masking techniques may prove more effective than others, as well as showcasing some more features of Python that may be useful in the competition. To get the full benefit of the notebook, it is recommended to go back and read through any linked articles to gain a better understanding of all the concepts discussed.