AUTOMATIC THUMBNAIL SELECTION

Kyle Pretorius, a master’s degree student, developed a system for automatic thumbnail selection using genetic programming and convolutional neural networks. A movie trailer is fed into the system, which processes the video frames one after another. The machine learning (ML) system scores each frame according to how good it would be as a thumbnail in DStv’s menu of what to watch. Once processing is complete, the five highest-scoring frames are ranked, and the most suitable one can be selected from them. Automating thumbnail selection, rather than doing it manually, saves both time and cost.

WHAT IS AUTOMATIC THUMBNAIL SELECTION?

Thumbnails are cover images used to represent a video and attract viewers. Selecting thumbnails manually for movies or series takes time and resources, and ML can be used to automate the selection process.

The problem can be reduced to binary classification, which involves the following steps:

Create a dataset consisting of good and bad examples of thumbnails
Train a classifier to assign scores to examples, where higher scores indicate that the example is likely to be a better thumbnail
Use the trained classifier to assign scores to frames sampled from the video of interest
Select or suggest the frames with the highest scores as thumbnails for the video
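The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `sample_frame_indices` and `select_thumbnails` are hypothetical names, frames are assumed to be sampled at a fixed stride, and `score` stands in for the trained classifier. The default `k=5` mirrors the top five frames mentioned earlier.

```python
from typing import Callable, List, Sequence, Tuple


def sample_frame_indices(num_frames: int, stride: int) -> List[int]:
    """Hypothetical sampler (assumption): take every `stride`-th frame of the video."""
    return list(range(0, num_frames, stride))


def select_thumbnails(
    frames: Sequence[Tuple[int, object]],
    score: Callable[[object], float],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Score (frame_index, image) pairs; return the k best as (frame_index, score), best first."""
    scored = [(index, score(image)) for index, image in frames]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy usage: "images" are brightness values and the stand-in score is the brightness itself.
video = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
sampled = [(i, video[i]) for i in sample_frame_indices(len(video), 2)]
top = select_thumbnails(sampled, score=lambda image: image, k=2)
# top == [(2, 0.4), (4, 0.2)]
```

Sorting every scored frame keeps the sketch simple; `heapq.nlargest` would avoid a full sort if the number of candidate frames were very large.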

COMPARISON OF CLASSIFICATION APPROACHES

The main aim of this project was to compare two image classification approaches when applied to automatic thumbnail selection.

The following two approaches were used:

Convolutional neural networks (CNN): the state-of-the-art method for image classification, which has previously been applied successfully to thumbnail selection.
Novel hybrid genetic programming: a new method proposed for this project that uses the strengths of convolutional neural networks to enable genetic programming to create image classifiers.

THE HYBRID GENETIC PROGRAMMING APPROACH

Genetic programming is an evolutionary approach that searches for a program that solves the problem at hand. It has been found to perform well when used to evolve classifiers that operate on a small number of input features. However, genetic programming struggles when applied to images because of their large number of features and the spatial correlation between pixels. The proposed hybrid genetic programming approach therefore uses the convolutional layers of a pre-trained CNN to extract features from the image.
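The source does not say how the convolutional output is reduced to a vector. One common option, used in ResNet architectures, is global average pooling: each channel's H x W activation map is averaged to a single number, so C channels yield a length-C vector. For reference, the last convolutional stage of ResNet-18 and ResNet-34 has 512 channels, and that of ResNet-50 has 2 048. The following is a minimal pure-Python sketch with a hypothetical `global_average_pool` function; whether the project used this pooling is an assumption.

```python
from typing import List

# C x H x W activations: one H x W map per channel of the last convolutional layer.
FeatureMaps = List[List[List[float]]]


def global_average_pool(maps: FeatureMaps) -> List[float]:
    """Average each channel's map to one number, giving a length-C feature vector."""
    vector = []
    for channel in maps:
        values = [v for row in channel for v in row]
        vector.append(sum(values) / len(values))
    return vector


# Two channels of 2 x 2 activations -> a 2-element feature vector.
features = global_average_pool([[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]])
# features == [2.5, 2.0]
```

Pooling discards where in the image an activation occurred, which is one way the resulting features end up with less spatial correlation than raw pixels.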

Once an image has been processed by the convolutional layers, a one-dimensional feature vector remains. This vector contains fewer features than the original image has pixels, and its features have reduced spatial correlation. Genetic programming can then evolve classifiers that operate on the extracted features, removing the need to operate on images directly.
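The source does not describe its genetic programming setup, so the following is a hypothetical, minimal tree-based GP sketch rather than the project's method. Programs are expression trees over the extracted feature vector: terminals are features `x[i]` or random constants, and the function set `add`, `sub`, `mul`, `max` is an assumption. A program labels an example a good thumbnail when its output is positive. Fitness is training accuracy, and the loop uses tournament selection, subtree crossover, subtree mutation, a depth limit, and elitism. Population size, depths, and rates are arbitrary choices.

```python
import operator
import random
from typing import List, Sequence, Tuple

# Hypothetical function set; terminals are ("x", i) for feature i or ("c", value).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul, "max": max}
MAX_DEPTH = 6  # assumed limit to curb tree bloat


def random_tree(n_features: int, depth: int, rng: random.Random) -> tuple:
    """Grow a random expression tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_features, depth - 1, rng), random_tree(n_features, depth - 1, rng))


def evaluate(tree: tuple, features: Sequence[float]) -> float:
    if tree[0] == "x":
        return features[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], features), evaluate(tree[2], features))


def depth(tree: tuple) -> int:
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0


def accuracy(tree: tuple, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fitness: a positive output predicts 'good thumbnail' (label 1)."""
    return sum((evaluate(tree, x) > 0) == (label == 1) for x, label in zip(X, y)) / len(y)


def paths(tree: tuple, prefix: Tuple[int, ...] = ()):
    """Yield the path (sequence of child positions) to every node."""
    yield prefix
    if tree[0] in OPS:
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))


def get(tree: tuple, path: Tuple[int, ...]) -> tuple:
    for i in path:
        tree = tree[i]
    return tree


def replace(tree: tuple, path: Tuple[int, ...], new: tuple) -> tuple:
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(a: tuple, b: tuple, rng: random.Random) -> tuple:
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    child = replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))
    return child if depth(child) <= MAX_DEPTH else a


def mutate(tree: tuple, n_features: int, rng: random.Random) -> tuple:
    """Subtree mutation: replace a random node with a fresh random subtree."""
    child = replace(tree, rng.choice(list(paths(tree))), random_tree(n_features, 2, rng))
    return child if depth(child) <= MAX_DEPTH else tree


def tournament(pop: List[tuple], fits: List[float], rng: random.Random, k: int = 3) -> tuple:
    return pop[max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]


def evolve(X, y, pop_size: int = 60, generations: int = 20, seed: int = 0):
    """Return (best program, its training accuracy, best accuracy per generation)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [random_tree(n_features, 3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [accuracy(t, X, y) for t in pop]
        best = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        nxt = [pop[best]]  # elitism: carry the best program over unchanged
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, fits, rng), tournament(pop, fits, rng), rng)
            if rng.random() < 0.2:
                child = mutate(child, n_features, rng)
            nxt.append(child)
        pop = nxt
    fits = [accuracy(t, X, y) for t in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best], history
```

Because the best program is copied unchanged into each new generation, the best training accuracy never decreases. Each terminal can name any feature, so the search space grows with the length of the feature vector, which illustrates why hundreds of extracted features make the search harder.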

RESULTS

CNNs outperformed the proposed hybrid genetic programming approach in terms of both classification accuracy and loss. This is likely because the extracted feature vector was still too large for genetic programming to evolve classifiers efficiently (at least 512 features were extracted). CNNs were also more consistent in selecting suitable thumbnails.
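For reference, the two comparison metrics can be computed as below. The source does not name the loss function; binary cross-entropy, the usual choice for a binary classifier that outputs a probability, is assumed here, as is a 0.5 decision threshold. The function names are illustrative.

```python
import math
from typing import Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    correct = sum((p >= threshold) == (t == 1) for p, t in zip(probs, labels))
    return correct / len(labels)


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(labels)
```

For example, `accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])` returns 0.5. Unlike accuracy, the loss also penalises correct predictions made with low confidence.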

The CNN was used to select the top three thumbnails for two trailers: “Thor: Ragnarok” and “Jumanji: The Next Level”. The thumbnails are shown in order of decreasing preference from left to right.

DATASET

A dataset of good and bad thumbnail examples had to be created for this project. This was done using YouTube’s developer application programming interface (API), through which lists of video identifiers for movie and series trailers could be obtained.
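A minimal sketch of obtaining trailer video identifiers, assuming the API used was the YouTube Data API v3 `search` endpoint with `part=id` and `type=video`; the source only says YouTube’s developer API. The query string and `api_key` are placeholders, and the function names are hypothetical. Only standard-library modules are used.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query: str, api_key: str, max_results: int = 50,
                     page_token: Optional[str] = None) -> str:
    """Build a search request that returns only video results."""
    params = {"part": "id", "q": query, "type": "video",
              "maxResults": max_results, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_video_ids(response: dict) -> List[str]:
    """Keep the videoId of every video result in a search response."""
    return [item["id"]["videoId"] for item in response.get("items", [])
            if item.get("id", {}).get("kind") == "youtube#video"]


def search_video_ids(query: str, api_key: str, max_results: int = 50) -> List[str]:
    """Perform one search request (needs network access and a valid key)."""
    with urllib.request.urlopen(build_search_url(query, api_key, max_results)) as resp:
        return extract_video_ids(json.load(resp))
```

A single request returns at most 50 results; further pages can be requested by passing the response's `nextPageToken` back as `page_token`.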

For each video identifier, examples of thumbnails were obtained as follows:

Good examples: the three thumbnails selected by YouTube’s automatic thumbnail selection algorithm.
Bad examples: three frames selected at random from the video (randomly selected frames have a high probability of making bad thumbnails).
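The labelling scheme above can be sketched as follows. Two assumptions are marked in the code: YouTube’s three automatically generated thumbnails are fetched from the commonly used `https://img.youtube.com/vi/<id>/1.jpg` to `3.jpg` URL pattern, and negative frames are drawn uniformly at random from the trailer’s frame count. Decoding frames is omitted; `frame:<index>` is a hypothetical placeholder for a decoded frame, and the function names are illustrative.

```python
import random
from typing import Dict, List, Tuple

# Assumption: YouTube's auto-generated thumbnails are served at 1.jpg, 2.jpg and 3.jpg.
AUTO_THUMB_URL = "https://img.youtube.com/vi/{video_id}/{n}.jpg"

Example = Tuple[str, str, int]  # (video_id, source, label) with label 1 = good, 0 = bad


def examples_for_video(video_id: str, num_frames: int, rng: random.Random) -> List[Example]:
    """Three positive examples from YouTube's picks plus three random negative frames."""
    good = [(video_id, AUTO_THUMB_URL.format(video_id=video_id, n=n), 1) for n in (1, 2, 3)]
    # Assumption: negatives are three distinct frame indices drawn uniformly at random.
    bad = [(video_id, f"frame:{i}", 0) for i in sorted(rng.sample(range(num_frames), 3))]
    return good + bad


def build_dataset(frame_counts: Dict[str, int], seed: int = 0) -> List[Example]:
    """frame_counts maps each video identifier to its number of frames."""
    rng = random.Random(seed)
    dataset: List[Example] = []
    for video_id, num_frames in frame_counts.items():
        dataset.extend(examples_for_video(video_id, num_frames, rng))
    return dataset
```

Because every video contributes three examples of each class, a dataset built this way is balanced by construction.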

A dataset of roughly 2 500 examples was created in this way, balanced equally between the two classes.

CONVOLUTIONAL NEURAL NETWORKS

CNNs are regarded as the state-of-the-art approach for image analysis. They differ from standard deep neural networks by adding convolutional layers at the start of the network. These layers allow CNNs to operate more efficiently on data whose features have spatial relationships, such as images.
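The following minimal sketch shows what a convolutional layer computes on a single-channel image: one small kernel slides over every position, each output value depends only on a local neighbourhood of pixels, and the same weights are reused everywhere. The kernel here is hand-picked for illustration; in a CNN the kernel weights are learned, and many kernels run in parallel over multi-channel inputs. As in most deep learning libraries, the operation is technically cross-correlation (the kernel is not flipped). The name `conv2d` is illustrative.

```python
from typing import List

Grid = List[List[float]]


def conv2d(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D convolution (cross-correlation) of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# A 1 x 2 kernel that responds to dark-to-bright vertical edges.
edges = conv2d([[0, 0, 1, 1], [0, 0, 1, 1]], [[1, -1]])
# edges == [[0, -1, 0], [0, -1, 0]]
```

The output is non-zero only where the edge sits, regardless of which row it appears in, which is the kind of locality and weight sharing that a fully connected layer would have to learn separately for every position.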

The aim of this component of the project was to compare the performance of well-known CNN architectures when applied to thumbnail selection.

It was found that a modified version of ResNet-50 produced the best results in terms of classification accuracy on the created thumbnail dataset.
