Interactive Deep Learning Training Framework

HALCON’s deep learning (DL) functionality is an impressive technology for identifying image content. To obtain robust inference results, we recommend using around 300-400 images per class. Collecting that many images can be an issue, however, if you only want to develop a simple DL application.

For such a use case, a simple framework can consist of just three scripts:

Acquisition of the Dataset

This script grabs and displays images from a 2D colour camera. Images can be stored easily, e.g., by clicking in the graphics window. For a better user experience, images that should be written to hard disk are first placed in a message queue, so that saving an image does not slow down the acquisition and display.
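
In HDevelop, the core of the acquisition script could look like the following sketch. The acquisition and message-queue operators are standard HALCON; the 'GigEVision2' interface name, the file path, and the save_images procedure are assumptions made for illustration:

    * Open the camera and the display window.
    open_framegrabber ('GigEVision2', 0, 0, 0, 0, 0, 0, 'progressive', -1, 'default', -1, 'false', 'default', 'default', 0, -1, AcqHandle)
    dev_open_window (0, 0, 640, 480, 'black', WindowHandle)
    * The queue decouples saving from acquisition and display.
    create_message_queue (QueueHandle)
    * Run the (hypothetical) saver procedure in a separate thread.
    par_start<SaveThread> : save_images (QueueHandle)
    grab_image_start (AcqHandle, -1)
    while (true)
        grab_image_async (Image, AcqHandle, -1)
        dev_display (Image)
        * get_mposition throws if the cursor is outside the window,
        * so Button simply stays 0 in that case.
        Button := 0
        try
            get_mposition (WindowHandle, Row, Column, Button)
        catch (Exception)
        endtry
        if (Button == 1)
            * Left click: wrap the image in a message and hand it to the saver.
            create_message (MessageHandle)
            set_message_obj (Image, MessageHandle, 'image')
            enqueue_message (QueueHandle, MessageHandle, [], [])
        endif
    endwhile

Inside save_images, dequeue_message blocks until an image arrives, so the thread is idle whenever nothing needs to be written:

    * Loop body of the (hypothetical) save_images procedure.
    Index := 0
    while (true)
        dequeue_message (QueueHandle, [], [], MessageHandle)
        get_message_obj (SaveImage, MessageHandle, 'image')
        write_image (SaveImage, 'png', 0, 'dataset/image_' + Index$'05d')
        Index := Index + 1
    endwhile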

Training of the Deep Learning Network

The training is very similar to that in the standard examples classify_fruit_deep_learning.hdev and classify_pill_defects_deep_learning.hdev. It uses the images from the previous step for training and allows the trained classifier to be stored on the hard drive. Additional information on the training progress can be shown during execution or at the end. All displayed values should decrease as the number of iterations grows. Please note that the required number of iterations depends on the configuration of the script and the images used (see fig. 1).
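
Following the pattern of those examples, a minimal sketch of the training core could look like this. The DL classifier operators are standard HALCON; the class names, hyperparameters, file names, and the read_batch helper are assumptions:

    * Start from a pretrained network shipped with HALCON and set the
    * classes recorded in script 1.
    read_dl_classifier ('pretrained_dl_classifier_compact.hdl', DLClassifierHandle)
    set_dl_classifier_param (DLClassifierHandle, 'classes', ['board_a','board_b','hand','background'])
    set_dl_classifier_param (DLClassifierHandle, 'batch_size', 64)
    set_dl_classifier_param (DLClassifierHandle, 'learning_rate', 0.001)
    for Epoch := 1 to NumEpochs by 1
        for BatchIndex := 0 to NumBatches - 1 by 1
            * read_batch is a hypothetical helper that loads and preprocesses
            * the images saved by script 1 together with their labels.
            read_batch (BatchImages, BatchLabels, BatchIndex)
            train_dl_classifier_batch (BatchImages, DLClassifierHandle, BatchLabels, TrainResultHandle)
            * The loss returned here is one of the values plotted in fig. 1.
            get_dl_classifier_train_result (TrainResultHandle, 'loss', Loss)
        endfor
    endfor
    * Store the trained classifier for script 3.
    write_dl_classifier (DLClassifierHandle, 'boards_classifier.hdl')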

Fig. 1: Decreasing values with an increasing number of iterations

Application of a Trained Network with Live Image Acquisition

The third script acquires images in the same way as the first one and uses the previously trained network to identify the image content. The results are shown in the graphics window (see fig. 2).
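
A sketch of this loop, reusing the acquisition setup from the first script, might look as follows. The classifier file name is an assumption, disp_message is the usual HDevelop example procedure, and the gray-value scaling performed in the standard examples is omitted for brevity:

    * Load the network trained in script 2 and query its input size.
    read_dl_classifier ('boards_classifier.hdl', DLClassifierHandle)
    get_dl_classifier_param (DLClassifierHandle, 'image_width', NetWidth)
    get_dl_classifier_param (DLClassifierHandle, 'image_height', NetHeight)
    while (true)
        grab_image_async (Image, AcqHandle, -1)
        * Adapt the live image to the network input (gray-value scaling
        * as done in the standard examples omitted here).
        zoom_image_size (Image, ImageZoomed, NetWidth, NetHeight, 'constant')
        convert_image_type (ImageZoomed, ImagePre, 'real')
        apply_dl_classifier (ImagePre, DLClassifierHandle, ResultHandle)
        get_dl_classifier_result (ResultHandle, 'all', 'predicted_classes', PredictedClass)
        dev_display (Image)
        disp_message (WindowHandle, 'Class: ' + PredictedClass, 'window', 12, 12, 'black', 'true')
    endwhile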

Fig. 2: Application running on an embedded board

We implemented these concepts to identify two embedded boards in front of the camera, and additionally trained the network to recognize a hand and the background. The framework allows for very flexible execution: we recommend performing the training on a powerful platform, whereas the inference can also run on less powerful hardware. In our setup, we used a state-of-the-art gaming notebook for the first two scripts, transferred the network to an embedded platform, and executed the inference there.
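
Since the two platforms exchange nothing but the serialized network, the hand-over is a single write/read pair, as in this sketch (the file name is an assumption, and the availability of the 'cpu' runtime depends on the HALCON version):

    * On the training notebook: serialize the trained network.
    write_dl_classifier (DLClassifierHandle, 'boards_classifier.hdl')
    * On the embedded board, after copying the file over: load it and
    * select a runtime the hardware supports.
    read_dl_classifier ('boards_classifier.hdl', DLClassifierHandle)
    set_dl_classifier_param (DLClassifierHandle, 'runtime', 'cpu')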

Fig. 3: Overview of the approach

Fig. 4: Complete setup including acquisition hardware

Please make sure that the same setup is used for scripts 1 and 3; otherwise, the appearance of the training images may differ strongly from that of the application images, resulting in less robust inference.

Of course, it is also possible to run all scripts on a single system, e.g., a notebook with a webcam. If you have further questions, our support team is happy to assist you.

A simple program that shows the method can be downloaded here.