Better OCR With Deep Learning and Syntax Correction With HALCON

The two most recent milestones in the evolution of HALCON’s OCR were the automatic text reader in HALCON 12 and the deep-learning-based classifier in HALCON 13. The former made OCR significantly easier. No longer did you have to fine-tune the OCR to fit the properties of your text, let alone segment the text manually. The OCR became as easy and convenient to use as HALCON’s bar and data code readers.

With the introduction of the deep-learning-based classifier, the automatic text reader became significantly more robust. It found fewer false positives in difficult images, and the read rate increased as well.

Earlier generations of the OCR, however, already offered a powerful feature that is still worth using: the ability to identify text that conforms to a certain format, such as an expiration date. Here is how you can combine it with the automatic text reader to find such a date on a package:

First you locate the text using the deep-learning-based classifier:

create_text_model_reader('auto', 'Universal_Rej.occ', TextModel)
find_text(Image, TextModel, TextResultID)

To find the expiration date you could do a regular expression match on all of the found text:

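* Pattern that describes the expiration date
Expression := '[0-9]/[0-9]'
get_text_result (TextResultID, 'class', FoundText)
* sum() concatenates the single characters into one string
ExpDate := regexp_match(sum(FoundText), Expression)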

If some characters are read incorrectly, however, you are better off using a different approach:

read_ocr_class_cnn('Universal_Rej.occ', OCRHandle)
get_text_object (TextRegions, TextResultID, 'all_lines')
Expression := '[0-9]/[0-9]'
* 3 = NumAlternatives (classes considered per character), 2 = NumCorrections (maximum number of corrected characters)
do_ocr_word_cnn(TextRegions, Image, OCRHandle, Expression, 3, 2, Class, Confidence, Word, Score)
ExpDate := regexp_match(Word, Expression)

do_ocr_word_cnn automatically chooses the next-best match for each character if the first read attempt does not return a result that matches the provided syntax definition. In effect, we use the automatic text reader to locate the text and then read it again with integrated syntax correction. For this second step you could also use your own font.
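
To make the idea of syntax correction more tangible, here is a minimal Python sketch of the principle. It is not HALCON's implementation. The function names, the candidate data, and the date pattern are made up for illustration, and the selection rule (fewest corrections first, then highest mean confidence) is an assumption. The two limits only loosely mirror the NumAlternatives and NumCorrections parameters of do_ocr_word_cnn: each character comes with a short list of alternatives, and at most max_corrections characters may deviate from their best-scoring class.

```python
import re
from itertools import product

# Hypothetical classifier output: for every character position, the
# alternatives sorted by confidence (best first). The fifth character
# is misread as the letter "O" instead of the digit "0".
CANDIDATES = [
    [("1", 0.95), ("I", 0.03)],
    [("2", 0.90), ("Z", 0.08)],
    [("/", 0.99)],
    [("2", 0.97), ("Z", 0.02)],
    [("O", 0.60), ("0", 0.38)],
    [("1", 0.92), ("I", 0.05)],
    [("9", 0.88), ("g", 0.10)],
]

# Illustrative pattern for a date such as 12/2019 (an assumption).
PATTERN = r"[0-9]{2}/[0-9]{4}"


def top_read(candidates):
    """Concatenate the best-scoring class of every character."""
    return "".join(alts[0][0] for alts in candidates)


def correct_word(candidates, pattern, max_corrections):
    """Return the word that matches `pattern` with the fewest corrections.

    Ties are broken by the higher mean confidence. Returns None if no
    combination within `max_corrections` matches the whole word.
    """
    if not candidates:
        return None
    regex = re.compile(pattern)
    best = None  # (num_corrections, -mean_confidence, word)
    # Brute force over all combinations; fine for short words only.
    for choice in product(*(range(len(alts)) for alts in candidates)):
        num_corrections = sum(1 for rank in choice if rank > 0)
        if num_corrections > max_corrections:
            continue
        picked = [alts[rank] for alts, rank in zip(candidates, choice)]
        word = "".join(char for char, _ in picked)
        if regex.fullmatch(word) is None:
            continue
        mean_conf = sum(conf for _, conf in picked) / len(picked)
        key = (num_corrections, -mean_conf, word)
        if best is None or key < best:
            best = key
    return None if best is None else best[2]


if __name__ == "__main__":
    print(top_read(CANDIDATES))                  # 12/2O19
    print(correct_word(CANDIDATES, PATTERN, 2))  # 12/2019
```

The uncorrected read "12/2O19" does not match the pattern, so a plain regular expression match on it finds nothing. Once one correction is allowed, the second-best class for the fifth character yields a word that fits the expected format.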