What preprocessing operations are performed by Tesseract OCR?
I couldn't find detailed documentation, and I don't feel like browsing the source code. I don't want to redo, for example, Canny edge detection if it is already done by the Tesseract engine.
This document provides an overview of the engine: https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf
So it looks like you don't need to implement Canny edge detection.
Tesseract uses Otsu thresholding to binarize the image before processing: https://github.com/tesseract-ocr/tesseract/blob/master/ccstruct/otsuthr.h
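To illustrate what Otsu thresholding does, here is a minimal pure-Python sketch of global Otsu binarization on 8-bit grayscale values. It is not Tesseract's own implementation. The function names `otsu_threshold` and `binarize` are hypothetical, and the sketch assumes a flat list of integer pixel values in the range 0..255. It picks the threshold that maximizes the between-class variance of the two resulting pixel classes.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t form the light class.
    Assumes 8-bit grayscale input (hypothetical helper, not Tesseract's API).
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0      # sum of gray levels in the dark class so far
    weight_dark = 0   # number of pixels in the dark class so far
    best_t = 0
    best_var = -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to split
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is dark; further thresholds change nothing
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1/total^2.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var = between
            best_t = t
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    image = [10] * 50 + [200] * 50  # two well-separated gray levels
    t = otsu_threshold(image)
    print(t)  # 10
    print(set(binarize(image, t)))
```

Because the sketch computes a single global threshold from the histogram, it needs no edge detection at all. It separates "ink" from "paper" purely by gray-level statistics.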
Edit: If you want to see the binarized image, create a new config file in "\tessdata\configs\" and add the line:
tessedit_write_images true
Then process the image:
tesseract your_image out your_config_file
Tesseract saves the binarized image as tessinput.tif.