DocTR

November 20, 2024 | seedling, permanent

tags :

similar: Paddle OCR, Abbyy OCR, Open Source

Deep Learning OCR #

Acronym: docTR (Document Text Recognition). A seamless, high-performing, and accessible library for OCR-related tasks, powered by deep learning.

Mindee docTR - Probably the Best Open-Source OCR, youtube

My experience with it #

I have used it on my blog to extract text from images of my notes, and the results were excellent.
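For context, here is a minimal sketch of how text can be pulled out of a single image. It assumes `python-doctr` is installed and uses the default pretrained `ocr_predictor`. The helper names `run_ocr` and `export_to_lines` and the sample dictionary are hypothetical, introduced only for illustration. The docTR imports sit inside `run_ocr`, so the rest of the sketch runs without the library.

```python
def run_ocr(img_path):
    """Run docTR on one image and return the exported result as a dict.

    Assumes python-doctr is installed; imports are local so that the
    helper below can be tried without it.
    """
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)    # default detection + recognition models
    doc = DocumentFile.from_images(img_path)  # load the image as a one-page document
    result = model(doc)
    return result.export()                    # nested dict: pages > blocks > lines > words


def export_to_lines(output, page_index=0):
    """Join the recognized words of each line into one string (hypothetical helper)."""
    lines = []
    for block in output["pages"][page_index]["blocks"]:
        for line in block["lines"]:
            lines.append(" ".join(word["value"] for word in line["words"]))
    return lines


# Hand-written stand-in for an exported result, reduced to the keys used above.
sample_output = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "hello"}, {"value": "world"}]},
                        {"words": [{"value": "second"}, {"value": "line"}]},
                    ]
                }
            ]
        }
    ]
}

print(export_to_lines(sample_output))
```

With a real image, `export_to_lines(run_ocr(img_path))` would return the recognized text line by line.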

Converting geometry to graphical coordinate system #

ref

import math

def convert_coordinates(geometry, page_dim):
    # docTR geometry is relative (0 to 1): ((x_min, y_min), (x_max, y_max)).
    # page_dim is (height, width) in pixels.
    len_x = page_dim[1]
    len_y = page_dim[0]
    (x_min, y_min) = geometry[0]
    (x_max, y_max) = geometry[1]
    # Round outwards so the pixel box always covers the whole word.
    x_min = math.floor(x_min * len_x)
    x_max = math.ceil(x_max * len_x)
    y_min = math.floor(y_min * len_y)
    y_max = math.ceil(y_max * len_y)
    return [x_min, x_max, y_min, y_max]

def get_coordinates(output):
    # output is the dictionary returned by result.export();
    # only the first page is read.
    page = output['pages'][0]
    page_dim = page["dimensions"]
    text_coordinates = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                converted_coordinates = convert_coordinates(
                    word["geometry"], page_dim
                )
                print("{}: {}".format(converted_coordinates, word["value"]))
                text_coordinates.append(converted_coordinates)
    return text_coordinates


graphical_coordinates = get_coordinates(output)
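As a quick check of the arithmetic, the conversion can be tried on one box without running a model. The sketch below is a compact, self-contained variant that returns values in the same order as `convert_coordinates`. The name `to_pixel_box` and the page size and geometry values are hypothetical, picked so that the products are exact.

```python
import math


def to_pixel_box(geometry, page_dim):
    """Scale a relative ((x_min, y_min), (x_max, y_max)) box to pixel units.

    page_dim is (height, width), so x values scale by width and
    y values scale by height.
    """
    height, width = page_dim
    (x_min, y_min), (x_max, y_max) = geometry
    return [
        math.floor(x_min * width),   # left edge, rounded down
        math.ceil(x_max * width),    # right edge, rounded up
        math.floor(y_min * height),  # top edge, rounded down
        math.ceil(y_max * height),   # bottom edge, rounded up
    ]


# Hypothetical values for illustration only.
page_dim = (1000, 800)                    # (height, width) in pixels
geometry = ((0.125, 0.25), (0.5, 0.375))  # relative corners of one word

box = to_pixel_box(geometry, page_dim)
print(box)  # [100, 400, 250, 375]
```

The result is ordered `[x_min, x_max, y_min, y_max]`, so it is not directly the `(left, top, right, bottom)` tuple that some drawing calls expect.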


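from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

def draw_bounds(image, bound):
    # Each entry of bound is [x_min, x_max, y_min, y_max] in pixels.
    draw = ImageDraw.Draw(image)
    for b in bound:
        # Corners in order: top-left, top-right, bottom-right, bottom-left.
        p0, p1, p2, p3 = [b[0],b[2]], [b[1],b[2]], \
                         [b[1],b[3]], [b[0],b[3]]
        # Return to p0 at the end to close the rectangle.
        draw.line([*p0,*p1,*p2,*p3,*p0], fill='blue', width=2)
    return image

# img_path is the path of the image that was passed to docTR.
image = Image.open(img_path)
result_image = draw_bounds(image, graphical_coordinates)
plt.figure(figsize=(15,15))
plt.imshow(result_image)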

Installation on macOS Apple M2 Pro, silicon #

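# Quote the extra so that zsh does not treat the brackets as a glob pattern.
pip install "python-doctr[torch]"
# pango
brew install pango harfbuzz

# doctr
# The version numbers in these paths are the ones Homebrew had installed at the time;
# adjust them to match the versions in your Cellar.
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/harfbuzz/10.1.0/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/fontconfig/2.15.0/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/pango/1.54.0/lib:$DYLD_LIBRARY_PATH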
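After installing, a small check like the following can confirm that the packages are visible to the current Python environment. This is only a sketch: the helper name `missing_packages` is hypothetical, and it only tests whether the packages can be located, not whether the native libraries (pango, harfbuzz) load correctly.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "doctr" is the import name of python-doctr; "torch" is the backend chosen above.
missing = missing_packages(["doctr", "torch"])
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("doctr and torch are both importable")
```

If both are found, running `import doctr` in the same shell session is the next step to confirm that the `DYLD_LIBRARY_PATH` exports took effect.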
