DocTR
tags :
- similar
- Paddle OCR, Abbyy OCR, Open Source
Deep Learning OCR #
docTR stands for Document Text Recognition: a seamless, high-performing, and accessible library for OCR-related tasks, powered by deep learning.
Mindee docTR - Probably the Best Open-Source OCR (YouTube)
My experience with it #
I have used it on my blog to extract text from images of my notes, and the results were awesome!
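For context, the `output` dictionary used in the next section can be produced roughly like this. This is a minimal sketch, assuming the default pretrained predictor; `run_ocr` is a helper name I made up for illustration, and `img_path` is a placeholder for your own image file.

```python
def run_ocr(img_path):
    """Run docTR on one image and return the exported result as a dict."""
    # Imported lazily so this file still loads where docTR is not installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Default detection + recognition models; weights download on first use.
    model = ocr_predictor(pretrained=True)
    doc = DocumentFile.from_images(img_path)
    result = model(doc)
    # export() gives a nested dict: pages -> blocks -> lines -> words
    return result.export()


# Usage (placeholder path, not from the original notes):
# img_path = "notes.jpg"
# output = run_ocr(img_path)
```

The exported dict is what the coordinate helpers below walk through.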
Converting geometry to graphical coordinate system #
import math

def convert_coordinates(geometry, page_dim):
    # docTR reports page dimensions as (height, width)
    len_x = page_dim[1]
    len_y = page_dim[0]
    (x_min, y_min) = geometry[0]
    (x_max, y_max) = geometry[1]
    # geometry is relative (0 to 1); scale to pixels, rounding outward
    x_min = math.floor(x_min * len_x)
    x_max = math.ceil(x_max * len_x)
    y_min = math.floor(y_min * len_y)
    y_max = math.ceil(y_max * len_y)
    return [x_min, x_max, y_min, y_max]

def get_coordinates(output):
    # only the first page is processed
    page_dim = output['pages'][0]["dimensions"]
    text_coordinates = []
    for obj1 in output['pages'][0]["blocks"]:
        for obj2 in obj1["lines"]:
            for obj3 in obj2["words"]:
                converted_coordinates = convert_coordinates(
                    obj3["geometry"], page_dim
                )
                print("{}: {}".format(converted_coordinates, obj3["value"]))
                text_coordinates.append(converted_coordinates)
    return text_coordinates

# output is the dict exported from a docTR prediction
graphical_coordinates = get_coordinates(output)

from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

def draw_bounds(image, bound):
    draw = ImageDraw.Draw(image)
    for b in bound:
        # b is [x_min, x_max, y_min, y_max]; build the four corners clockwise
        p0, p1, p2, p3 = [b[0], b[2]], [b[1], b[2]], \
                         [b[1], b[3]], [b[0], b[3]]
        # closing the path back at p0 draws the full rectangle
        draw.line([*p0, *p1, *p2, *p3, *p0], fill='blue', width=2)
    return image

# img_path is the path of the image that was passed to docTR
image = Image.open(img_path)
result_image = draw_bounds(image, graphical_coordinates)
plt.figure(figsize=(15, 15))
plt.imshow(result_image)
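
To check the conversion without running a model, here is a small variation that walks every page instead of only the first one and keeps each word next to its box. It is a sketch under the same assumptions as the code above: `dimensions` is (height, width) and `geometry` is relative ((x_min, y_min), (x_max, y_max)). The name `word_boxes` and the `sample_export` dict are made up for illustration; the sample is hand-written, not real OCR output.

```python
import math


def word_boxes(export):
    """Yield (page_index, word, [x_min, x_max, y_min, y_max]) in pixels."""
    for page_idx, page in enumerate(export["pages"]):
        height, width = page["dimensions"]
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x_min, y_min), (x_max, y_max) = word["geometry"]
                    # Round outward so the box never cuts into the word.
                    box = [
                        math.floor(x_min * width),
                        math.ceil(x_max * width),
                        math.floor(y_min * height),
                        math.ceil(y_max * height),
                    ]
                    yield page_idx, word["value"], box


# Hand-written sample in the same shape as the exported dict.
sample_export = {
    "pages": [
        {
            "dimensions": (1000, 800),  # (height, width)
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {
                                    "value": "hello",
                                    "geometry": ((0.125, 0.25), (0.5, 0.375)),
                                }
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

boxes = list(word_boxes(sample_export))
print(boxes)  # [(0, 'hello', [100, 400, 250, 375])]
```

The x values are scaled by the width (800) and the y values by the height (1000), which is why the order of `dimensions` matters.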
Installation on macOS (Apple silicon, M2 Pro) #
# quote the extra, otherwise zsh treats the square brackets as a glob pattern
pip install "python-doctr[torch]"
# pango
brew install pango harfbuzz
# doctr: make the Homebrew libraries visible (adjust the version numbers to what is installed)
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/harfbuzz/10.1.0/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/fontconfig/2.15.0/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/pango/1.54.0/lib:$DYLD_LIBRARY_PATH
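
The Cellar paths above contain version numbers, so they stop working after a `brew upgrade`. Homebrew also keeps version-independent symlinks under `opt/<formula>`, and a small helper can build the same variable from those. This is a sketch, assuming the default Apple silicon prefix `/opt/homebrew`; `dyld_library_path` is a hypothetical helper name, not part of docTR or Homebrew.

```python
import os


def dyld_library_path(formulae, brew_prefix="/opt/homebrew", existing=""):
    """Join the lib directories of the given Homebrew formulae with ':'."""
    # opt/<formula> is a symlink to whichever version is currently installed.
    libs = [f"{brew_prefix}/opt/{name}/lib" for name in formulae]
    if existing:
        libs.append(existing)  # keep whatever was already on the path
    return ":".join(libs)


value = dyld_library_path(
    ["pango", "fontconfig", "harfbuzz"],
    existing=os.environ.get("DYLD_LIBRARY_PATH", ""),
)
# Print a line that can be pasted into the shell profile.
print(f'export DYLD_LIBRARY_PATH="{value}"')
```

The formula order matches the result of the three exports above (pango first, harfbuzz last).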