paperless-ngx

May 25, 2024 | seedling, permanent

tags :

DMS #

Python-based. Links: features, youtube demo. Demo login: username demo, password demo.

Installation links: installation with docker-compose, youtube installation with docker compose and demo, youtube blog, installation.

Uses Apache Tika for Office document and email support.

Features #

  • Organize and index your scanned documents with tags, correspondents, types, and more.
  • Your data is stored locally on your server and is never transmitted or shared in any way.
  • Performs OCR on your documents, adding searchable and selectable text, even to documents scanned with only images.
  • Utilizes the Open Source Tesseract engine to recognize more than 100 languages.
  • Documents are saved as PDF/A (PDF/A-3) format which is designed for long term storage, alongside the unaltered originals.
  • Uses Machine Learning to automatically add tags, correspondents and document types to your documents.
  • Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents) and more.
  • Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
  • Beautiful, modern web application that features:
      • Customizable dashboard with statistics.
      • Filtering by tags, correspondents, types, and more.
      • Bulk editing of tags, correspondents, types and more.
      • Drag-and-drop uploading of documents throughout the app.
      • Customizable views can be saved and displayed on the dashboard and / or sidebar.
      • Support for custom fields of various data types.
      • Shareable public links with optional expiration.
  • Full Text Search helps you find what you need:
      • Auto completion suggests relevant words from your documents.
      • Results are sorted by relevance to your search query.
      • Highlighting shows you which parts of the document matched the query.
      • Searching for similar documents (“More like this”).
  • Email processing: import documents from your email accounts:
      • Configure multiple accounts and rules for each account.
      • After processing, paperless can perform actions on the messages such as marking as read, deleting and more.
  • A built-in robust multi-user permissions system that supports ‘global’ permissions as well as per document or object.
  • A powerful workflow system that gives you even more control.
  • Optimized for multi core systems: Paperless-ngx consumes multiple documents in parallel.
  • The integrated sanity checker makes sure that your document archive is in good health.

<2024-02-13 Tue>, #

github #

Very popular: about 15.4k stars and 782 forks as of 2024-02-13 (numbers from the screenshot OCR in this note).

dockerhub #

Very popular: 10M+ pulls as of 2024-02-13 (number from the screenshot OCR in this note).

papermerge vs paperless-ngx #

ref: reddit. Preference: paperless-ngx.

Ansible Deployment #

Ansible playbooks to set up Paperless-ngx

Docker Deployment #

ref

docker compose with postgres

version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db

volumes:
  data:
  media:
  pgdata:
  redisdata:

dependencies #

Backend API #

backend rest api, ref

APIs #

The API provides the following main endpoints:

endpoints #

  1. api/correspondents: Full CRUD support.
  2. api/custom_fields: Full CRUD support.
  3. api/documents: Full CRUD support, except POSTing new documents. See below.
  4. api/document_types: Full CRUD support.
  5. api/groups: Full CRUD support.
  6. api/logs: Read-only.
  7. api/mail_accounts: Full CRUD support.
  8. api/mail_rules: Full CRUD support.
  9. api/profile: GET, PATCH
  10. api/share_links: Full CRUD support.
  11. api/storage_paths: Full CRUD support.
  12. api/tags: Full CRUD support.
  13. api/tasks: Read-only.
  14. api/users: Full CRUD support.
  15. api/workflows: Full CRUD support.
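The listing endpoints above return their results one page at a time. A minimal sketch of walking all pages, assuming the usual Django REST Framework page shape (a JSON object with `results` and `next` keys). `endpoint_url`, `iter_results` and the `fetch` callback are hypothetical names for illustration, not part of paperless-ngx:

```python
from typing import Any, Callable, Dict, Iterator


def endpoint_url(host: str, name: str) -> str:
    """Build the URL of a top-level endpoint, e.g. name="tags"."""
    return f"{host.rstrip('/')}/api/{name.strip('/')}/"


def iter_results(fetch: Callable[[str], Dict[str, Any]], url: str) -> Iterator[Any]:
    """Yield every item of a paginated listing.

    `fetch` is any function that GETs a URL and returns the decoded JSON.
    Assumption: each page carries "results" and a "next" URL (None at the end).
    """
    next_url = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("results", [])
        next_url = page.get("next")
```

With httpx, `fetch` could be something like `lambda u: httpx.get(u, headers=headers).json()`, where `headers` carries the auth header.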

AuthZ #

ref. Basic and Token authentication are supported. Token requests send the header Authorization: Token <token>.
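A small sketch of building the request header for either scheme. The Token form matches the upload example in this note and the Basic form follows the standard HTTP Basic scheme; the function names are made up for illustration:

```python
import base64
from typing import Dict


def token_auth_header(token: str) -> Dict[str, str]:
    """Header for token authentication."""
    return {"Authorization": f"Token {token}"}


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Header for HTTP Basic authentication (base64 of "user:password")."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

Token auth avoids sending the password with every request, so it is probably the better fit for scripts.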

get document notes #

ref

  1. api/documents/<id>/notes: Retrieve notes for a document.
  2. api/documents/<id>/share_links: Retrieve share links for a document.

get meta data #

ref

/api/documents/<id>/metadata/

Access the metadata of the document with ID <id>.

POSTing or creating documents #

ref

  1. api/documents/post_document
    import httpx

    # host (server URL) and token (API token) must be defined beforehand
    with open("mypdf.pdf", "rb") as f:
        resp = httpx.post(host + "/api/documents/post_document/",
                          headers={"Authorization": "Token " + token},
                          files={"document": ("mypdf.pdf", f, "application/pdf")},
                          data={"title": "My first upload", "created": "2023-12-21"})
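The upload is processed asynchronously. Assuming, per the upstream API documentation, that a successful POST returns the UUID of the consumption task and that api/tasks accepts a `task_id` query parameter, the status URL could be built like this (`task_status_url` is a hypothetical helper; verify the parameter name against the ref):

```python
from urllib.parse import urlencode


def task_status_url(host: str, task_id: str) -> str:
    """URL for looking up one consumption task by its UUID.

    Assumption: api/tasks filters on a "task_id" query parameter.
    """
    query = urlencode({"task_id": task_id})
    return f"{host.rstrip('/')}/api/tasks/?{query}"
```

Polling this URL with the same auth header should show whether consumption finished or failed.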
    

github

  1. api/search/autocomplete

Permissions #

ref

"owner": ...,
"set_permissions": {
    "view": {
        "users": [...],
        "groups": [...],
    },
    "change": {
        "users": [...],
        "groups": [...],
    },
},
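A sketch of assembling this structure in Python before sending it in a request body. It assumes users and groups are referenced by integer ID; `build_permissions` is a hypothetical helper, not a paperless-ngx function:

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_permissions(
    owner: Optional[int],
    view_users: Iterable[int] = (),
    view_groups: Iterable[int] = (),
    change_users: Iterable[int] = (),
    change_groups: Iterable[int] = (),
) -> Dict[str, Any]:
    """Return the owner / set_permissions fragment shown above."""
    return {
        "owner": owner,
        "set_permissions": {
            "view": {"users": list(view_users), "groups": list(view_groups)},
            "change": {"users": list(change_users), "groups": list(change_groups)},
        },
    }


# Example IDs are invented for illustration.
payload = build_permissions(owner=1, view_users=[2, 3], change_groups=[5])
print(json.dumps(payload, indent=2))
```

The fragment would be merged into the JSON body used when creating or updating an object.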

Download #

ref

  1. api/documents/<pk>/download: Download the document.
  2. api/documents/<pk>/preview: Display the document inline, without downloading it.
  3. api/documents/<pk>/thumb: Download the PNG thumbnail of a document.
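The three file endpoints differ only in the last path segment, so one helper can cover them. This is a sketch: `document_file_url` is a hypothetical name, and the `original=true` query parameter (requesting the unaltered upload rather than the archived version) is an assumption taken from the upstream docs that should be checked against the ref:

```python
from urllib.parse import urlencode

DOCUMENT_ACTIONS = {"download", "preview", "thumb"}


def document_file_url(host: str, pk: int, action: str = "download",
                      original: bool = False) -> str:
    """Build the URL for downloading, previewing or thumbnailing a document."""
    if action not in DOCUMENT_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{host.rstrip('/')}/api/documents/{pk}/{action}/"
    if original:
        # Assumption: original=true asks for the originally uploaded file.
        url += "?" + urlencode({"original": "true"})
    return url
```

The resulting URL can be fetched with the same Authorization header as any other API call.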

bulk editing #

ref

  1. api/bulk_edit
    {
      "documents": [LIST_OF_DOCUMENT_IDS],
      "method": METHOD, // see below
      "parameters": args // see below
    }
  2. api/bulk_edit_objects
    {
      "objects": [LIST_OF_OBJECT_IDS],
      "object_type": "tags", "correspondents", "document_types" or "storage_paths",
      "operation": "set_permissions" or "delete",
      "owner": OWNER_ID, // optional
      "permissions": { "view": { "users": [] ... }, "change": { ... } }, // (see 'set_permissions' format above)
      "merge": true / false // defaults to false, see above
    }
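A sketch of building the api/bulk_edit body in Python. `bulk_edit_payload` is a hypothetical helper, and the method name "add_tag" with a "tag" parameter is only an assumed example; the valid methods and their parameters are listed in the ref:

```python
import json
from typing import Any, Dict, List


def bulk_edit_payload(documents: List[int], method: str, **parameters: Any) -> Dict[str, Any]:
    """Return the JSON body for api/bulk_edit."""
    if not documents:
        raise ValueError("documents must not be empty")
    return {"documents": list(documents), "method": method, "parameters": parameters}


# Assumed example: add tag 3 to documents 12 and 15 (IDs are invented).
body = bulk_edit_payload([12, 15], "add_tag", tag=3)
print(json.dumps(body))
```

The body would be sent as JSON in a POST to api/bulk_edit/ with the usual auth header.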

Review #

To review: sending scanned docs from the printer directly to paperless-ngx.

OCR of Images #

2024-02-13_17-45-09_screenshot.png #

Screenshot of the GitHub repository paperless-ngx/paperless-ngx (public, dev branch): 81 issues, 4 pull requests, 91 watching, 782 forks, 15.4k stars, 8,894 commits. About: "A community-supported supercharged version of paperless: scan, index and archive all your physical documents". Website: docs.paperless-ngx.com. Topics: pdf, machine-learning, django, angular, ocr, archiving, dms, document-management, optical-character-recognition, document-management-system. License: GPL-3.0. A visible commit message reads "Bumps version to 2.5.1".

2024-02-13_17-42-30_screenshot.png #

Screenshot of the Docker Hub page for paperlessngx/paperless-ngx: 10M+ pulls, by paperlessngx, updated 5 hours ago.

2024-02-13_20-12-28_screenshot.png #

Terminal screenshot of stopping the stack:

root@paperless:/opt/paperless/paperless-ngx# sudo -Hu paperless docker compose down
[+] Running 6/6
 - Container paperless-webserver-1  Removed  6.9s
 - Container paperless-db-1  Removed  0.3s
 - Container paperless-tika-1  Removed  0.4s
 - Container paperless-gotenberg-1  Removed  10.2s
 - Container paperless-broker-1  Removed  0.4s
 - Network paperless_default  Removed  0.3s

