spaCy


April 28, 2024 | permanent

Python Apps #

tags
Python, NLP, AI

Industrial-strength natural language processing in Python

Features #

  1. Support for 72+ languages
  2. 80 trained pipelines for 24 languages
  3. Multi-task learning with pretrained transformers like BERT
  4. Pretrained word vectors
  5. State-of-the-art speed
  6. Production-ready training system
  7. Linguistically-motivated tokenization
  8. Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
  9. Easily extensible with custom components and attributes
  10. Support for custom models in PyTorch, TensorFlow and other frameworks
  11. Built-in visualizers for syntax and NER
  12. Easy model packaging, deployment and workflow management
  13. Robust, rigorously evaluated accuracy

NER types or categories #

github, ref

All #

PERSON #

  • Description: People, including fictional characters

NORP #

  • Description: Nationalities or religious or political groups

FAC #

  • Description: Buildings, airports, highways, bridges, etc.

ORG #

  • Description: Companies, agencies, institutions, etc.

GPE #

  • Description: Countries, cities, states

LOC #

  • Description: Non-GPE locations, mountain ranges, bodies of water

PRODUCT #

  • Description: Objects, vehicles, foods, etc. (Not services)

EVENT #

  • Description: Named hurricanes, battles, wars, sports events, etc.

WORK_OF_ART #

  • Description: Titles of books, songs, etc.

LAW #

  • Description: Named documents made into laws

LANGUAGE #

  • Description: Any named language

DATE #

  • Description: Absolute or relative dates or periods

TIME #

  • Description: Times smaller than a day

PERCENT #

  • Description: Percentage, including “%” symbol

MONEY #

  • Description: Monetary values, including unit

QUANTITY #

  • Description: Measurements, as of weight or distance

ORDINAL #

  • Description: “first”, “second”, etc.

CARDINAL #

  • Description: Numerals that do not fall under another type
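These labels are what spaCy exposes on each entity as `ent.label_`. The following is a minimal sketch of grouping extracted entities by label into a dictionary, which is the shape the `metadata.ner_dict.<LABEL>` fields in the OpenSearch query later in this note suggest. The function name `build_ner_dict` and the `(text, label)` input format are assumptions for illustration; with spaCy, the pairs would typically come from `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict

# The 18 entity labels listed above, with their descriptions.
NER_LABELS = {
    "PERSON": "People, including fictional characters",
    "NORP": "Nationalities or religious or political groups",
    "FAC": "Buildings, airports, highways, bridges, etc.",
    "ORG": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Objects, vehicles, foods, etc. (not services)",
    "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
    "WORK_OF_ART": "Titles of books, songs, etc.",
    "LAW": "Named documents made into laws",
    "LANGUAGE": "Any named language",
    "DATE": "Absolute or relative dates or periods",
    "TIME": "Times smaller than a day",
    "PERCENT": "Percentage, including the % symbol",
    "MONEY": "Monetary values, including unit",
    "QUANTITY": "Measurements, as of weight or distance",
    "ORDINAL": "first, second, etc.",
    "CARDINAL": "Numerals that do not fall under another type",
}


def build_ner_dict(entities):
    """Group (text, label) pairs by label (hypothetical helper).

    Unknown labels and empty strings are skipped; duplicates are dropped
    while the first-seen order is kept.
    """
    grouped = defaultdict(list)
    for text, label in entities:
        text = text.strip()
        if label in NER_LABELS and text and text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Sample pairs taken from the OCR results at the end of this note.
sample = [
    ("CONSTANCE SUGIYAMA", "PERSON"),
    ("Toronto", "GPE"),
    ("Ontario", "GPE"),
    ("Asia-Pacific Foundation of Canada", "ORG"),
    ("Toronto", "GPE"),  # duplicate, dropped
    ("something", "NOT_A_LABEL"),  # unknown label, dropped
]
ner_dict = build_ner_dict(sample)
print(ner_dict)
```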

Relevant for Semantic Search as filters #

ORG #

PERSON #

GPE #

LANGUAGE #

LOC #

NORP #

EVENT #

WORK_OF_ART #

LAW #

FAC #

PRODUCT #

MONEY #

PERCENT #

DATE #

TIME #

QUANTITY #
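As a sketch of how these labels might act as filters, the snippet below turns a selection of entity values into OpenSearch `terms` clauses placed under `bool.filter`. The field path `metadata.ner_dict.<LABEL>.keyword` is taken from the aggregation query in this note; the helper name `ner_filters` and the sample values are assumptions. The list leaves out ORDINAL and CARDINAL, matching the selection above. In a real semantic search the vector or text clause would sit next to the filter; it is omitted here.

```python
import json

# The 16 labels selected above as useful filters.
FILTER_LABELS = [
    "ORG", "PERSON", "GPE", "LANGUAGE", "LOC", "NORP", "EVENT", "WORK_OF_ART",
    "LAW", "FAC", "PRODUCT", "MONEY", "PERCENT", "DATE", "TIME", "QUANTITY",
]


def ner_filters(selected):
    """Build one `terms` clause per label that has selected values.

    `selected` maps a label to a list of entity strings. Labels outside
    FILTER_LABELS and labels with no values are ignored.
    """
    clauses = []
    for label in FILTER_LABELS:
        values = selected.get(label)
        if values:
            field = f"metadata.ner_dict.{label}.keyword"
            clauses.append({"terms": {field: list(values)}})
    return clauses


# Hypothetical selection: documents mentioning Toronto and a given organization.
selected = {
    "GPE": ["Toronto"],
    "ORG": ["Canada Mortgage and Housing Corporation"],
    "CARDINAL": ["three"],  # not a filter label, ignored
}
query = {"query": {"bool": {"filter": ner_filters(selected)}}}
print(json.dumps(query, indent=2))
```

Clauses under `filter` must all match, so each additional label narrows the result set.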

OpenSearch query to fetch the unique entity values for each NER label #

GET /docs-bge-large-en/_search
{
  "size": 0,
  "aggs": {
    "unique_ORG": {
      "terms": { "field": "metadata.ner_dict.ORG.keyword", "size": 50 }
    },
    "unique_PERSON": {
      "terms": { "field": "metadata.ner_dict.PERSON.keyword", "size": 50 }
    },
    "unique_GPE": {
      "terms": { "field": "metadata.ner_dict.GPE.keyword", "size": 50 }
    },
    "unique_LANGUAGE": {
      "terms": { "field": "metadata.ner_dict.LANGUAGE.keyword", "size": 50 }
    },
    "unique_LOC": {
      "terms": { "field": "metadata.ner_dict.LOC.keyword", "size": 50 }
    },
    "unique_NORP": {
      "terms": { "field": "metadata.ner_dict.NORP.keyword", "size": 50 }
    },
    "unique_EVENT": {
      "terms": { "field": "metadata.ner_dict.EVENT.keyword", "size": 50 }
    },
    "unique_WORK_OF_ART": {
      "terms": { "field": "metadata.ner_dict.WORK_OF_ART.keyword", "size": 50 }
    },
    "unique_LAW": {
      "terms": { "field": "metadata.ner_dict.LAW.keyword", "size": 50 }
    },
    "unique_FAC": {
      "terms": { "field": "metadata.ner_dict.FAC.keyword", "size": 50 }
    },
    "unique_PRODUCT": {
      "terms": { "field": "metadata.ner_dict.PRODUCT.keyword", "size": 50 }
    },
    "unique_MONEY": {
      "terms": { "field": "metadata.ner_dict.MONEY.keyword", "size": 50 }
    },
    "unique_PERCENT": {
      "terms": { "field": "metadata.ner_dict.PERCENT.keyword", "size": 50 }
    },
    "unique_DATE": {
      "terms": { "field": "metadata.ner_dict.DATE.keyword", "size": 50 }
    },
    "unique_TIME": {
      "terms": { "field": "metadata.ner_dict.TIME.keyword", "size": 50 }
    },
    "unique_QUANTITY": {
      "terms": { "field": "metadata.ner_dict.QUANTITY.keyword", "size": 50 }
    }
  }
}
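Because the query repeats one `terms` aggregation per label, the request body can also be generated rather than written by hand. The sketch below builds the same structure as a Python dictionary; the function name `unique_ner_aggs` is an assumption, and sending the body to the cluster is left out.

```python
import json

# Labels aggregated in the query above.
LABELS = [
    "ORG", "PERSON", "GPE", "LANGUAGE", "LOC", "NORP", "EVENT", "WORK_OF_ART",
    "LAW", "FAC", "PRODUCT", "MONEY", "PERCENT", "DATE", "TIME", "QUANTITY",
]


def unique_ner_aggs(labels, size=50):
    """Return a search body with one `terms` aggregation per NER label.

    "size": 0 at the top level suppresses the document hits, so only the
    aggregation buckets come back.
    """
    return {
        "size": 0,
        "aggs": {
            f"unique_{label}": {
                "terms": {
                    "field": f"metadata.ner_dict.{label}.keyword",
                    "size": size,
                }
            }
            for label in labels
        },
    }


body = unique_ner_aggs(LABELS)
print(json.dumps(body, indent=2))
```

Note that the inner `size` caps each aggregation at the 50 most frequent values, so a label with more distinct values than that is returned only in part; raising `size` is one way to get more buckets.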

Trained pipelines comparison #

ref

en_core_web_sm #

Small and fast: a CPU-optimized pipeline without static word vectors.

en_core_web_lg #

Large, with acceptable results: a CPU-optimized pipeline that includes static word vectors.

en_core_web_trf #

Best: This is as good as NER gets without any additional tuning or retraining. This saved reporters days of work.

results:

OCR of Images #

2024-04-28_21-47-57_screenshot.png #

  • Re-appointment of CONSTANCE SUGIYAMA PERSON C.M., of Toronto GPE Ontario GPE as a director of the Board of Directors ORG of the Asia-Pacific Foundation of Canada ORG to hold office during pleasure for a term of three years DATE effective July 4, 2022 DATE

  • Re-appointment of LISA DE WILDE PERSON C.M., of Oakville GPE Ontario GPE as a director of the Board of Directors ORG of the Asia-Pacific Foundation of Canada ORG to hold office during pleasure for a term of three years DATE effective July 4, 2022 DATE

  • Re-appointment of the HONOURABLE PIERRE S. PETTIGREW PERSON P.C. GPE of Toronto GPE Ontario GPE as Chairperson of the Board of Directors ORG of the Asia-Pacific Foundation of Canada ORG to hold office during pleasure for a term of three years DATE effective July 1, 2022 DATE

  • Approval of the appointment by the Minister of Housing and Diversity and Inclusion of CHRISTOPHER F. SICOTTE PERSON of Saskatoon GPE Saskatchewan GPE to be a director of the Board of Directors ORG of the Canada Mortgage and Housing Corporation ORG to hold office during pleasure, on a part-time basis, for a term of four years DATE

