spaCy
April 28, 2024
Python Apps #
Industrial-strength natural language processing in Python
Features #
- Support for 72+ languages
- 80 trained pipelines for 24 languages
- Multi-task learning with pretrained transformers like BERT
- Pretrained word vectors
- State-of-the-art speed
- Production-ready training system
- Linguistically-motivated tokenization
- Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
- Easily extensible with custom components and attributes
- Support for custom models in PyTorch, TensorFlow and other frameworks
- Built-in visualizers for syntax and NER
- Easy model packaging, deployment and workflow management
- Robust, rigorously evaluated accuracy
NER types or categories #
All #
PERSON #
- Description: People, including fictional characters
NORP #
- Description: Nationalities or religious or political groups
FAC #
- Description: Buildings, airports, highways, bridges, etc.
ORG #
- Description: Companies, agencies, institutions, etc.
GPE #
- Description: Countries, cities, states
LOC #
- Description: Non-GPE locations, mountain ranges, bodies of water
PRODUCT #
- Description: Objects, vehicles, foods, etc. (not services)
EVENT #
- Description: Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART #
- Description: Titles of books, songs, etc.
LAW #
- Description: Named documents made into laws
LANGUAGE #
- Description: Any named language
DATE #
- Description: Absolute or relative dates or periods
TIME #
- Description: Times smaller than a day
PERCENT #
- Description: Percentage, including “%” symbol
MONEY #
- Description: Monetary values, including unit
QUANTITY #
- Description: Measurements, as of weight or distance
ORDINAL #
- Description: “first”, “second”, etc.
CARDINAL #
- Description: Numerals that do not fall under another type
Relevant for Semantic Search as filters #
ORG #
PERSON #
GPE #
LANGUAGE #
LOC #
NORP #
EVENT #
WORK_OF_ART #
LAW #
FAC #
PRODUCT #
MONEY #
PERCENT #
DATE #
TIME #
QUANTITY #
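One way to use these labels as filters is to collect the entity strings per label at indexing time. The sketch below assumes entities arrive as objects with `text` and `label_` attributes, as the spans in spaCy's `doc.ents` do. The helper name `build_ner_dict` and the choice to drop duplicates are illustrative assumptions, not something these notes prescribe. The result is shaped to fit the `metadata.ner_dict` field that the OpenSearch query in these notes reads from.

```python
from types import SimpleNamespace

# Labels kept as search filters (the list above; ORDINAL and CARDINAL are left out).
FILTER_LABELS = (
    "ORG", "PERSON", "GPE", "LANGUAGE", "LOC", "NORP", "EVENT", "WORK_OF_ART",
    "LAW", "FAC", "PRODUCT", "MONEY", "PERCENT", "DATE", "TIME", "QUANTITY",
)


def build_ner_dict(ents, labels=FILTER_LABELS):
    """Group entity texts by label, keeping first-seen order and dropping duplicates."""
    allowed = set(labels)
    ner_dict = {}
    for ent in ents:
        if ent.label_ not in allowed:
            continue  # skip labels that are not used as filters
        text = ent.text.strip()
        values = ner_dict.setdefault(ent.label_, [])
        if text and text not in values:
            values.append(text)
    return ner_dict


# Stand-ins for spaCy spans so the sketch runs without a model installed;
# with spaCy, pass `nlp(text).ents` instead.
sample_ents = [
    SimpleNamespace(text="Toronto", label_="GPE"),
    SimpleNamespace(text="Ontario", label_="GPE"),
    SimpleNamespace(text="Toronto", label_="GPE"),
    SimpleNamespace(text="three", label_="CARDINAL"),
]
ner_dict = build_ner_dict(sample_ents)
print(ner_dict)
```

Storing a list of strings per label keeps each `metadata.ner_dict.<LABEL>` field usable for exact-match filtering through its `keyword` subfield.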
OpenSearch query to fetch the unique NER values for each label #
GET /docs-bge-large-en/_search
{
  "size": 0,
  "aggs": {
    "unique_ORG": {
      "terms": { "field": "metadata.ner_dict.ORG.keyword", "size": 50 }
    },
    "unique_PERSON": {
      "terms": { "field": "metadata.ner_dict.PERSON.keyword", "size": 50 }
    },
    "unique_GPE": {
      "terms": { "field": "metadata.ner_dict.GPE.keyword", "size": 50 }
    },
    "unique_LANGUAGE": {
      "terms": { "field": "metadata.ner_dict.LANGUAGE.keyword", "size": 50 }
    },
    "unique_LOC": {
      "terms": { "field": "metadata.ner_dict.LOC.keyword", "size": 50 }
    },
    "unique_NORP": {
      "terms": { "field": "metadata.ner_dict.NORP.keyword", "size": 50 }
    },
    "unique_EVENT": {
      "terms": { "field": "metadata.ner_dict.EVENT.keyword", "size": 50 }
    },
    "unique_WORK_OF_ART": {
      "terms": { "field": "metadata.ner_dict.WORK_OF_ART.keyword", "size": 50 }
    },
    "unique_LAW": {
      "terms": { "field": "metadata.ner_dict.LAW.keyword", "size": 50 }
    },
    "unique_FAC": {
      "terms": { "field": "metadata.ner_dict.FAC.keyword", "size": 50 }
    },
    "unique_PRODUCT": {
      "terms": { "field": "metadata.ner_dict.PRODUCT.keyword", "size": 50 }
    },
    "unique_MONEY": {
      "terms": { "field": "metadata.ner_dict.MONEY.keyword", "size": 50 }
    },
    "unique_PERCENT": {
      "terms": { "field": "metadata.ner_dict.PERCENT.keyword", "size": 50 }
    },
    "unique_DATE": {
      "terms": { "field": "metadata.ner_dict.DATE.keyword", "size": 50 }
    },
    "unique_TIME": {
      "terms": { "field": "metadata.ner_dict.TIME.keyword", "size": 50 }
    },
    "unique_QUANTITY": {
      "terms": { "field": "metadata.ner_dict.QUANTITY.keyword", "size": 50 }
    }
  }
}
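Because the aggregation blocks differ only in the label, the request body can also be generated rather than written by hand. The following is a minimal sketch under that assumption. The function names `build_unique_ner_query` and `unique_values` are hypothetical, and the response shape assumed is the standard terms-aggregation layout (`aggregations.<name>.buckets[].key`).

```python
import json

FILTER_LABELS = [
    "ORG", "PERSON", "GPE", "LANGUAGE", "LOC", "NORP", "EVENT", "WORK_OF_ART",
    "LAW", "FAC", "PRODUCT", "MONEY", "PERCENT", "DATE", "TIME", "QUANTITY",
]


def build_unique_ner_query(labels, size=50, field_prefix="metadata.ner_dict"):
    """Build one terms aggregation per label; "size": 0 suppresses the document hits."""
    aggs = {
        f"unique_{label}": {
            "terms": {"field": f"{field_prefix}.{label}.keyword", "size": size}
        }
        for label in labels
    }
    return {"size": 0, "aggs": aggs}


def unique_values(response, label):
    """Read the bucket keys for one label out of a search response dict."""
    buckets = (
        response.get("aggregations", {})
        .get(f"unique_{label}", {})
        .get("buckets", [])
    )
    return [bucket["key"] for bucket in buckets]


query = build_unique_ner_query(FILTER_LABELS)
print(json.dumps(query, indent=2))
```

A terms aggregation returns at most `size` buckets per label, so with `size=50` the result holds the 50 most frequent values rather than every unique one; raise `size` if a label has more distinct values.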
English pipeline comparison #
en_core_web_sm #
Small and fast. It ships without static word vectors.
en_core_web_lg #
Large, with static word vectors. NER accuracy is acceptable.
en_core_web_trf #
Best: This is as good as NER gets without any additional tuning or retraining. This saved reporters days of work.
results:

OCR of Images #
2024-04-28_21-47-57_screenshot.png #

- Re-appointment of CONSTANCE SUGIYAMA [PERSON] C.M., of Toronto [GPE] Ontario [GPE] as a director of the Board of Directors [ORG] of the Asia-Pacific Foundation of Canada [ORG] to hold office during pleasure for a term of three years [DATE] effective July 4, 2022 [DATE]
- Re-appointment of LISA DE WILDE [PERSON] C.M., of Oakville [GPE] Ontario [GPE] as a director of the Board of Directors [ORG] of the Asia-Pacific Foundation of Canada [ORG] to hold office during pleasure for a term of three years [DATE] effective July 4, 2022 [DATE]
- Re-appointment of the HONOURABLE PIERRE S. PETTIGREW [PERSON] P.C. [GPE] of Toronto [GPE] Ontario [GPE] as Chairperson of the Board of Directors [ORG] of the Asia-Pacific Foundation of Canada [ORG] to hold office during pleasure for a term of three years [DATE] effective July 1, 2022 [DATE]
- Approval of the appointment by the Minister of Housing and Diversity and Inclusion of CHRISTOPHER F. SICOTTE [PERSON] of Saskatoon [GPE] Saskatchewan [GPE] to be a director of the Board of Directors [ORG] of the Canada Mortgage and Housing Corporation ORG to hold office during pleasure, on a part-time basis, for a term of four years [DATE]
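To compare the three pipelines on the same text, one option is to collect each pipeline's `(text, label)` pairs and look at where they disagree. The sketch below is one possible way to do that, not the procedure used to produce the result above. `extract_entities` and `compare_entities` are hypothetical helper names, and the sample data at the bottom is made up for illustration, not real model output.

```python
MODEL_NAMES = ("en_core_web_sm", "en_core_web_lg", "en_core_web_trf")


def extract_entities(model_name, text):
    """Load one installed pipeline and return its (text, label) entity pairs."""
    import spacy  # imported lazily so compare_entities works without spaCy

    nlp = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def compare_entities(results):
    """For each model, list the (text, label) pairs that no other model produced."""
    unique = {}
    for name, pairs in results.items():
        others = set()
        for other_name, other_pairs in results.items():
            if other_name != name:
                others.update(other_pairs)
        unique[name] = sorted(set(pairs) - others)
    return unique


# Made-up illustrative data (NOT actual model output) to show the comparison.
sample_results = {
    "model_a": [("Toronto", "GPE"), ("Ontario", "GPE")],
    "model_b": [("Toronto", "GPE"), ("Ontario", "ORG")],
}
disagreements = compare_entities(sample_results)
print(disagreements)

# With the pipelines installed (e.g. `python -m spacy download en_core_web_sm`):
# results = {name: extract_entities(name, text) for name in MODEL_NAMES}
# print(compare_entities(results))
```

Comparing on `(text, label)` pairs means that a span found by two pipelines but labeled differently shows up as a disagreement for both, which is usually the case worth inspecting by hand.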