Analyzer
:PROPERTIES:
:ID: E4039A88-BE80-4C42-9CA9-D3D504752ED3
:DRILL_LAST_INTERVAL: -1.0
:DRILL_REPEATS_SINCE_FAIL: 1
:DRILL_TOTAL_REPEATS: 1
:DRILL_FAILURE_COUNT: 1
:DRILL_AVERAGE_QUALITY: 1.0
:DRILL_EASE: 2.5
:NEXT_REVIEW:
:MATURITY: seedling
:LAST_REVIEW:
:END:
- tags :: Elasticsearch, OpenSearch, Analysis
Summary
ref: How Elasticsearch (and full-text search) works?, book, evernote
In a nutshell, an analyzer tells Elasticsearch (or OpenSearch) how text should be processed when it is indexed and when it is searched.
An analyzer is a wrapper around three kinds of building blocks, applied in this order:
- Character filters (zero or more): preprocess the raw text before it is split, e.g. stripping HTML tags or replacing certain characters with others.
- Tokenizer (exactly one): breaks the text into individual tokens (roughly, words) according to a rule such as whitespace or n-grams.
- Token filters (zero or more): receive the stream of tokens and add, remove, or modify them, e.g. changing uppercase terms to lowercase.
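The three stages can be modelled as a plain function pipeline. This is a minimal conceptual sketch in Python, not Elasticsearch's implementation: the function names and the stop-word set are hypothetical, chosen to loosely resemble the built-in html_strip character filter, whitespace tokenizer, and lowercase/stop token filters.

```python
import re

# Hypothetical toy analyzer: a conceptual model of the three stages,
# not how Elasticsearch/Lucene actually implements them.

def html_strip_char_filter(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text: str) -> list[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()

def lowercase_token_filter(tokens: list[str]) -> list[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

STOP_WORDS = {"the", "a", "an"}  # hypothetical, tiny stop-word list

def stop_token_filter(tokens: list[str]) -> list[str]:
    """Token filter: drop stop words (expects already-lowercased tokens)."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run zero or more char filters, exactly one tokenizer, then zero or more token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<p>The Quick Brown FOX</p>",
    char_filters=[html_strip_char_filter],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_token_filter, stop_token_filter],
)
print(tokens)  # ['quick', 'brown', 'fox']
```

Token filters run in the order listed, so lowercasing goes before the stop filter; otherwise "The" would slip past a lowercase stop-word list.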

Elasticsearch
ref: es. The analyzer mapping parameter specifies the analyzer used for text analysis when indexing or searching a text field.
Unless overridden with the search_analyzer mapping parameter, this analyzer is used for both index-time and search-time analysis. See "Specify an analyzer" in the Elasticsearch reference.
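As an illustration, the request body below (written as a Python dict) would define a custom analyzer and set both parameters on one field. The index field name body and the analyzer name my_html_analyzer are hypothetical; custom, html_strip, standard and lowercase are built-in names. The effective_analyzers helper is a simplified, hypothetical model of the fallback rule described above: it ignores index-level default analyzers and search_quote_analyzer.

```python
import json

# Hypothetical index-creation body (e.g. for PUT /<index>); names like
# "my_html_analyzer" and the field "body" are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_html_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "my_html_analyzer",  # index time (and search time unless overridden)
                "search_analyzer": "standard",   # overrides analysis of the query text only
            }
        }
    },
}

def effective_analyzers(field_mapping: dict) -> tuple[str, str]:
    """Simplified model: return (index_analyzer, search_analyzer) for a text field.

    Falls back to "standard" when no analyzer is set, and to the index
    analyzer when no search_analyzer is set. Ignores index-level defaults.
    """
    index_analyzer = field_mapping.get("analyzer", "standard")
    search_analyzer = field_mapping.get("search_analyzer", index_analyzer)
    return index_analyzer, search_analyzer

print(json.dumps(index_body, indent=2))
print(effective_analyzers(index_body["mappings"]["properties"]["body"]))
# ('my_html_analyzer', 'standard')
```

Stripping HTML is only needed for stored documents, not for typed queries, which is one reason to give a field a separate search_analyzer.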
OCR of Images
2024-05-01_16-47-13_screenshot.png

Diagram (Analysis): input string/text -> character filters -> tokenizer -> token filters -> tokens -> inverted index
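The last step of the diagram, tokens feeding the inverted index, can be sketched as a map from each token to the set of documents containing it. This is a minimal Python model under stated assumptions: the analyze function is a hypothetical stand-in (lowercase, split on non-word characters), and search uses AND semantics for simplicity, whereas Elasticsearch's match query defaults to OR.

```python
import re
from collections import defaultdict

def analyze(text: str) -> list[str]:
    """Hypothetical stand-in analyzer: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every analyzed token to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Analyze the query with the same analyzer, then intersect postings (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "The Quick Brown Fox", 2: "Quick brown dogs!", 3: "Lazy fox"}
index = build_inverted_index(docs)
print(sorted(index["quick"]))              # [1, 2]
print(sorted(search(index, "QUICK fox")))  # [1]
```

Because the query "QUICK fox" goes through the same analysis as the documents, it matches the lowercased terms stored in the index.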