OpenSearch

November 20, 2024 | permanent

Software #

tags
Elasticsearch, Apache, Full Text Search

OpenSearch is a community-driven, completely open source project (github):

  • Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.
  • Developers build with OpenSearch for use cases such as

Features #

ref

  • OpenSearch search engine (forked from Elasticsearch)
  • OpenSearch Dashboards (forked from Kibana)

Additional features #

  1. Anomaly detection – Identify atypical data and receive automatic notifications
  2. KNN – Find “nearest neighbours” in your vector data
  3. Performance Analyzer – Monitor and optimise your cluster
  4. SQL – Use SQL or a piped processing language to query your data
  5. Index State Management – Automate index operations
  6. ML Commons plugin – Train and execute machine-learning models
  7. Asynchronous search – Run search requests in the background
  8. Cross-cluster replication – Replicate your data across multiple OpenSearch clusters

Why was OpenSearch created by Amazon? #

reason from Amazon

  • On January 21, 2021, Elastic NV announced that they would change their software licensing strategy and not release new versions of Elasticsearch and Kibana under the permissive Apache License, Version 2.0 (ALv2).
  • Instead, Elastic is releasing Elasticsearch and Kibana with source code available under the Elastic License or Server Side Public License (SSPL).
  • These licenses are not open source and do not offer users the same freedoms.
  • Because some developers want their software to be open source and because they want it to avoid single vendor lock-in, we made the decision to create and maintain a fork from the last ALv2 version of Elasticsearch and Kibana. The fork is called OpenSearch and is available under ALv2.

Who sponsors and maintains OpenSearch? #

The OpenSearch project, created by Amazon, is a forked search project based on old versions of Elasticsearch and Kibana. These projects were created primarily to support Amazon OpenSearch Service (formerly Amazon Elasticsearch Service). Amazon OpenSearch Service will not deliver current or future releases of Elasticsearch and Kibana.

Many organizations, including AWS, SAP, Capital One, Red Hat, Logz.io, Aiven.io, Bonsai, Logit.io, Instaclustr, and BAInsight, have publicly backed OpenSearch.

2.3 version #

Cannot be used as Vector Store on OCI #

<2023-12-11 Mon>

  1. The currently available version on OCI does not support the KNN plugin.
  2. It does not support the “dense_vector” mapping type.

Available plugins #

opensearch-data-0   analysis-icu                   2.3.0
opensearch-data-0   analysis-kuromoji              2.3.0
opensearch-data-0   analysis-nori                  2.3.0
opensearch-data-0   analysis-phonetic              2.3.0
opensearch-data-0   analysis-smartcn               2.3.0
opensearch-data-0   analysis-stempel               2.3.0
opensearch-data-0   analysis-ukrainian             2.3.0
opensearch-data-0   ingest-attachment              2.3.0
opensearch-data-0   mapper-size                    2.3.0
opensearch-data-0   oci-searchindexing-pack        2.3.0
opensearch-data-0   opensearch-analysis-vietnamese oracle-2.3.0-4
opensearch-data-0   opensearch-index-management    2.3.0.0
opensearch-data-0   opensearch-job-scheduler       2.3.0.0
opensearch-master-0 analysis-icu                   2.3.0
opensearch-master-0 analysis-kuromoji              2.3.0
opensearch-master-0 analysis-nori                  2.3.0
opensearch-master-0 analysis-phonetic              2.3.0
opensearch-master-0 analysis-smartcn               2.3.0
opensearch-master-0 analysis-stempel               2.3.0
opensearch-master-0 analysis-ukrainian             2.3.0
opensearch-master-0 ingest-attachment              2.3.0
opensearch-master-0 mapper-size                    2.3.0
opensearch-master-0 oci-searchindexing-pack        2.3.0
opensearch-master-0 opensearch-analysis-vietnamese oracle-2.3.0-4
opensearch-master-0 opensearch-index-management    2.3.0.0
opensearch-master-0 opensearch-job-scheduler       2.3.0.0

Supported plugins #

ref

Supported languages #

ref Includes Arabic.

2.8 Version #

Available on OCI now, <2023-12-19 Tue> ref

New 2.8 features #

OpenSearch v2.8, hosted on OCI, introduces the following features:

  1. Neural search: An innovative machine learning (ML) feature, significantly elevating the relevance of search results. Neural search is an experimental feature that enables the integration of machine learning models into your search workloads.

  2. Enhanced security: More authentication options, such as LDAP, SAML, and OpenID, provide a robust and safe search environment for enterprises (currently in limited availability)

  3. New plugin support: Use all or some of the Neural Search, ML Commons, and k-nearest neighbors (KNN) plugins to enhance the search results for keyword or semantic queries. More plug-ins are on the way.

  4. OCI’s scalability: Unmatched scalability in OCI ensures smooth indexing and searching of millions of documents.

2.11 Version #

OpenSearch 2.11 is the next version in progress. GA is expected in March 2024 (12-March-2024, confirmed by raghunath.shankar@oracle.com).

OpenSearch 2.11.0 introduces an array of features for Semantic Search applications, new options for durable data storage, and new functionality for security analytics, Observability, and more. Experimental features include new tools for tracing OpenSearch requests and enhancements for conversational search pipelines.

  • Multimodal Semantic Search lets you combine images with text, adding valuable context to support better relevancy for search results.

  • Sparse retrieval (Sparse Vector) is now available for text-based vector search. OpenSearch now offers both sparse and dense retrieval (Dense Vector) methods so you can choose the approach that suits your application requirements.

  • The search comparison tool is now generally available, allowing you to compare the results of two different search ranking techniques side by side so you can identify opportunities to fine-tune your results.

  • Snapshots are now interoperable with remote-backed storage, offering another approach to data durability with the potential to reduce storage resource requirements.

  • Updates to the Security Analytics interface are designed to make it easier to use the security toolkit, introducing a new workflow to simplify creation of threat detectors and alerts as well as the ability to organize log types by category.

  • OpenSearch can now facilitate Authorization at the REST layer, empowering plugin developers to establish secure access controls over endpoints in addition to transport layer authorization.

  • This release removes dependencies on AngularJS, helping modernize and improve the security posture of OpenSearch Dashboards. A recent announcement of this update with additional details can be found here
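
To make the sparse versus dense distinction in the list above concrete, here is a small conceptual sketch in Python. It is not how OpenSearch stores or scores vectors internally. The vectors, token weights, and function names are made up for illustration. The idea: a dense vector has a value in every dimension, a sparse vector keeps only the non-zero token weights, and both can be scored with a dot product.

```python
from typing import Dict, List


def dense_dot(a: List[float], b: List[float]) -> float:
    """Dot product of two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as token -> weight maps."""
    # Iterate over the smaller map; tokens absent from the other side add 0.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(weight * large.get(token, 0.0) for token, weight in small.items())


# Hypothetical embeddings, for illustration only.
dense_query = [0.5, 0.0, 1.0, 2.0]
dense_doc = [1.0, 3.0, 0.5, 0.25]

sparse_query = {"search": 1.5, "vector": 2.0}
sparse_doc = {"vector": 0.5, "index": 1.0, "search": 2.0}

dense_score = dense_dot(dense_query, dense_doc)
sparse_score = sparse_dot(sparse_query, sparse_doc)
print(dense_score, sparse_score)
```

Only tokens present on both sides contribute to the sparse score, which is why sparse retrieval behaves more like weighted keyword matching.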

Experimental Features #

  • OpenSearch 2.11.0 includes the following experimental features. Experimental features are disabled by default. For instructions on how to enable them, refer to the documentation for the feature.
  • The ability to track OpenSearch requests with traces is new for 2.11, allowing developers to follow OpenSearch requests and tasks as they traverse components and services across the distributed architecture, monitor the path of requests through the system, measure request latencies, and more.
  • Updates to the conversational search tools introduced as experimental in 2.10 offer several parameters that can be used to customize retrieval augmented generation pipelines, providing core logic that allows you to adapt the way OpenSearch interacts with large language models as part of generative AI applications.

Plugins #

GET _cat/plugins

Output

opensearch-master-0 analysis-icu 2.11.0
opensearch-master-0 analysis-kuromoji 2.11.0
opensearch-master-0 analysis-nori 2.11.0
opensearch-master-0 analysis-phonetic 2.11.0
opensearch-master-0 analysis-smartcn 2.11.0
opensearch-master-0 analysis-stempel 2.11.0
opensearch-master-0 analysis-ukrainian 2.11.0
opensearch-master-0 ingest-attachment 2.11.0
opensearch-master-0 mapper-size 2.11.0
opensearch-master-0 oci-searchindexing-pack 2.11.0
opensearch-master-0 opensearch-alerting 2.11.0.0
opensearch-master-0 opensearch-analysis-vietnamese oracle-2.11.0-4
opensearch-master-0 opensearch-anomaly-detection 2.11.0.0
opensearch-master-0 opensearch-index-management 2.11.0.0
opensearch-master-0 opensearch-job-scheduler 2.11.0.0
opensearch-master-0 opensearch-knn 2.11.0.0-SNAPSHOT
opensearch-master-0 opensearch-ml 2.11.0.0
opensearch-master-0 opensearch-neural-search 2.11.0.0
opensearch-master-0 opensearch-notifications 2.11.0.0
opensearch-master-0 opensearch-notifications-core 2.11.0.0
opensearch-master-0 opensearch-reports-scheduler 2.11.0.0
opensearch-master-0 opensearch-sql 2.11.0.0
opensearch-data-0 analysis-icu 2.11.0
opensearch-data-0 analysis-kuromoji 2.11.0
opensearch-data-0 analysis-nori 2.11.0
opensearch-data-0 analysis-phonetic 2.11.0
opensearch-data-0 analysis-smartcn 2.11.0
opensearch-data-0 analysis-stempel 2.11.0
opensearch-data-0 analysis-ukrainian 2.11.0
opensearch-data-0 ingest-attachment 2.11.0
opensearch-data-0 mapper-size 2.11.0
opensearch-data-0 oci-searchindexing-pack 2.11.0
opensearch-data-0 opensearch-alerting 2.11.0.0
opensearch-data-0 opensearch-analysis-vietnamese oracle-2.11.0-4
opensearch-data-0 opensearch-anomaly-detection 2.11.0.0
opensearch-data-0 opensearch-index-management 2.11.0.0
opensearch-data-0 opensearch-job-scheduler 2.11.0.0
opensearch-data-0 opensearch-knn 2.11.0.0-SNAPSHOT
opensearch-data-0 opensearch-ml 2.11.0.0
opensearch-data-0 opensearch-neural-search 2.11.0.0
opensearch-data-0 opensearch-notifications 2.11.0.0
opensearch-data-0 opensearch-notifications-core 2.11.0.0
opensearch-data-0 opensearch-reports-scheduler 2.11.0.0
opensearch-data-0 opensearch-sql 2.11.0.0
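
A listing like the one above is easy to check programmatically, for example to see whether every node has the k-NN plugin. The following is a minimal sketch, assuming the three-column `node plugin version` text format shown above. The function names are hypothetical and the sample is a few lines copied from the output.

```python
from collections import defaultdict
from typing import Dict, List


def parse_cat_plugins(text: str) -> Dict[str, Dict[str, str]]:
    """Parse `_cat/plugins` text output into {node: {plugin: version}}."""
    nodes: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        node, plugin, version = parts
        nodes[node][plugin] = version
    return dict(nodes)


def nodes_missing(plugins_by_node: Dict[str, Dict[str, str]], plugin: str) -> List[str]:
    """Return the sorted names of nodes that do not list the given plugin."""
    return sorted(node for node, plugins in plugins_by_node.items() if plugin not in plugins)


sample = """\
opensearch-master-0 opensearch-knn 2.11.0.0-SNAPSHOT
opensearch-master-0 opensearch-ml 2.11.0.0
opensearch-data-0 opensearch-knn 2.11.0.0-SNAPSHOT
opensearch-data-0 analysis-icu 2.11.0
"""

plugins = parse_cat_plugins(sample)
print(nodes_missing(plugins, "opensearch-knn"))
```

An empty result from `nodes_missing` means the plugin is present on every node in the listing.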

ref The OpenSearch Neural Search plugin enables the integration of machine learning (ML) language models into your search workloads. During ingestion and search, the Neural Search plugin transforms text into vectors. Then, Neural Search uses the transformed vectors in vector-based search.
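
The two steps described above (embedding text at ingestion, then embedding the query text at search time) correspond to two request bodies. The sketch below only builds those bodies as Python dictionaries, following the `text_embedding` processor and `neural` query shapes documented for OpenSearch 2.x. The field names, the model ID placeholder, and the helper names are assumptions for illustration and should be verified against the deployed version.

```python
import json
from typing import Any, Dict

MODEL_ID = "<your-model-id>"  # placeholder: ID of a model registered via ML Commons


def ingest_pipeline_body(model_id: str,
                         text_field: str = "passage_text",
                         vector_field: str = "passage_embedding") -> Dict[str, Any]:
    """Body for an ingest pipeline that embeds text_field into vector_field."""
    return {
        "description": "Embed text at ingestion time (illustrative)",
        "processors": [
            {"text_embedding": {"model_id": model_id,
                                "field_map": {text_field: vector_field}}}
        ],
    }


def neural_query_body(model_id: str, query_text: str,
                      vector_field: str = "passage_embedding",
                      k: int = 5) -> Dict[str, Any]:
    """Search body that embeds query_text and runs a vector search on vector_field."""
    return {
        "size": k,
        "query": {
            "neural": {
                vector_field: {"query_text": query_text, "model_id": model_id, "k": k}
            }
        },
    }


pipeline = ingest_pipeline_body(MODEL_ID)
query = neural_query_body(MODEL_ID, "wild west")
print(json.dumps(query, indent=2))
```

The first body would be sent when creating an ingest pipeline and the second to the index's `_search` endpoint. The target vector field must be mapped as a `knn_vector` with the model's output dimension.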

opensearch-ml #

opensearch-knn #

KNN
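
As a quick reminder of what the k-NN plugin expects, here is a minimal sketch of an index body with a `knn_vector` field and a matching `knn` query, built as Python dictionaries. The field name, dimension, and helper names are hypothetical. Engine and method settings are left at their defaults, which may differ between versions.

```python
from typing import Any, Dict, List


def knn_index_body(field: str, dimension: int) -> Dict[str, Any]:
    """Index settings and mapping for a single knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }


def knn_query_body(field: str, vector: List[float], k: int) -> Dict[str, Any]:
    """Search body returning the k nearest neighbours of vector."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


index_body = knn_index_body("embedding", 4)  # hypothetical 4-dimensional field
search_body = knn_query_body("embedding", [0.1, 0.2, 0.3, 0.4], k=3)
print(search_body)
```

The query vector must have the same number of dimensions as the mapped field.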

2.17 Version (from 2.11) #

2.12 #

  • Query Insights: Enabled monitoring of top N queries to gain better insights into search patterns.
  • k-NN Search on Nested Fields: Enhanced k-nearest neighbors search capabilities to support nested fields.

2.13 #

  • Vector Quantization: Introduced vector quantization within OpenSearch to improve vector search efficiency.
  • LLM Guardrails: Added guardrails for large language models to enhance AI safety.

2.14 #

  • Semantic Cache for LangChain: Introduced a semantic cache to optimize LangChain applications.
  • Low-Level Vector Query Interface: Provided a new interface for neural sparse queries.
  • Improved k-NN Search Filtering: Enhanced filtering capabilities for k-NN searches.

2.15 #

  • Parallel Ingestion Processing: Enabled parallel processing during data ingestion to boost performance.
  • SIMD Support for Exact Search: Introduced SIMD support to accelerate exact search operations.
  • Disable Doc Values for k-NN Field: Provided the ability to disable document values for k-NN fields to optimize storage.
  • Wildcard and Derived Field Types: Added new field types to enhance data modeling flexibility.
  • Single-Cardinality Aggregations Performance: Improved performance for aggregations with single cardinality.
  • Rolling Upgrades to Remote-Backed Clusters: Facilitated rolling upgrades for clusters with remote storage.
  • Enhanced Metrics for Top N Queries: Provided more detailed metrics for monitoring top N queries.

2.16 #

• k-NN Search Enhancements: Improved k-NN search functionalities, especially for nested fields.

2.17 (September 17, 2024) #

  • Disk-Optimized Vector Search: Introduced a new disk-optimized vector search feature utilizing binary quantization, achieving up to 32x compression in memory usage and delivering significant cost savings while maintaining high recall rates and low latencies.
  • Enhanced ML Inference Search Processors: Improved machine learning inference capabilities within search processors to accelerate application development and support generative AI workloads.
  • Expanded Batch Processing Capabilities: Enhanced batch processing functionalities to handle larger datasets more efficiently.
  • Advanced Search Optimization: Implemented advanced optimization techniques to improve search performance and accuracy.
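
The "up to 32x" figure for binary quantization follows from simple arithmetic if the original vectors are assumed to be 32-bit floats: 32 bits per dimension shrink to 1 bit per dimension. A back-of-envelope sketch follows. The corpus size and dimension are made-up numbers, and the calculation ignores index structures and any full-precision copies kept for rescoring.

```python
def vector_memory_bytes(num_vectors: int, dimension: int, bits_per_dim: int) -> int:
    """Raw vector payload size in bytes, rounded up to whole bytes."""
    total_bits = num_vectors * dimension * bits_per_dim
    return (total_bits + 7) // 8


NUM_VECTORS = 1_000_000  # hypothetical corpus size
DIMENSION = 768          # hypothetical embedding dimension

full_precision = vector_memory_bytes(NUM_VECTORS, DIMENSION, 32)  # float32
binary = vector_memory_bytes(NUM_VECTORS, DIMENSION, 1)           # 1 bit per dimension
ratio = full_precision / binary

print(f"float32: {full_precision / 1e9:.3f} GB, binary: {binary / 1e9:.3f} GB, ratio: {ratio:.0f}x")
```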

Data Types #

2.8 version #

ref

OpenSearch SQL Type   OpenSearch Type   SQL Type
boolean               boolean           BOOLEAN
byte                  byte              TINYINT
short                 byte              SMALLINT
integer               integer           INTEGER
long                  long              BIGINT
float                 float             REAL
half_float            float             FLOAT
scaled_float          float             DOUBLE
double                double            DOUBLE
keyword               string            VARCHAR
text                  text              VARCHAR
date                  timestamp         TIMESTAMP
date_nanos            timestamp         TIMESTAMP
ip                    ip                VARCHAR
binary                binary            VARBINARY
object                struct            STRUCT
nested                array             STRUCT
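
When generating schemas or validating query results, the type table can be kept as a plain lookup. The sketch below transcribes the first and third columns of the table into a Python dictionary. The helper name is hypothetical and the mapping is only as accurate as the table it was copied from.

```python
from typing import Dict

# OpenSearch SQL type -> SQL type, transcribed from the table above.
SQL_TYPE_BY_OPENSEARCH_TYPE: Dict[str, str] = {
    "boolean": "BOOLEAN",
    "byte": "TINYINT",
    "short": "SMALLINT",
    "integer": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "half_float": "FLOAT",
    "scaled_float": "DOUBLE",
    "double": "DOUBLE",
    "keyword": "VARCHAR",
    "text": "VARCHAR",
    "date": "TIMESTAMP",
    "date_nanos": "TIMESTAMP",
    "ip": "VARCHAR",
    "binary": "VARBINARY",
    "object": "STRUCT",
    "nested": "STRUCT",
}


def sql_type_for(opensearch_type: str) -> str:
    """Return the SQL type for an OpenSearch SQL type, or raise KeyError if unknown."""
    return SQL_TYPE_BY_OPENSEARCH_TYPE[opensearch_type.lower()]


print(sql_type_for("half_float"))
```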

Date and time types #

The date and time types represent a time period: DATE, TIME, DATETIME, TIMESTAMP, and INTERVAL. By default, the OpenSearch DSL uses the date type as the only date-time related type that contains all information of an absolute time point.

To integrate with SQL, each type other than the timestamp type holds part of the time period information. To use date-time functions, see datetime. Some functions might have restrictions for the input argument type.

  • Date

    The date type represents the calendar date regardless of the time zone. A given date value is a 24-hour period, but this period varies in different timezones and might have flexible hours during daylight saving programs. The date type doesn’t contain time information and it only supports a range of 1000-01-01 to 9999-12-31.

  • Time

    The time type represents the time of a clock regardless of its timezone. The time type doesn’t contain date information.

  • Datetime

    The datetime type is a combination of date and time. It doesn’t contain timezone information. For an absolute time point that contains date, time, and timezone information, see Timestamp.

  • Timestamp

    The timestamp type is an absolute instant independent of timezone or convention. For example, for a given point of time, if you change the timestamp to a different timezone, its value changes accordingly.

    The timestamp type is stored differently from the other types. It’s converted from its current timezone to UTC for storage and converted back to its set timezone from UTC when it’s retrieved.

  • Interval

    The interval type represents a temporal duration or a period.

    An interval is written as INTERVAL expr unit. Here expr is any expression that eventually evaluates to a quantity value, and unit says how to interpret that quantity: MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. The INTERVAL keyword and the unit specifier are not case sensitive.

    The interval type has two classes of intervals: year-week intervals and day-time intervals.

  • Year-week intervals store years, quarters, months, and weeks.

  • Day-time intervals store days, hours, minutes, seconds, and microseconds.
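
The timestamp behaviour described above (convert to UTC for storage, convert back to the session timezone on retrieval) can be illustrated with Python's standard library. This is only an analogy for the described semantics, not the SQL plugin's implementation. The session timezone and function names are assumptions.

```python
from datetime import datetime, timedelta, timezone

SESSION_TZ = timezone(timedelta(hours=9))  # hypothetical session timezone, UTC+09:00


def to_storage(ts: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC, as done before storing."""
    if ts.tzinfo is None:
        raise ValueError("a timestamp needs timezone information")
    return ts.astimezone(timezone.utc)


def from_storage(stored: datetime, tz: timezone) -> datetime:
    """Convert a stored UTC value back to the requested timezone."""
    return stored.astimezone(tz)


local = datetime(2024, 11, 20, 9, 30, tzinfo=SESSION_TZ)
stored = to_storage(local)                     # same instant, expressed in UTC
restored = from_storage(stored, SESSION_TZ)    # back in the session timezone
elsewhere = from_storage(stored, timezone(timedelta(hours=-5)))

print(stored.isoformat(), restored.isoformat(), elsewhere.isoformat())
```

The displayed value changes with the timezone while the instant stays the same, so `local`, `stored`, and `elsewhere` all compare equal.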

Search Types #

youtube

