Self-Querying Retriever

May 18, 2024 | seedling, permanent

Self-Querying Retriever in #

A self-querying retriever is one that, as the name suggests, can query itself. Given a natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. The retriever can therefore do two things with one user query: compare it with the contents of stored documents by semantic similarity, and extract filters on the documents' metadata from it and execute those filters.
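The two parts of a structured query (a search string plus metadata filters) can be sketched without any LLM or vector store. This is a toy illustration only: `StructuredQuery`, `passes`, `overlap`, and `retrieve` are hypothetical names, the structured query is written by hand where an LLM chain would normally produce it, and word overlap stands in for embedding similarity.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical stand-in for what the query-constructing LLM chain would emit.
@dataclass
class StructuredQuery:
    query: str                                    # text used for similarity search
    filters: list = field(default_factory=list)   # (attribute, comparator, value)

COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def passes(metadata, filters):
    """True if the metadata satisfies every filter; a missing attribute fails."""
    return all(
        attr in metadata and COMPARATORS[comp](metadata[attr], value)
        for attr, comp, value in filters
    )

def overlap(query, text):
    """Crude similarity: number of shared lowercase words (not real embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(docs, sq, k=2):
    """Apply the metadata filters first, then rank the survivors by similarity."""
    candidates = [d for d in docs if passes(d["metadata"], sq.filters)]
    ranked = sorted(
        candidates,
        key=lambda d: overlap(sq.query, d["page_content"]),
        reverse=True,
    )
    return ranked[:k]

DOCS = [
    {"page_content": "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
     "metadata": {"year": 1993, "rating": 7.7, "genre": "science fiction"}},
    {"page_content": "Toys come alive and have a blast doing so",
     "metadata": {"year": 1995, "genre": "animated"}},
    {"page_content": "Three men walk into the Zone, three men walk out of the Zone",
     "metadata": {"year": 1979, "rating": 9.9, "genre": "science fiction"}},
]

# "science fiction movies rated above 8.5" might be rewritten as:
sq = StructuredQuery(
    query="science fiction",
    filters=[("genre", "eq", "science fiction"), ("rating", "gt", 8.5)],
)
results = retrieve(DOCS, sq)
```

The point of the split is that the rating and genre constraints are enforced exactly by the filter, while only the remaining free text is left to fuzzy similarity matching.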

OpenSearch #

opensearch_self_query

# Assumes a running OpenSearch instance and OPENAI_API_KEY set in the environment.
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Each document is a short movie summary plus the metadata to filter on.
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "rating": 9.9,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
        },
    ),
]
vectorstore = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings,
    index_name="opensearch-self-query-demo",
    opensearch_url="http://localhost:9200",
)


from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import OpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
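Once built, the retriever can be asked plain-language questions. A minimal sketch, assuming the retriever exposes the standard LangChain `invoke` method and returns objects with `page_content` and `metadata`; `show_results` is a hypothetical helper, not part of LangChain.

```python
def show_results(retriever, question):
    """Run one natural-language question and return (summary, metadata) pairs."""
    # Assumption: the retriever follows the LangChain Runnable interface.
    docs = retriever.invoke(question)
    rows = [(doc.page_content, doc.metadata) for doc in docs]
    for content, metadata in rows:
        print(f"{content} | {metadata}")
    return rows
```

For example, `show_results(retriever, "I want to watch a movie rated higher than 8.5")` should lead the retriever to build a filter on `rating` instead of relying on text similarity alone. Running it requires the OpenSearch index and the OpenAI credentials set up above.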

OCR of Images #

2024-03-12_21-37-19_screenshot.png #

Diagram "Self-querying": the question "What did bar say about foo" goes to the query constructor, which produces Query: "foo" and Filter: eq("author", "bar"). The query translator turns these into the vector store request search: "foo", where: {"author": "bar"}.
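The translation step in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not LangChain's translator: `Comparison` and `translate` are hypothetical names, only the `eq` comparator from the diagram is handled, and the `search` / `where` keys are taken from the diagram, not from any specific vector store API.

```python
from dataclasses import dataclass

# Hypothetical, store-agnostic filter as the query constructor might emit it.
@dataclass
class Comparison:
    comparator: str   # e.g. "eq"
    attribute: str    # metadata field name
    value: object     # value to compare against

def translate(query, comparison):
    """Turn a generic (query, filter) pair into a store-specific request dict."""
    if comparison.comparator != "eq":
        # Only the case shown in the diagram is covered by this sketch.
        raise ValueError(f"unsupported comparator: {comparison.comparator}")
    return {"search": query, "where": {comparison.attribute: comparison.value}}

request = translate("foo", Comparison("eq", "author", "bar"))
```

Keeping the generic filter separate from the store-specific request is what lets the same query constructor sit in front of different vector stores, each with its own translator.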

