LangChain

December 7, 2024 | seedling, permanent

tags: Python Apps, Framework, LLM Apps #

GitHub: Building applications with LLMs (e.g. ChatGPT) through composability

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.

JAK observation #

Guiding the AI to get the job done, with chains and agents.

6 main areas #

There are six main areas that LangChain is designed to help with. These are, in increasing order of complexity:

LLMs and Prompts #

This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
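A minimal pure-Python sketch of the prompt-template idea (no LangChain dependency; the `PromptTemplate` class here is a hand-rolled stand-in that works along the same lines as LangChain's):

```python
# A hand-rolled prompt template: a template string plus the variables it
# expects, with a check that all variables are supplied before formatting.

class PromptTemplate:
    def __init__(self, template: str, input_variables: list[str]):
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**kwargs)

prompt = PromptTemplate(
    template="Summarize the following text in one sentence:\n{text}",
    input_variables=["text"],
)
print(prompt.format(text="LangChain composes LLM calls."))
```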

Chains #

Chains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
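The idea of a chain can be sketched in plain Python: a sequence of callables where each step's output feeds the next. Here `fake_llm` is a stand-in for a real model call, not LangChain's API:

```python
# A chain is just left-to-right composition of steps; a real chain would
# call an LLM or another utility at each step.

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"LLM response to: {prompt!r}"

def make_chain(*steps):
    """Compose steps left to right into a single callable."""
    def chain(value):
        for step in steps:
            value = step(value)
        return value
    return chain

build_prompt = lambda topic: f"Write a haiku about {topic}."
chain = make_chain(build_prompt, fake_llm)
print(chain("autumn"))
```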

Data Augmented Generation #

Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step.

Examples #

  • Summarization of long pieces of text
  • Question answering over specific data sources
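The fetch-then-generate pattern behind Data Augmented Generation can be sketched as follows. The "data source" here is an in-memory dict and `fake_llm` stands in for a real model; a real chain would hit a vector store and an LLM:

```python
# Step 1: retrieve relevant data; step 2: inject it into the generation prompt.

DOCS = {
    "langchain": "LangChain composes LLM calls with tools, data, and memory.",
}

def fetch(query: str) -> str:
    # Retrieve data relevant to the query (a dict lookup for illustration).
    return DOCS.get(query.lower(), "")

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"Answer based on: {prompt}"

def answer(query: str) -> str:
    context = fetch(query)
    return fake_llm(f"Context: {context}\nQuestion: {query}")

print(answer("LangChain"))
```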

Agents #

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
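The Action/Observation loop described above can be sketched in plain Python. Here `fake_policy` stands in for the LLM deciding which action to take (it scripts one tool call and then finishes); the tools are toy functions:

```python
# Agent loop: pick an Action, execute it, feed the Observation back, repeat
# until the policy decides to finish.

def search(query: str) -> str:
    return f"search results for {query!r}"

def calculator(expr: str) -> str:
    return str(eval(expr))  # illustration only; never eval untrusted input

TOOLS = {"search": search, "calculator": calculator}

def fake_policy(observations: list) -> tuple:
    # A real agent would ask the LLM; here we script one step then finish.
    if not observations:
        return ("calculator", "6 * 7")
    return ("finish", observations[-1])

def run_agent() -> str:
    observations = []
    while True:
        action, arg = fake_policy(observations)
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))

print(run_agent())  # the scripted run computes 6 * 7
```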

Creating a custom agent #

Memory #

Memory refers to persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
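A minimal sketch of the buffer-memory idea: persist prior turns and prepend them to each new prompt. `BufferMemory` and `fake_llm` are hand-rolled stand-ins, not LangChain's classes:

```python
# Conversation memory as a growing buffer of turns that is replayed as
# context on every new call.

class BufferMemory:
    def __init__(self):
        self.turns: list[str] = []

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")

    def as_context(self) -> str:
        return "\n".join(self.turns)

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"reply ({len(prompt)} chars of context seen)"

memory = BufferMemory()
for user_msg in ["Hi, I'm Ada.", "What's my name?"]:
    prompt = memory.as_context() + "\nuser: " + user_msg
    reply = fake_llm(prompt)
    memory.add("user", user_msg)
    memory.add("ai", reply)

print(memory.as_context())
```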

Evaluation #

[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
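The model-graded evaluation idea can be sketched as asking a "judge" whether a prediction matches a reference answer. `fake_judge` approximates the judge with a token-overlap check purely for illustration; in the real pattern the judge is itself an LLM prompted to output a verdict:

```python
# Model-graded evaluation, with the LLM judge replaced by a trivial
# token-overlap heuristic so the sketch runs offline.

def fake_judge(question: str, reference: str, prediction: str) -> str:
    overlap = set(reference.lower().split()) & set(prediction.lower().split())
    return "CORRECT" if overlap else "INCORRECT"

grade = fake_judge(
    question="What does LangChain compose?",
    reference="LLM calls",
    prediction="It composes LLM calls with tools.",
)
print(grade)
```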

Data Connection #

Querying a custom datasource #

ref

Ingesting or Embedding #

ref

Querying with LLM #

Projects using LangChain with LLM #

privateGPT #

langchain-falcon-chainlit #

https://github.com/sudarshan-koirala/langchain-falcon-chainlit: a simple chat UI using the Falcon model, LangChain, and Chainlit

Vector Database #

Embedding #

github ref

Using sentence-transformers for embedding #

ref

# !pip install sentence_transformers > /dev/null
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Equivalent to SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
text = "This is a test document."
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text, "This is not a test document."])

## or
from langchain.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
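The embedding vectors returned above are typically compared with cosine similarity. A minimal pure-Python sketch of that comparison (real code would use numpy or let the vector database do it):

```python
# Cosine similarity: dot product of the vectors divided by the product of
# their magnitudes; 1.0 for identical directions, 0.0 for orthogonal ones.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```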

LlamaIndex vs LangChain #

ref ref-2

  • If the goal is mainly an intelligent search tool, LlamaIndex is great; building a ChatGPT clone capable of creating plugins is a whole different thing.
  • LangChain lets you leverage multiple instances of ChatGPT, provide them with memory, and even use multiple instances of LlamaIndex.
  • With LangChain you can build agents that do more than one thing, for example executing Python code while also searching Google.
  • Basically, LlamaIndex is a smart storage mechanism, while LangChain is a tool for bringing multiple tools together.
  • LlamaIndex focuses on efficient indexing and retrieval, while LangChain offers a more general-purpose framework.
  • For querying data before it reaches the prompt, LlamaIndex is better.

Document #

ref

Pass page_content in as positional or named arg.

example

from langchain.schema import Document

document = Document(
    page_content="Hello, world!",
    metadata={"source": "https://example.com"},
    id="some-id",  # optional, unique identifier
)

param id: Optional[str] = None #

An optional identifier for the document.

Ideally this should be unique across the document collection and formatted as a UUID, but this will not be enforced.

New in version 0.2.11.

param metadata: dict [Optional] #

Arbitrary metadata associated with the content.

param page_content: str [Required] #

String text.

param type: Literal['Document'] = 'Document' #

