vanna.ai

May 25, 2024 | seedling, permanent

tags :

Python Apps AI #

URL

AI SQL Accuracy: Testing different LLMs + context strategies to maximize SQL generation accuracy, ref

Using prior SQL queries that worked to improve the results

Why Vanna.AI? #

Supported Databases #

PostgreSQL

MySQL

SQlite

Oracle

Supported LLMs
- OpenAI
- Azure OpenAI
- Anthropic
- Ollama
- google Gemini
- Mistral

Supported Vector Stores or Metadata Store

Vanna hosted vectordb

ChromaDB

Qdrant

Marqo

Other Vector Stores

ref

%pip install 'vanna[openai,mysql]'
from vanna.openai import OpenAI_Chat
from vanna.base import VannaBase
class MyCustomVectorDB(VannaBase):
  def add_ddl(self, ddl: str, **kwargs) -> str:
     # Implement here

  def add_documentation(self, doc: str, **kwargs) -> str:
     # Implement here

  def add_question_sql(self, question: str, sql: str, **kwargs) -> str:
     # Implement here

  def get_related_ddl(self, question: str, **kwargs) -> list:
     # Implement here

  def get_related_documentation(self, question: str, **kwargs) -> list:
     # Implement here

  def get_similar_question_sql(self, question: str, **kwargs) -> list:
     # Implement here

  def get_training_data(self, **kwargs) -> pd.DataFrame:
     # Implement here

  def remove_training_data(id: str, **kwargs) -> bool:
     # Implement here


class MyVanna(MyCustomVectorDB, OpenAI_Chat):
    def __init__(self, config=None):
        MyCustomVectorDB.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = MyVanna(config='api_key': 'sk-...', 'model': 'gpt-4-...')

TODO try implementing this with OpenSearch

OCR of Images #

2024-05-12_13-43-20_screenshot.png #

How accurately can LLMS generate SQL? Using contextually-relevant SQL has a dramatic LLM Bison GPT 3.5 GPT 4 8 e DE 100% 90% 80%- I A 70% L - a a - 60% U 50% a E 8 40% a 3 30% of 20% e 10% Adding a handful of improvement in accuracy example queries improved results 91% 88% GPT-4 worked best overall for this dataset 74% 69% 61% 57% 42% 43% Providing just the schema doesn't work well 34% 10% 0% 0% Schema 0% Static Contextual Avg Context Strategy

2024-05-12_13-44-35_screenshot.png #

r - - - - SCHEMA RELEVANCE PRIOR SQL 99 CORRECT ?1?) 00 QUESTION CREATE PROMPT GENERATE SQL RUN SQL Sending relevant context to the LLM for SQL generation INCORRECT

2024-05-12_13-45-39_screenshot.png #

Why Vanna.Al? D Open-Source High accuracy on complex datasets Vanna's capabilities are tied to the training data you give it. More training data means better accuracy for large and complex datasets. Designed for security The Vanna Python package and the various frontend integrations are all open-source. You can run Vanna on your own infrastructure. Your database contents are never sent to the LLM unless you specifically enable features that require it. The metadata storage layer only sees schemas, documentation, and queries. Self learning Supports many databases Choose your front end As you use Vanna more, your model continuously improves as' we augment your training data. We have out-of-the-box support Snowflake BigQuery, Postgres, and many others. You can easily make a connector for any database. Start in a Jupyter Notebook. Expose to business users via Slackbot, web app, Streamlit app, any other frontend. Even integrate in your web app for customers.

vanna.ai

Python Apps AI #

Why Vanna.AI? #

Supported Databases #

OCR of Images #

2024-05-12_13-43-20_screenshot.png #

2024-05-12_13-44-35_screenshot.png #

2024-05-12_13-45-39_screenshot.png #

No notes link to this note