llama-hub

May 29, 2024 | seedling, permanent

tags :

LLM Apps #

A community-made library of data loaders for LLMs, to be used with GPT Index (now LlamaIndex) and/or LangChain

Connect custom data sources to your LLM with one or more of these plugins (via LlamaIndex or LangChain)

URL github

Custom data sources #

Unstructured.io File Loader #

URL

This loader extracts the text from a variety of unstructured text files using Unstructured.io. The currently supported file extensions are .txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf. A single local file is passed in each time you call load_data.

    from pathlib import Path
    from llama_hub.file.unstructured.base import UnstructuredReader
    
    # Load a single local file.
    loader = UnstructuredReader()
    documents = loader.load_data(file=Path('./10k_filing.html'))
    
    # Alternatively, fetch the loader with download_loader and use it
    # as a per-extension extractor for SimpleDirectoryReader.
    from pathlib import Path
    from llama_index import download_loader
    from llama_index import SimpleDirectoryReader
    
    UnstructuredReader = download_loader('UnstructuredReader')
    
    # file_extractor maps a file extension to the reader that handles it.
    dir_reader = SimpleDirectoryReader('./data', file_extractor={
      ".pdf": UnstructuredReader(),
      ".html": UnstructuredReader(),
      ".eml": UnstructuredReader(),
    })
    documents = dir_reader.load_data()
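
To make the file_extractor idea concrete, here is a minimal standard-library sketch of dispatching files to a reader by extension. It is not how SimpleDirectoryReader is implemented; load_directory and read_text_file are hypothetical names introduced only for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List


def read_text_file(path: Path) -> str:
    """Hypothetical stand-in for a real reader: return the file's text."""
    return path.read_text(encoding="utf-8")


def load_directory(
    root: Path, extractors: Dict[str, Callable[[Path], str]]
) -> List[str]:
    """Walk root and hand each file to the reader registered for its suffix.

    Files whose extension has no registered reader are skipped.
    """
    texts: List[str] = []
    for path in sorted(root.rglob("*")):
        reader = extractors.get(path.suffix.lower())
        if path.is_file() and reader is not None:
            texts.append(reader(path))
    return texts
```

Calling load_directory(Path('./data'), {".txt": read_text_file}) would return the text of every .txt file under ./data and ignore everything else, which is roughly the role the extension-to-reader mapping plays in the snippet above.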
    

Obsidian Loader #

This loader loads documents from a markdown directory (for instance, an Obsidian vault).

    from llama_index import download_loader
    
    ObsidianReader = download_loader('ObsidianReader')
    documents = ObsidianReader('/path/to/dir').load_data() # Returns list of documents
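
For intuition about what loading a markdown directory involves, a rough standard-library sketch follows. It is an assumption-laden illustration, not the ObsidianReader implementation: load_markdown_dir is a hypothetical helper, and skipping hidden folders such as .obsidian is a choice made here, not documented behaviour.

```python
from pathlib import Path
from typing import List


def load_markdown_dir(root: str) -> List[str]:
    """Return the text of every .md file under root, in sorted path order.

    Anything inside a hidden directory (name starting with ".") is skipped;
    that is an assumption of this sketch.
    """
    root_path = Path(root)
    docs: List[str] = []
    for path in sorted(root_path.rglob("*.md")):
        rel_parts = path.relative_to(root_path).parts
        if any(part.startswith(".") for part in rel_parts):
            continue
        docs.append(path.read_text(encoding="utf-8"))
    return docs
```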

Dependencies #

“failed to find libmagic” error: Try pip install python-magic-bin==0.4.14. Solution documented here. On macOS, you may also try brew install libmagic.
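
A quick way to check up front whether the libmagic dependency is usable is to try importing the magic module. This is a small sketch under the assumption that the failure surfaces as an ImportError (or an OSError while loading the shared library) at import time; libmagic_available is a hypothetical helper name.

```python
def libmagic_available() -> bool:
    """Return True if the `magic` module imports cleanly, False otherwise."""
    try:
        import magic  # noqa: F401  (imported only to probe availability)
    except (ImportError, OSError):
        # Assumption: a missing package or a missing libmagic shared
        # library both show up as one of these two exceptions.
        return False
    return True


if __name__ == "__main__":
    if not libmagic_available():
        print("libmagic not usable; see the install hints above")
```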

