llama-hub

May 29, 2024 | seedling, permanent

tags :

LLM Apps #

A community-made library of data loaders for LLMs, to be used with LlamaIndex (formerly GPT Index) and/or LangChain.

Connect custom data sources to your LLM with one or more of these plugins (via LlamaIndex or LangChain)

URL github

Custom data sources #

Unstructured.io File Loader #

This loader extracts the text from a variety of unstructured text files using Unstructured.io. Currently, the supported file extensions are .txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf. A single local file is passed in each time you call load_data.

from pathlib import Path
from llama_hub.file.unstructured.base import UnstructuredReader

loader = UnstructuredReader()
documents = loader.load_data(file=Path('./10k_filing.html'))
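
Because load_data takes one file per call, loading several files means looping. The following is a minimal sketch, not part of llama-hub: the helper name load_supported_files and the SUPPORTED_EXTENSIONS set are hypothetical, and the extension list is simply copied from the note above. The loader is passed in as a callable so the helper does not depend on any particular reader.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Union

# Extensions listed in this note as supported by the Unstructured.io loader.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".pptx", ".jpg", ".png", ".eml", ".html", ".pdf"}


def load_supported_files(
    paths: Iterable[Union[str, Path]],
    load_one: Callable[[Path], list],
) -> list:
    """Call `load_one` once per supported file and concatenate the results.

    `load_one` is a hypothetical hook; with the reader above it could be
    `lambda p: UnstructuredReader().load_data(file=p)`.
    """
    documents: List = []
    for raw_path in paths:
        path = Path(raw_path)
        # Skip anything whose extension is not in the supported list.
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        documents.extend(load_one(path))
    return documents
```

For example, load_supported_files(Path('./data').iterdir(), lambda p: loader.load_data(file=p)) would load every supported file in ./data with the loader created above.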

The reader can also be passed to SimpleDirectoryReader as a per-extension file extractor:

from llama_index import download_loader
from llama_index import SimpleDirectoryReader

UnstructuredReader = download_loader('UnstructuredReader')

# Map each file extension to the reader that should parse it
dir_reader = SimpleDirectoryReader('./data', file_extractor={
  ".pdf": UnstructuredReader(),
  ".html": UnstructuredReader(),
  ".eml": UnstructuredReader(),
})
documents = dir_reader.load_data()
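
The file_extractor argument is a plain dictionary from file extension to reader, so it can be built programmatically when many extensions share one reader. A small sketch follows; build_file_extractor is a hypothetical helper, and reusing a single reader instance across extensions is an assumption (the example above creates one instance per extension).

```python
from typing import Dict, Iterable


def build_file_extractor(reader: object, extensions: Iterable[str]) -> Dict[str, object]:
    """Map each extension (normalised to lower case with a leading dot) to `reader`."""
    mapping: Dict[str, object] = {}
    for ext in extensions:
        ext = ext.lower()
        # Accept both "pdf" and ".pdf" as input.
        key = ext if ext.startswith(".") else "." + ext
        mapping[key] = reader
    return mapping
```

With the names from the example above, this could be used as SimpleDirectoryReader('./data', file_extractor=build_file_extractor(UnstructuredReader(), ['.pdf', '.html', '.eml'])).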

Obsidian Loader #

This loader loads documents from a markdown directory (for instance, an Obsidian vault).

from llama_index import download_loader
import os

ObsidianReader = download_loader('ObsidianReader')
documents = ObsidianReader('/path/to/dir').load_data() # Returns list of documents
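
Before pointing the loader at a large vault, it can help to preview which markdown files are there. The sketch below is not ObsidianReader's implementation; list_markdown_files is a hypothetical helper, and skipping dot-prefixed folders such as .obsidian is an assumption about what is usually wanted.

```python
from pathlib import Path
from typing import List


def list_markdown_files(vault_dir: str) -> List[Path]:
    """Return all .md files under `vault_dir`, sorted, ignoring hidden folders."""
    root = Path(vault_dir)
    found: List[Path] = []
    for path in sorted(root.rglob("*.md")):
        # Only inspect the parts below the vault root, so a hidden parent
        # directory of the vault itself does not exclude everything.
        relative_parts = path.relative_to(root).parts
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            found.append(path)
    return found
```

Calling list_markdown_files('/path/to/dir') returns the paths themselves, which makes it easy to check the count or spot unexpected files before running load_data.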

Dependencies #

If you get a “failed to find libmagic” error, try pip install python-magic-bin==0.4.14. On macOS, you may also try brew install libmagic.
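
To check for this problem up front, one option is to try importing the magic module, which is what the python-magic package provides. This is a minimal sketch; libmagic_available is a hypothetical helper name, and it assumes that both a missing package and a missing libmagic library surface as an ImportError at import time.

```python
def libmagic_available() -> bool:
    """Return True if `import magic` succeeds, False if it raises ImportError."""
    try:
        import magic  # noqa: F401  (imported only to test availability)
    except ImportError:
        # Raised when python-magic is not installed, or when it is installed
        # but cannot locate the libmagic shared library.
        return False
    return True


if __name__ == "__main__":
    if not libmagic_available():
        print("libmagic not usable; see the install hints above")
```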
