llama-hub
tags: LLM Apps
A community-built library of data loaders for LLMs, for use with LlamaIndex (formerly GPT Index) and/or LangChain.
Connect custom data sources to your LLM with one or more of these plugins.
Custom data sources
Unstructured.io File Loader
This loader extracts text from a variety of unstructured files using Unstructured.io. The currently supported file extensions are .txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf. A single local file is passed in each time you call load_data.
# Load a single file directly with UnstructuredReader
from pathlib import Path
from llama_hub.file.unstructured.base import UnstructuredReader

loader = UnstructuredReader()
documents = loader.load_data(file=Path('./10k_filing.html'))

# Alternatively, use UnstructuredReader as a per-extension extractor in SimpleDirectoryReader
from llama_index import download_loader
from llama_index import SimpleDirectoryReader

UnstructuredReader = download_loader('UnstructuredReader')
dir_reader = SimpleDirectoryReader('./data', file_extractor={
    ".pdf": UnstructuredReader(),
    ".html": UnstructuredReader(),
    ".eml": UnstructuredReader(),
})
documents = dir_reader.load_data()
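Because load_data takes one local file per call, it can help to filter a directory down to the supported extensions first. The following is a minimal sketch of that idea, not part of llama-hub: the names SUPPORTED_EXTENSIONS, is_supported, and supported_files are hypothetical, and the extension set is simply copied from the list above.

```python
from pathlib import Path
from typing import List

# Assumption: this set mirrors the extensions listed in this note;
# it is not read from the Unstructured.io library itself.
SUPPORTED_EXTENSIONS = {
    ".txt", ".docx", ".pptx", ".jpg", ".png", ".eml", ".html", ".pdf",
}


def is_supported(path: Path) -> bool:
    """Hypothetical helper: True if the file suffix is in the set above."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def supported_files(directory: Path) -> List[Path]:
    """Return supported files under `directory`, sorted for stable order."""
    return sorted(
        p for p in directory.rglob("*") if p.is_file() and is_supported(p)
    )
```

Each path returned by supported_files could then be passed to loader.load_data(file=path), one call per file.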
Obsidian Loader
This loader loads documents from a markdown directory (for instance, an Obsidian vault).
from llama_index import download_loader

ObsidianReader = download_loader('ObsidianReader')
documents = ObsidianReader('/path/to/dir').load_data()  # Returns a list of documents
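To make "loads documents from a markdown directory" more concrete, here is a rough standard-library sketch of walking a vault and reading every .md file. It is only an illustration of the idea and is not how ObsidianReader is implemented; the function name read_markdown_dir is hypothetical, and UTF-8 encoding is assumed.

```python
from pathlib import Path
from typing import Dict


def read_markdown_dir(root: str) -> Dict[str, str]:
    """Map each .md file (path relative to `root`) to its text content.

    Hypothetical helper for illustration; assumes files are UTF-8.
    """
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(base.rglob("*.md"))
    }
```

Non-markdown files in the directory are skipped, and notes in subfolders are included because rglob searches recursively.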
Dependencies
If you hit a “failed to find libmagic” error, try pip install python-magic-bin==0.4.14 (solution documented here). On macOS, you may also try brew install libmagic.
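A quick way to check whether the fix worked is to try importing the magic module before running a loader. This is a small sketch under the assumption that the libmagic failure surfaces as an ImportError when the module is imported; can_import is a hypothetical helper name.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Hypothetical helper: True if `module_name` imports without ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: python-magic and python-magic-bin both expose `magic`.
    if not can_import("magic"):
        print("magic is not importable; try the install commands above.")
```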