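LlamaIndex Storage: Concepts and Basic Code

Concepts

LlamaIndex provides a high-level interface for ingesting, indexing, and querying external data.

Under the hood, LlamaIndex also supports pluggable storage components that let you customize: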

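Document stores: where ingested documents (i.e., Node objects) are stored.
Index stores: where index metadata is stored.
Vector stores: where embedding vectors are stored.
Property Graph stores: where knowledge graphs are stored (i.e., for PropertyGraphIndex).
Chat Stores: where chat messages are stored and organized.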

The document and index stores rely on a common key-value store abstraction, which is also described in detail below.
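
To make the idea of a shared key-value abstraction concrete, here is a minimal standard-library sketch. The names `ToyKVStore` and `ToyDocStore` and the collection names are hypothetical and chosen for illustration only; this is not LlamaIndex's actual implementation, just one way a document store could be layered on a generic key-value store.

```python
from typing import Dict, Optional

DEFAULT_COLLECTION = "data"  # hypothetical default namespace


class ToyKVStore:
    """Toy in-memory key-value store (illustrative, not a LlamaIndex class)."""

    def __init__(self) -> None:
        # Maps collection name -> {key -> JSON-like dict}
        self._data: Dict[str, Dict[str, dict]] = {}

    def put(self, key: str, val: dict, collection: str = DEFAULT_COLLECTION) -> None:
        self._data.setdefault(collection, {})[key] = dict(val)

    def get(self, key: str, collection: str = DEFAULT_COLLECTION) -> Optional[dict]:
        val = self._data.get(collection, {}).get(key)
        return None if val is None else dict(val)

    def delete(self, key: str, collection: str = DEFAULT_COLLECTION) -> bool:
        # Returns True only if the key existed in that collection
        return self._data.get(collection, {}).pop(key, None) is not None


class ToyDocStore:
    """Document store that keeps its records in one collection of the KV store."""

    COLLECTION = "docstore"  # hypothetical collection name

    def __init__(self, kv: ToyKVStore) -> None:
        self._kv = kv

    def add_node(self, node_id: str, text: str) -> None:
        self._kv.put(node_id, {"text": text}, collection=self.COLLECTION)

    def get_text(self, node_id: str) -> Optional[str]:
        record = self._kv.get(node_id, collection=self.COLLECTION)
        return None if record is None else record["text"]
```

Because the document store only talks to `put`/`get`/`delete`, swapping the in-memory dictionary for another backend would not require changing `ToyDocStore`.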

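LlamaIndex supports persisting data to any storage backend supported by fsspec (Python's filesystem specification library). Support has been confirmed for the following storage backends:

Local filesystem
AWS S3 (cloud storage)
Cloudflare R2 (cloud storage)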
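
The sketch below illustrates what a pluggable backend means in practice. fsspec-style filesystem objects expose an `open(path, mode)` method, so persistence code can accept any such object. `InMemoryFS`, `persist_json`, and `load_json` are hypothetical stand-ins written with the standard library only; they are not part of fsspec or LlamaIndex.

```python
import io
import json
from typing import Any, Dict, Optional


class _WriteBuffer(io.StringIO):
    """Text buffer that copies its contents into the owning filesystem on close."""

    def __init__(self, fs: "InMemoryFS", path: str) -> None:
        super().__init__()
        self._fs = fs
        self._path = path

    def close(self) -> None:
        if not self.closed:
            self._fs.files[self._path] = self.getvalue()
        super().close()


class InMemoryFS:
    """Hypothetical filesystem object; only `open` is modeled."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def open(self, path: str, mode: str = "r"):
        if "w" in mode:
            return _WriteBuffer(self, path)
        return io.StringIO(self.files[path])


def persist_json(data: Any, path: str, fs: Optional[InMemoryFS] = None) -> None:
    # Fall back to the local filesystem when no filesystem object is given
    opener = fs.open if fs is not None else open
    with opener(path, "w") as f:
        json.dump(data, f)


def load_json(path: str, fs: Optional[InMemoryFS] = None) -> Any:
    opener = fs.open if fs is not None else open
    with opener(path, "r") as f:
        return json.load(f)
```

The calling code stays the same whether `fs` points at local disk, an object store, or this in-memory stand-in; only the object passed as `fs` changes.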


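Usage pattern

Many vector stores (FAISS is an exception) store both the data and the index (embeddings). This means you do not need a separate document store or index store. It also means you do not need to persist this data explicitly; that happens automatically. Building a new index or reloading an existing one looks like this: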

# Build a new index
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

# Construct the vector store and customize the storage context
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load documents and build the index
# (`documents` is assumed to be a list of Document objects loaded earlier)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Reload an existing index
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

For more details, see the Vector Store module guide below.

Note that, in general, to use the storage abstractions you need to define a StorageContext object:

from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.core.vector_stores import SimpleVectorStore
from llama_index.core import StorageContext

# Create a storage context using the default stores
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    vector_store=SimpleVectorStore(),
    index_store=SimpleIndexStore(),
)

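For more details on customization and persistence, see the guides below.

Customizing Storage

Saving / Loading

Persisting data
By default, LlamaIndex stores data in memory. If needed, this data can be persisted explicitly: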
```
storage_context.persist(persist_dir="<persist_dir>")
```
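This persists the data to disk under the specified persist_dir (or ./storage by default).
Multiple indexes can be persisted to and loaded from the same directory, provided you keep track of the index IDs to load.
You can also configure alternative storage backends (for example MongoDB) that persist data by default. In that case, calling storage_context.persist() does nothing.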

Loading data
To load data, simply re-create the storage context with the same configuration (for example, pass in the same persist_dir or vector store client).

# Uses the same imports as the StorageContext example above
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore.from_persist_dir(persist_dir="<persist_dir>"),
    vector_store=SimpleVectorStore.from_persist_dir(persist_dir="<persist_dir>"),
    index_store=SimpleIndexStore.from_persist_dir(persist_dir="<persist_dir>"),
)
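
The persist-then-reload cycle described above can be sketched without LlamaIndex at all. In this toy version, `ToyStorageContext` and the `<store>.json` file names are assumptions made for illustration; LlamaIndex's real on-disk layout and classes are different. The point is only that loading works because the same directory and the same set of stores are used on both sides.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List

STORE_NAMES = ("docstore", "index_store", "vector_store")  # hypothetical file stems


@dataclass
class ToyStorageContext:
    """Toy bundle of three in-memory stores (not LlamaIndex's StorageContext)."""

    docstore: Dict[str, dict] = field(default_factory=dict)
    index_store: Dict[str, dict] = field(default_factory=dict)
    vector_store: Dict[str, List[float]] = field(default_factory=dict)

    def persist(self, persist_dir: str = "./storage") -> None:
        # Write each store to its own JSON file under persist_dir
        root = Path(persist_dir)
        root.mkdir(parents=True, exist_ok=True)
        for name in STORE_NAMES:
            (root / f"{name}.json").write_text(json.dumps(getattr(self, name)))

    @classmethod
    def from_persist_dir(cls, persist_dir: str = "./storage") -> "ToyStorageContext":
        # Rebuild the context by reading the same files back
        root = Path(persist_dir)
        stores = {
            name: json.loads((root / f"{name}.json").read_text())
            for name in STORE_NAMES
        }
        return cls(**stores)


def roundtrip_demo() -> bool:
    """Persist a small context to a temporary directory and load it back."""
    ctx = ToyStorageContext()
    ctx.docstore["n1"] = {"text": "hello"}
    ctx.index_store["idx"] = {"node_ids": ["n1"]}
    ctx.vector_store["n1"] = [0.1, 0.2]
    with tempfile.TemporaryDirectory() as tmp:
        ctx.persist(persist_dir=tmp)
        restored = ToyStorageContext.from_persist_dir(persist_dir=tmp)
    return restored == ctx
```

Calling `roundtrip_demo()` checks that the restored context equals the original one.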

We can then load specific indices from the StorageContext using the convenience functions below.

from llama_index.core import (
    load_index_from_storage,
    load_indices_from_storage,
    load_graph_from_storage,
)

# Load a single index
# Need to specify index_id if multiple indexes are persisted to the same directory
index = load_index_from_storage(storage_context, index_id="<index_id>")

# Don't need to specify index_id if there's only one index in the storage context
index = load_index_from_storage(storage_context)

# Load multiple indices
indices = load_indices_from_storage(storage_context)  # loads all indices
indices = load_indices_from_storage(
    storage_context, index_ids=[index_id1, ...]
)  # loads specific indices

# Load a composable graph
graph = load_graph_from_storage(
    storage_context, root_id="<root_id>"
)  # loads the graph with the specified root_id
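
The rule about when an index ID is required can be expressed as a small selection function. This is a sketch of the rule as stated in the comments above, not LlamaIndex's code: `pick_index` is a hypothetical helper, the persisted indices are modeled as a plain dictionary keyed by index ID, and raising an error in the ambiguous case is an assumption about the desired behavior.

```python
from typing import Dict, Optional


def pick_index(indices: Dict[str, dict], index_id: Optional[str] = None) -> dict:
    """Select one persisted index, requiring an ID when the choice is ambiguous."""
    if index_id is not None:
        if index_id not in indices:
            raise KeyError(f"no persisted index with id {index_id!r}")
        return indices[index_id]
    if len(indices) == 1:
        # Exactly one index persisted: the ID can be omitted
        return next(iter(indices.values()))
    raise ValueError(
        f"expected exactly one persisted index when index_id is omitted, "
        f"found {len(indices)}"
    )
```

With a single persisted index, `pick_index(indices)` is enough; once a second index shares the directory, the caller has to pass `index_id`.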