LlamaIndex • 2024-11-21

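Building Knowledge Graphs with LlamaIndex and Memgraph

This is a guest post from our friends at Memgraph.

In this blog post, we show how Memgraph integrates with LlamaIndex. You can use LlamaIndex to turn raw data into a structured knowledge graph and then query it in natural language.

To help you get started quickly, here is a step-by-step guide covering installation, environment setup, and a sample knowledge graph built from a biography of Charles Darwin.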

Step 1: Install and set up Memgraph

The fastest way to get started with Memgraph (Memgraph database + MAGE library + Lab) is to run the command for your platform.

For Linux/macOS:

curl https://install.memgraph.com | sh

For Windows:

iwr https://windows.memgraph.com | iex

Once installed, launch Memgraph Lab, a visual tool for interacting with the database. You can access it in either of these ways:

Web: http://localhost:3000
Desktop app: available as a download from Memgraph.

For more details, see the Memgraph getting started documentation.

Step 2: Install LlamaIndex and the Memgraph integration

Run the following command to install LlamaIndex and the Memgraph graph store integration package:

%pip install llama-index llama-index-graph-stores-memgraph

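This package connects LlamaIndex to Memgraph, so you can turn unstructured data into a structured knowledge graph that is easy to build, visualize, and query.

Step 3: Configure your environment

Database credentials

Configure LlamaIndex to connect to your Memgraph database by setting the following parameters: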

from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = ""  # Your Memgraph username, default is ""
password = ""  # Your Memgraph password, default is ""
url = "bolt://localhost:7687"  # Connection URL for Memgraph

graph_store = MemgraphPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)
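Before building the graph, it can help to confirm that something is actually listening at the connection URL. The following is a minimal sketch using only the Python standard library; parse_bolt_url and is_reachable are hypothetical helper names, not part of LlamaIndex or Memgraph, and the check only tests that the TCP port is open, not that the credentials are valid.

```python
import socket
from urllib.parse import urlparse

DEFAULT_BOLT_PORT = 7687  # port used in the connection URL above


def parse_bolt_url(url: str) -> tuple[str, int]:
    """Split a bolt:// URL into (host, port), falling back to the default port."""
    parsed = urlparse(url)
    if not parsed.scheme.startswith("bolt") or not parsed.hostname:
        raise ValueError(f"Not a Bolt URL: {url!r}")
    return parsed.hostname, parsed.port or DEFAULT_BOLT_PORT


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


host, port = parse_bolt_url("bolt://localhost:7687")
print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

If this prints False, check that Memgraph is running before moving on.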

Set the OpenAI API key

Add your OpenAI API key to the environment. It is used for embeddings and query processing.

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"  # Replace with your OpenAI API key
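Hardcoding a key in a notebook makes it easy to leak. As an alternative, here is a minimal sketch that reads the key from a variable already exported in your shell and fails early with a clear message if it is missing. The helper name require_env is hypothetical and not part of LlamaIndex.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset.

    A value that still looks like a template placeholder (for example
    "<YOUR_API_KEY>") is treated as unset.
    """
    value = os.environ.get(name, "").strip()
    if not value or value.startswith("<"):
        raise RuntimeError(f"{name} is not set; export it before running this code")
    return value
```

Calling require_env("OPENAI_API_KEY") at the top of the notebook surfaces a missing key before any embedding or LLM call is made.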

Step 4: Load and prepare your data

As the dataset, use a sample text file about Charles Darwin, stored at ./data/charles_darwin/charles.txt:

Charles Robert Darwin was an English naturalist, geologist, and biologist, widely known for his contributions to evolutionary biology. His proposition that all species of life have descended from a common ancestor is now generally accepted and considered a fundamental scientific concept. In a joint publication with Alfred Russel Wallace, he introduced his scientific theory that this branching pattern of evolution resulted from a process he called natural selection, in which the struggle for existence has a similar effect to the artificial selection involved in selective breeding. Darwin has been described as one of the most influential figures in human history and was honoured by burial in Westminster Abbey.
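If the file does not exist yet, a small script along these lines can create it. This is a sketch using only the standard library; write_sample_file is a hypothetical helper, and SAMPLE_TEXT holds only the first sentence of the passage above, so paste in the full text in practice.

```python
from pathlib import Path

# First sentence of the passage above; replace with the full text in practice.
SAMPLE_TEXT = (
    "Charles Robert Darwin was an English naturalist, geologist, and biologist, "
    "widely known for his contributions to evolutionary biology."
)


def write_sample_file(text: str, directory: str = "./data/charles_darwin") -> Path:
    """Create the data directory if needed and write the text to charles.txt."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "charles.txt"
    target.write_text(text, encoding="utf-8")
    return target


path = write_sample_file(SAMPLE_TEXT)
print(f"Wrote {path.stat().st_size} bytes to {path}")
```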

Load this unstructured text with LlamaIndex's SimpleDirectoryReader:

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/charles_darwin/").load_data()

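The data is now loaded into the documents variable, which is passed as an argument in the next steps: index creation and graph construction.

Step 5: Build the knowledge graph

LlamaIndex provides several graph constructors. In this tutorial, we use SchemaLLMPathExtractor, which extracts entities and relationships from the text automatically.

Build the graph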

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-4", temperature=0.0),
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

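This step creates a knowledge graph in Memgraph by identifying the key concepts in the Charles Darwin dataset and the relationships between them. The graph is now ready to be queried.

The image below shows how the text was transformed into a knowledge graph and stored in Memgraph.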
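For intuition, the result of extraction can be pictured as a set of (subject, relation, object) triples. The sketch below is purely illustrative: the triples are hand-written from the Darwin passage, the relation names are assumptions, and none of this is the actual output of SchemaLLMPathExtractor or the way Memgraph stores data.

```python
from collections import defaultdict

# Hand-written triples based on the sample text; relation names are made up
# for illustration and will differ from what the LLM extractor produces.
TRIPLES = [
    ("Charles Robert Darwin", "COLLABORATED_WITH", "Alfred Russel Wallace"),
    ("Charles Robert Darwin", "INTRODUCED", "natural selection"),
    ("Charles Robert Darwin", "BURIED_IN", "Westminster Abbey"),
]


def build_adjacency(triples):
    """Group (relation, object) pairs under each subject node."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def neighbors(graph, subject, relation):
    """Return the objects linked to a subject by the given relation."""
    return [obj for rel, obj in graph.get(subject, []) if rel == relation]


graph = build_adjacency(TRIPLES)
collaborators = neighbors(graph, "Charles Robert Darwin", "COLLABORATED_WITH")
print(collaborators)
```

A graph database answers the same kind of question by traversing stored nodes and relationships rather than scanning a Python list.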

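Step 6: Query the knowledge graph

Once the knowledge graph is built, querying it is simple and intuitive. LlamaIndex provides several ways to retrieve nodes and paths from the graph. If no specific retriever is configured, the system defaults to LLMSynonymRetriever.

Why natural language querying matters

With natural language, you can ask questions that would otherwise require a complex query language. Here, the model fetches the relevant information from the graph and uses the entities and connections captured during graph construction to return the result in a human-readable format.

Query example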

query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("Who did Charles Robert Darwin collaborate with?")
print(str(response))

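Query: "Who did Charles Robert Darwin collaborate with?"
Response: the system identifies Alfred Russel Wallace as the collaborator.

This lets even non-technical users extract insights using plain natural language.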