LlamaIndex • 2024-05-29

Introducing the Property Graph Index: A Powerful New Way to Build Knowledge Graphs with LLMs

Knowledge Graphs

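We're excited to announce a new feature in LlamaIndex that expands our knowledge graph capabilities and makes them more flexible, extendable, and robust. Introducing the Property Graph Index!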

Why Property Graphs?

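Traditional knowledge graph representations, such as knowledge triples (subject, predicate, object), have limited expressiveness. They lack the ability to: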

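Assign labels and properties to nodes and relationships
Represent text nodes as vector embeddings
Perform both vector and symbolic retrieval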

Our existing KnowledgeGraphIndex was held back by these limitations, as well as by general limitations of the index architecture itself.

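The Property Graph Index addresses these issues. By using a labeled property graph representation, it enables much richer modeling, storage, and querying of your knowledge graph.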

With property graphs, you can:

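Categorize nodes and relationships into types with associated metadata
Treat your graph as a superset of a vector database for hybrid search
Express complex queries using the Cypher graph query language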

This makes property graphs a powerful and flexible choice for building knowledge graphs with LLMs.
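To make the contrast concrete, here is a minimal plain-Python sketch (not LlamaIndex code) of a bare triple next to a labeled property graph record. The class names `PGNode` and `PGRelation` and the sample data are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# A bare knowledge triple: no labels, no properties, no embedding.
Triple = tuple[str, str, str]


@dataclass
class PGNode:
    """Hypothetical labeled property graph node."""
    name: str
    label: str                                    # node type, e.g. "PERSON"
    properties: dict = field(default_factory=dict)
    embedding: Optional[list[float]] = None       # optional vector for similarity search


@dataclass
class PGRelation:
    """Hypothetical labeled relation that can carry its own metadata."""
    label: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


triple: Triple = ("Alice", "WORKS_AT", "Acme")

alice = PGNode("Alice", "PERSON", {"title": "engineer"}, embedding=[0.1, 0.3])
acme = PGNode("Acme", "ORGANIZATION", {"founded": 1999})
works_at = PGRelation("WORKS_AT", alice.name, acme.name, {"since": 2021})
```

The triple can only say *that* Alice works at Acme; the property graph version can also say what kind of thing each node is, since when the relationship holds, and carry a vector for hybrid search.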

Constructing Your Graph

The Property Graph Index offers several ways to extract a knowledge graph from your data, and you can combine them freely:

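1. Schema-Guided Extraction: Define the allowed entity types, relationship types, and how they may connect in a schema. The LLM will only extract graph data that conforms to this schema.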

from typing import Literal

from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# assumes `llm` is an already-configured LlamaIndex LLM
entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]
# which relation types each entity type is allowed to use
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,  # if False, allows values outside of the schema
)
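Conceptually, the validation schema acts as a filter on candidate triples. The sketch below illustrates that idea in plain Python, assuming the dictionary form shown above (entity type mapped to allowed relation types). It is not the actual `SchemaLLMPathExtractor` implementation; the helper `filter_triples` is hypothetical, and it simply requires both entity types to be in the schema and the relation to be allowed for the subject's type.

```python
# Each candidate: (subject_name, subject_type, relation, object_name, object_type)
Candidate = tuple[str, str, str, str, str]

schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}


def filter_triples(candidates: list[Candidate], schema: dict[str, list[str]],
                   strict: bool = True) -> list[Candidate]:
    """Keep only candidates that fit the schema (hypothetical helper)."""
    if not strict:
        return list(candidates)  # non-strict: accept everything the LLM produced
    allowed_types = set(schema)
    kept = []
    for subj, subj_type, rel, obj, obj_type in candidates:
        if subj_type not in allowed_types or obj_type not in allowed_types:
            continue  # unknown entity type
        if rel not in schema[subj_type]:
            continue  # relation not permitted for this subject type
        kept.append((subj, subj_type, rel, obj, obj_type))
    return kept


candidates = [
    ("Alice", "PERSON", "HAS", "Laptop", "THING"),
    ("Paris", "PLACE", "IS_A", "City", "THING"),            # PLACE may not use IS_A
    ("Bob", "PERSON", "FOUNDED", "Acme", "ORGANIZATION"),   # unknown relation and type
]
```

With `strict=True` only the first candidate survives; with `strict=False` all three pass through.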

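2. Implicit Extraction: Use LlamaIndex constructs to specify relationships between nodes in your data. The graph is built from the node.relationships attribute. For example, when documents are run through a node parser, the PREVIOUS, NEXT, and SOURCE relationships are captured.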

from llama_index.core.indices.property_graph import ImplicitPathExtractor

kg_extractor = ImplicitPathExtractor()
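Because implicit extraction reads relationships that already exist on the nodes, no LLM call is needed. The following toy sketch shows the idea: each entry in a node's relationship mapping becomes one edge. The `Chunk` class and `implicit_paths` function are hypothetical stand-ins, not LlamaIndex's `TextNode` or `ImplicitPathExtractor` internals.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Toy stand-in for a parsed text node (hypothetical)."""
    id: str
    text: str
    relationships: dict[str, str] = field(default_factory=dict)  # relation -> node id


def implicit_paths(chunks: list[Chunk]) -> list[tuple[str, str, str]]:
    """Turn each chunk's relationships into (source_id, RELATION, target_id) edges."""
    edges = []
    for chunk in chunks:
        for rel, target_id in chunk.relationships.items():
            edges.append((chunk.id, rel, target_id))
    return edges


# Two chunks split from one document "doc-1" by a node parser
chunks = [
    Chunk("c1", "First part.", {"SOURCE": "doc-1", "NEXT": "c2"}),
    Chunk("c2", "Second part.", {"SOURCE": "doc-1", "PREVIOUS": "c1"}),
]
edges = implicit_paths(chunks)
```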

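3. Free-Form Extraction: Let the LLM infer entities, relationship types, and schema directly from your data in a free-form manner. (This is similar to how the current KnowledgeGraphIndex works.)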

from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

kg_extractor = SimpleLLMPathExtractor(llm=llm)
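Free-form extraction ultimately means turning the LLM's text output into triples. The sketch below assumes the LLM answers with one `(subject, relation, object)` per line; the actual prompt and output format used by `SimpleLLMPathExtractor` may differ, and `parse_triples` is a hypothetical helper.

```python
import re

# Matches lines like "(Alice, works at, Acme)"
TRIPLE_RE = re.compile(r"^\(\s*([^,]+?)\s*,\s*([^,]+?)\s*,\s*([^,]+?)\s*\)$")


def parse_triples(llm_output: str, max_triples: int = 10) -> list[tuple[str, str, str]]:
    """Parse '(subject, relation, object)' lines, silently skipping malformed ones."""
    triples = []
    for line in llm_output.splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match:
            triples.append(match.groups())
        if len(triples) >= max_triples:
            break
    return triples


sample_output = """
(Alice, works at, Acme)
(Acme, located in, Berlin)
this line is not a triple
"""
```

Note that the relation labels here are whatever the LLM chose ("works at"), which is exactly what distinguishes free-form extraction from the schema-guided approach.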

Mix and match these extraction methods for fine-grained control over your graph's structure.

from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_documents(docs, kg_extractors=[...])
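One simple way to picture passing several extractors: each one contributes triples from the same text, and the results are merged. The plain-Python sketch below assumes a merge that drops exact duplicates; how `PropertyGraphIndex` actually applies and merges `kg_extractors` is not shown here, and both extractor functions are hypothetical stand-ins.

```python
from typing import Callable

Triple = tuple[str, str, str]
Extractor = Callable[[str], list[Triple]]


def run_extractors(text: str, extractors: list[Extractor]) -> list[Triple]:
    """Run every extractor on the same text and merge results, dropping duplicates."""
    seen: set[Triple] = set()
    merged: list[Triple] = []
    for extract in extractors:
        for triple in extract(text):
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


# Hypothetical stand-ins for a schema-guided and a free-form extractor
def schema_extractor(text: str) -> list[Triple]:
    return [("Alice", "HAS", "Laptop")]


def free_form_extractor(text: str) -> list[Triple]:
    return [("Alice", "HAS", "Laptop"), ("Alice", "likes", "coffee")]


triples = run_extractors("Alice has a laptop and likes coffee.",
                         [schema_extractor, free_form_extractor])
```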

Embeddings

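By default, all graph nodes are embedded. Some graph databases support embeddings natively, but you can also specify any LlamaIndex vector store and use it alongside your graph database.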

index = PropertyGraphIndex(..., vector_store=vector_store)
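The point of passing a separate `vector_store` is that graph structure and node vectors can live in different systems, linked by node ID. The toy sketch below shows that split with two dictionaries. The `embed` function is a hypothetical bag-of-letters embedding standing in for a real embedding model.

```python
import math


def embed(text: str) -> list[float]:
    """Toy embedding: L2-normalized letter counts (stand-in for a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


# Graph store: structure and properties, keyed by node id
graph_nodes = {
    "n1": {"name": "llama", "label": "ANIMAL"},
    "n2": {"name": "index", "label": "THING"},
}

# Separate vector store: node id -> embedding (every node gets embedded)
vector_store = {node_id: embed(node["name"]) for node_id, node in graph_nodes.items()}
```

Because both stores share node IDs, a vector hit can always be mapped back to its node, and from there to its neighbors in the graph.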

Querying Your Graph

The Property Graph Index supports several querying techniques that can be composed and run concurrently.

1. Keyword/Synonym-Based Retrieval: Expand your query into relevant keywords and synonyms, and find matching nodes.

from llama_index.core.indices.property_graph import LLMSynonymRetriever

sub_retriever = LLMSynonymRetriever(index.property_graph_store, llm=llm)
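The mechanics of keyword/synonym retrieval can be sketched in a few lines: expand the query, then match node names. In `LLMSynonymRetriever` the LLM generates the expansions; here a hard-coded table `SYNONYMS` (hypothetical) stands in for it, and the toy `TRIPLES` graph is made up for illustration.

```python
# Hypothetical stand-in for the LLM's keyword/synonym expansion
SYNONYMS = {
    "llama": ["llama", "alpaca", "camelid"],
}

# Toy graph as (subject, relation, object) triples
TRIPLES = [
    ("alpaca", "IS_A", "camelid"),
    ("alpaca", "LIVES_IN", "Andes"),
    ("index", "IS_A", "data structure"),
]


def expand_query(query: str) -> set[str]:
    """Split the query into keywords and add synonyms for each one."""
    keywords: set[str] = set()
    for word in query.lower().split():
        keywords.update(SYNONYMS.get(word, [word]))
    return keywords


def synonym_retrieve(query: str) -> list[tuple[str, str, str]]:
    """Return every triple whose subject or object matches an expanded keyword."""
    keywords = expand_query(query)
    return [t for t in TRIPLES if t[0].lower() in keywords or t[2].lower() in keywords]
```

The word "llama" never appears in the graph, yet the expansion to "alpaca" and "camelid" still finds the relevant nodes.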

2. Vector Similarity: Retrieve nodes based on how similar their vector representations are to your query.

from llama_index.core.indices.property_graph import VectorContextRetriever

sub_retriever = VectorContextRetriever(
  index.property_graph_store, 
  vector_store=index.vector_store,
  embed_model=embed_model,
)
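Vector retrieval boils down to ranking node embeddings by similarity to the query embedding and then pulling in graph context around the top hits. The pure-Python sketch below uses cosine similarity and tiny hand-made 3-dimensional vectors (hypothetical data); it does not reproduce `VectorContextRetriever`'s actual parameters for top-k or path depth.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical node embeddings and graph edges
node_vectors = {
    "llama": [0.9, 0.1, 0.0],
    "alpaca": [0.8, 0.2, 0.1],
    "index": [0.0, 0.1, 0.9],
}
edges = [("llama", "RELATED_TO", "alpaca"), ("index", "PART_OF", "LlamaIndex")]


def vector_retrieve(query_vec: list[float], top_k: int = 1) -> list[tuple[str, str, str]]:
    """Pick the top_k most similar nodes, then return edges touching them (depth 1)."""
    ranked = sorted(node_vectors, key=lambda n: cosine(query_vec, node_vectors[n]), reverse=True)
    hits = set(ranked[:top_k])
    return [e for e in edges if e[0] in hits or e[2] in hits]
```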

3. Cypher Queries: Use the expressive Cypher graph query language to specify complex graph patterns and traverse multiple relationships.

from llama_index.core.indices.property_graph import CypherTemplateRetriever
from llama_index.core.bridge.pydantic import BaseModel, Field

class Params(BaseModel):
    """Parameters for a cypher query."""
    names: list[str] = Field(
        description="A list of possible entity names or keywords related to the query."
    )

cypher_query = """
   MATCH (c:Chunk)-[:MENTIONS]->(o)
   WHERE o.name IN $names
   RETURN c.text, o.name, o.label;
"""

# the LLM fills in Params, which are then passed to the query as $names
sub_retriever = CypherTemplateRetriever(
    index.property_graph_store,
    Params,
    cypher_query,
    llm=llm,
)
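To see what the template above does, here is the same pattern evaluated in plain Python over a tiny hypothetical graph. The `chunks`, `entities`, and `mentions` data and the `run_template` helper are made up for illustration; the real retriever sends the Cypher query to the graph store, and the LLM's part is to produce the `names` parameter.

```python
# Toy in-memory graph mirroring the pattern (c:Chunk)-[:MENTIONS]->(o)
chunks = {
    "c1": "Alice joined Acme in 2021.",
    "c2": "Bob lives in Berlin.",
}
entities = {
    "Alice": "PERSON",
    "Acme": "ORGANIZATION",
    "Berlin": "PLACE",
}
mentions = [("c1", "Alice"), ("c1", "Acme"), ("c2", "Berlin")]  # (chunk_id, entity_name)


def run_template(names: list[str]) -> list[tuple[str, str, str]]:
    """Python equivalent of:
    MATCH (c:Chunk)-[:MENTIONS]->(o) WHERE o.name IN $names
    RETURN c.text, o.name, o.label
    """
    wanted = set(names)
    return [(chunks[c], o, entities[o]) for c, o in mentions if o in wanted]


# Hard-coded stand-in for the Params object an LLM would produce
llm_params = {"names": ["Acme", "Berlin"]}
rows = run_template(llm_params["names"])
```

The design choice here is that the LLM never writes Cypher itself; it only fills typed parameters, so the query shape stays fixed and predictable.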

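Instead of providing a template, you can also let the LLM write the entire Cypher query based on the user's query and the context of the database.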

from llama_index.core.indices.property_graph import TextToCypherRetriever

sub_retriever = TextToCypherRetriever(index.property_graph_store, llm=llm)
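For text-to-Cypher, the "context of the database" is essentially a description of its schema that the LLM sees alongside the question. The sketch below assembles such a prompt. The `SCHEMA` text, the prompt wording, and `build_text2cypher_prompt` are hypothetical and are not `TextToCypherRetriever`'s default prompt.

```python
# Hypothetical schema summary of the kind a graph store could report
SCHEMA = """Node labels: Chunk(text), PERSON(name), ORGANIZATION(name)
Relationships: (:Chunk)-[:MENTIONS]->(), (:PERSON)-[:WORKS_AT]->(:ORGANIZATION)"""

PROMPT_TEMPLATE = """Task: write a Cypher query that answers the question.
Use only the node labels, properties, and relationships in this schema:
{schema}

Question: {question}
Cypher query:"""


def build_text2cypher_prompt(question: str, schema: str = SCHEMA) -> str:
    """Fill the prompt template with the graph schema and the user's question."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


prompt = build_text2cypher_prompt("Who works at Acme?")
```

Compared with the template approach, this is more flexible but gives up control over the query shape, so the generated Cypher is worth inspecting before trusting its results.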

4. Custom Graph Traversal: Define your own graph traversal logic by subclassing the key retriever components.
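The subclassing pattern can be illustrated without LlamaIndex: a base class owns the graph, and a subclass overrides the traversal step. Both `ToyGraphRetriever` and `BreadthFirstRetriever` are hypothetical stand-ins for the real retriever components; the traversal shown is a depth-limited breadth-first search that skips chosen relation types.

```python
from collections import deque


class ToyGraphRetriever:
    """Hypothetical base class: subclasses override `traverse`."""

    def __init__(self, edges: list[tuple[str, str, str]]):
        self.adjacency: dict[str, list[tuple[str, str]]] = {}
        for src, rel, dst in edges:
            self.adjacency.setdefault(src, []).append((rel, dst))

    def traverse(self, start: str) -> list[tuple[str, str, str]]:
        raise NotImplementedError


class BreadthFirstRetriever(ToyGraphRetriever):
    """Follow outgoing edges breadth-first up to max_depth hops,
    skipping any relation type listed in ignore_rels."""

    def __init__(self, edges, max_depth: int = 2, ignore_rels: frozenset[str] = frozenset()):
        super().__init__(edges)
        self.max_depth = max_depth
        self.ignore_rels = ignore_rels

    def traverse(self, start: str) -> list[tuple[str, str, str]]:
        paths = []
        visited = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == self.max_depth:
                continue  # do not expand beyond the depth limit
            for rel, dst in self.adjacency.get(node, []):
                if rel in self.ignore_rels:
                    continue
                paths.append((node, rel, dst))
                if dst not in visited:
                    visited.add(dst)
                    queue.append((dst, depth + 1))
        return paths


edges = [
    ("llama", "IS_A", "camelid"),
    ("camelid", "IS_A", "mammal"),
    ("mammal", "IS_A", "animal"),
    ("llama", "MENTIONED_IN", "chunk-7"),
]
retriever = BreadthFirstRetriever(edges, max_depth=2, ignore_rels=frozenset({"MENTIONED_IN"}))
paths = retriever.traverse("llama")
```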

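These retrievers can be combined and composed for hybrid search that leverages both the graph structure and the nodes' vector representations.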

from llama_index.core.indices.property_graph import VectorContextRetriever, LLMSynonymRetriever

vector_retriever = VectorContextRetriever(index.property_graph_store, embed_model=embed_model)
synonym_retriever = LLMSynonymRetriever(index.property_graph_store, llm=llm)

retriever = index.as_retriever(sub_retrievers=[vector_retriever, synonym_retriever])
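Since sub-retrievers are independent, they can run concurrently and have their results merged. The sketch below shows one simple scheme, assuming a thread pool and an order-preserving, deduplicating merge; whether `as_retriever` merges in exactly this way is an assumption, and `vector_sub` and `synonym_sub` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Result = tuple[str, str, str]
SubRetriever = Callable[[str], list[Result]]


def hybrid_retrieve(query: str, sub_retrievers: list[SubRetriever]) -> list[Result]:
    """Run every sub-retriever concurrently, then merge results in retriever
    order, keeping only the first occurrence of each result."""
    with ThreadPoolExecutor(max_workers=max(1, len(sub_retrievers))) as pool:
        # pool.map returns results in the same order as sub_retrievers
        per_retriever = list(pool.map(lambda retrieve: retrieve(query), sub_retrievers))
    merged: dict[Result, None] = {}  # dicts preserve insertion order
    for results in per_retriever:
        for result in results:
            merged.setdefault(result, None)
    return list(merged)


# Hypothetical stand-ins for the vector and synonym sub-retrievers
def vector_sub(query: str) -> list[Result]:
    return [("llama", "IS_A", "camelid"), ("llama", "EATS", "grass")]


def synonym_sub(query: str) -> list[Result]:
    return [("alpaca", "IS_A", "camelid"), ("llama", "IS_A", "camelid")]


results = hybrid_retrieve("what is a llama?", [vector_sub, synonym_sub])
```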

Using the Property Graph Store

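Under the hood, the Property Graph Index uses a PropertyGraphStore abstraction to store and retrieve graph data. You can also use this store directly for lower-level control.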

The store supports:

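Inserting and updating nodes, relationships, and properties
Querying nodes by ID or by properties
Retrieving relationship paths from a starting node
Running Cypher queries (if the backing store supports them)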

from llama_index.core.graph_stores.types import EntityNode, Relation
from llama_index.graph_stores.neo4j import Neo4jPGStore

graph_store = Neo4jPGStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)

# insert nodes
nodes = [
    EntityNode(name="llama", label="ANIMAL", properties={"key": "value"}),
    EntityNode(name="index", label="THING", properties={"key": "value"}), 
]
graph_store.upsert_nodes(nodes)

# insert relationships  
relations = [
    Relation(
        label="HAS",
        source_id=nodes[0].id, 
        target_id=nodes[1].id,
    )
]
graph_store.upsert_relations(relations)

# query nodes
llama_node = graph_store.get(properties={"name": "llama"})[0]

# get relationship paths  
paths = graph_store.get_rel_map([llama_node], depth=1)

# run Cypher query
results = graph_store.structured_query("MATCH (n) RETURN n LIMIT 10")
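To make the operations listed above concrete without a running database, here is a toy in-memory store whose method names echo the ones used with `Neo4jPGStore`. It is a hypothetical sketch, not LlamaIndex's implementation: `Node`, `Rel`, and `ToyPropertyGraphStore` are made up, upserts key nodes by ID and skip duplicate relations, and `get_rel_map` follows outgoing edges only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Rel:
    label: str
    source_id: str
    target_id: str


class ToyPropertyGraphStore:
    """Hypothetical in-memory store mirroring the operations listed above."""

    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.rels: list[Rel] = []

    def upsert_nodes(self, nodes: list[Node]) -> None:
        for node in nodes:
            self.nodes[node.id] = node  # insert, or overwrite an existing id

    def upsert_relations(self, rels: list[Rel]) -> None:
        for rel in rels:
            key = (rel.label, rel.source_id, rel.target_id)
            if all((r.label, r.source_id, r.target_id) != key for r in self.rels):
                self.rels.append(rel)  # skip exact duplicates

    def get(self, properties: Optional[dict] = None,
            ids: Optional[list[str]] = None) -> list[Node]:
        """Return nodes matching all given property values and/or ids."""
        found = []
        for node in self.nodes.values():
            if ids is not None and node.id not in ids:
                continue
            if properties and any(node.properties.get(k) != v for k, v in properties.items()):
                continue
            found.append(node)
        return found

    def get_rel_map(self, start_nodes: list[Node], depth: int = 1) -> list[tuple[str, str, str]]:
        """Return (source_id, label, target_id) paths within `depth` outgoing hops."""
        frontier = {n.id for n in start_nodes}
        visited = set(frontier)
        paths: list[tuple[str, str, str]] = []
        for _ in range(depth):
            next_frontier = set()
            for rel in self.rels:
                if rel.source_id in frontier:
                    path = (rel.source_id, rel.label, rel.target_id)
                    if path not in paths:
                        paths.append(path)
                    if rel.target_id not in visited:
                        visited.add(rel.target_id)
                        next_frontier.add(rel.target_id)
            frontier = next_frontier
        return paths


store = ToyPropertyGraphStore()
store.upsert_nodes([
    Node("llama", "ANIMAL", {"name": "llama"}),
    Node("index", "THING", {"name": "index"}),
])
store.upsert_relations([Rel("HAS", "llama", "index")])
store.upsert_relations([Rel("HAS", "llama", "index")])  # duplicate is ignored
llama = store.get(properties={"name": "llama"})[0]
paths = store.get_rel_map([llama], depth=1)
```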