Logan Markewich • 2023-06-08

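LlamaIndex and Transformers Agents

Summary

Agents are a popular use case for large language models (LLMs), typically providing a structure that lets an LLM make decisions, use tools, and complete tasks. These agents take many forms, from fully autonomous versions such as Auto-GPT to more controlled implementations such as Langchain Agents. With the recent release of Transformers Agents, we show how LlamaIndex continues to be a useful tool for agents by augmenting the existing image-generation tool. Using a vector index built from 10K DiffusionDB prompts, the Text2Image Prompt Assistant tool we created rewrites prompts to produce more polished images. The full source code is available in the tool's Hugging Face Space, and a Colab notebook serves as a usage guide.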

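Creating the Tool

Transformers Agents ships with a variety of preconfigured tools that draw on the large collection of open-source models hosted on the Hugging Face Hub. Additional tools can be created and shared simply by publishing a new Hugging Face Space with the appropriate tool setup.

To create a tool, your code needs only a tool_config.json file that describes the tool and a source file that contains the tool implementation. Although the documentation for this part is somewhat vague, we were eventually able to use the implementations of existing custom tools as a framework for our own.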
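As a rough illustration of the tool_config.json file mentioned above, the sketch below writes one from Python. The field names (name, description, tool_class) are an assumption modeled on existing custom tools on the Hub, and the module name and description text are hypothetical placeholders, not the values used by the published tool.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout of tool_config.json; every value here is a hypothetical placeholder.
tool_config = {
    "name": "image_generator",
    "description": "Rewrites a text prompt in the style of good examples, then generates an image.",
    # "<module>.<class>" pointing at the source file that implements the tool (assumed format)
    "tool_class": "text2image_prompt_assistant.Text2ImagePromptAssistant",
}


def write_tool_config(directory, config):
    """Serialize the config next to the tool's source file and return its path."""
    path = Path(directory) / "tool_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


config_path = write_tool_config(tempfile.mkdtemp(), tool_config)
print(config_path.read_text(encoding="utf-8"))
```

Checking an existing tool's Space is the safest way to confirm which fields your version of Transformers actually expects.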

For LlamaIndex to write text-to-image prompts, we need a way to show the LLM what good prompts look like. To do this, we indexed 10K random text-to-image prompts from DiffusionDB.

from datasets import load_dataset
from llama_index import VectorStoreIndex, Document

# downloads a LOT of data
dataset = load_dataset('poloclub/diffusiondb', '2m_random_10k')

documents = []
for sample in dataset['train']:
    documents.append(Document(sample['prompt']))

# create index
index = VectorStoreIndex.from_documents(documents)

# store index
index.storage_context.persist(persist_dir="./storage")
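To make the role of the vector index more concrete, here is a toy sketch of similarity-based retrieval. It is not how LlamaIndex is implemented: the bag-of-words `embed` function is a stand-in for a real embedding model, and the example prompts are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, prompts, k=2):
    """Return the k stored prompts most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(p)), p) for p in prompts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


# Hypothetical example prompts, not taken from DiffusionDB
example_prompts = [
    "a happy dog running on a beach, golden hour, highly detailed",
    "portrait of a wizard, oil painting, dramatic lighting",
    "a snowy mountain at sunrise, 4k, trending on artstation",
]
best = top_k("a picture of a happy dog", example_prompts, k=1)
print(best)
```

The retrieved examples are what end up in the prompt sent to the LLM, which is why indexing good prompts matters.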

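For LlamaIndex to use the examples when writing prompts, we need to customize the prompt templates slightly. The final prompt templates and how they are used are shown below: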

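from llama_index import Prompt  # legacy top-level import, matching the llama_index version used above

text_qa_template = Prompt(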
    "Examples of text-to-image prompts are below: \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the existing examples of text-to-image prompts, "
    "write a new text-to-image prompt in the style of the examples, "
    "by re-wording the following prompt to match the style of the above examples: {query_str}\n"
)


refine_template = Prompt(
    "The initial prompt is as follows: {query_str}\n"
    "We have provided an existing text-to-image prompt based on this query: {existing_answer}\n"
    "We have the opportunity to refine the existing prompt "
    "(only if needed) with some more relevant examples of text-to-image prompts below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new examples of text-to-image prompts, refine the existing text-to-image prompt to better "
    "satisfy the required style. "
    "If the context isn't useful, or the existing prompt is good enough, return the existing prompt."
)

query_engine = index.as_query_engine(
    text_qa_template=text_qa_template, 
    refine_template=refine_template
)

response = query_engine.query("Draw me a picture of a happy dog")
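The sketch below approximates what happens to the question-answering template at query time: retrieved example prompts fill `{context_str}` and the user's request fills `{query_str}`. It uses plain `str.format` rather than the library's own template class, and the joining of examples with blank lines is an assumption for illustration.

```python
QA_TEMPLATE = (
    "Examples of text-to-image prompts are below: \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the existing examples of text-to-image prompts, "
    "write a new text-to-image prompt in the style of the examples, "
    "by re-wording the following prompt to match the style of the above examples: {query_str}\n"
)


def render_qa_prompt(examples, query):
    """Fill the template with retrieved examples and the user's original prompt."""
    context_str = "\n\n".join(examples)  # assumed separator between examples
    return QA_TEMPLATE.format(context_str=context_str, query_str=query)


# Hypothetical retrieved examples
retrieved = [
    "a happy dog running on a beach, golden hour, highly detailed",
    "cute corgi puppy, studio lighting, 85mm photograph",
]
rendered = render_qa_prompt(retrieved, "Draw me a picture of a happy dog")
print(rendered)
```

Printing the rendered text is a quick way to sanity-check a custom template before spending LLM calls on it.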

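Challenge #1

A major drawback of Transformers Agents at present is that they select only one tool to solve each prompt. So if we want to augment the image-generation tool, we have to replace it! In our tool implementation, we load the original image-generation tool and call it after running LlamaIndex to generate a new text-to-image prompt.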
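The wrapping approach described here can be sketched with plain callables. The class and the two stand-in functions below are hypothetical and exist only to show the control flow: rewrite the prompt first, then hand the result to the original tool.

```python
class PromptRewritingTool:
    """Wraps an existing tool so that every prompt is rewritten before the tool runs."""

    def __init__(self, rewrite_prompt, base_tool):
        self.rewrite_prompt = rewrite_prompt  # e.g. a call into a query engine
        self.base_tool = base_tool            # e.g. the original image-generation tool

    def __call__(self, prompt):
        better_prompt = self.rewrite_prompt(prompt)
        return self.base_tool(better_prompt)


def fake_rewriter(prompt):
    # Stand-in for the LLM-based rewriting step
    return f"{prompt}, highly detailed, 4k"


def fake_image_generator(prompt):
    # Stand-in for the real text-to-image tool
    return f"<image for: {prompt}>"


tool = PromptRewritingTool(fake_rewriter, fake_image_generator)
result = tool("a mountain")
print(result)
```

Because the wrapper exposes the same call signature as the tool it replaces, the agent does not need to know that an extra step was added.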

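Challenge #2

The next obstacle on our journey was how Hugging Face downloads tools from a Space. Initially, it downloads only the tool_config.json file and the tool's source code. But we also need to download the prompts we spent time indexing!

To get around this, during the tool's setup() we call hf_hub_download() to download the files needed to load the index.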
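One way to structure that download step is a small helper that loops over the persisted index files. This is a sketch rather than the tool's actual code: the helper name and the injectable `download_fn` parameter are assumptions added so the logic can be exercised without network access, while the file names and `hf_hub_download` keyword arguments mirror the ones used in the tool.

```python
import os

# Files written by index.storage_context.persist(), as fetched in the tool's setup()
INDEX_FILES = ("vector_store.json", "index_store.json", "docstore.json")


def download_index_files(repo_id, local_dir, download_fn=None):
    """Download each persisted index file from a Space and return the local paths."""
    if download_fn is None:
        # Imported lazily so the helper can be tested with a fake downloader
        from huggingface_hub import hf_hub_download
        download_fn = hf_hub_download
    paths = []
    for name in INDEX_FILES:
        paths.append(
            download_fn(
                repo_id=repo_id,
                filename=f"storage/{name}",
                repo_type="space",
                local_dir=local_dir,
            )
        )
    return paths


def fake_download(repo_id, filename, repo_type, local_dir):
    # Stand-in that only computes where the file would land
    return os.path.join(local_dir, filename)


paths = download_index_files(
    "llamaindex/text2image_prompt_assistant", "tool_dir", download_fn=fake_download
)
print(paths)
```

Omitting `download_fn` makes the helper fall back to the real `hf_hub_download`, which requires the huggingface_hub package and network access.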

Back on Track

With the index created and the general flow settled, the actual tool implementation is fairly straightforward.

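# Imports assume the library versions available at the time of writing (mid-2023)
import os

from huggingface_hub import hf_hub_download
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from llama_index import (
    LLMPredictor,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
    set_global_service_context,
)
from transformers import Tool, load_tool

class Text2ImagePromptAssistant(Tool):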
    
    inputs = ['text']
    outputs = ['image']
    description = PROMPT_ASSISTANT_DESCRIPTION
    
    def __init__(self, *args, openai_api_key='', model_name='text-davinci-003', temperature=0.3, verbose=False, **hub_kwargs):
        super().__init__()
        os.environ['OPENAI_API_KEY'] = openai_api_key
        if model_name == 'text-davinci-003':
            llm = OpenAI(model_name=model_name, temperature=temperature)
        elif model_name in ('gpt-3.5-turbo', 'gpt-4'):
            llm = ChatOpenAI(model_name=model_name, temperature=temperature)
        else:
            raise ValueError(
                f"{model_name} is not supported, please choose one "
                "of 'text-davinci-003', 'gpt-3.5-turbo', or 'gpt-4'."
            )
        service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
        set_global_service_context(service_context)
        
        self.storage_path = os.path.dirname(__file__)
        self.verbose = verbose
        self.hub_kwargs = hub_kwargs

    def setup(self):
        hf_hub_download(repo_id="llamaindex/text2image_prompt_assistant", filename="storage/vector_store.json", repo_type="space", local_dir=self.storage_path)
        hf_hub_download(repo_id="llamaindex/text2image_prompt_assistant", filename="storage/index_store.json", repo_type="space", local_dir=self.storage_path)
        hf_hub_download(repo_id="llamaindex/text2image_prompt_assistant", filename="storage/docstore.json", repo_type="space", local_dir=self.storage_path)
        
        self.index = load_index_from_storage(StorageContext.from_defaults(persist_dir=os.path.join(self.storage_path, "storage")))
        self.query_engine = self.index.as_query_engine(similarity_top_k=5, text_qa_template=text_qa_template, refine_template=refine_template)
        
        # setup the text-to-image tool too
        self.text2image = load_tool('huggingface-tools/text-to-image')
        self.text2image.setup()

        # __call__ checks `is_initialized`, so set that flag to avoid re-running setup()
        self.is_initialized = True

    def __call__(self, prompt):
        if not self.is_initialized:
            self.setup()

        better_prompt = str(self.query_engine.query(prompt)).strip()
        
        if self.verbose:
            print('==New prompt generated by LlamaIndex==', flush=True)
            print(better_prompt, '\n', flush=True)

        return self.text2image(better_prompt)
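The `__call__` method above relies on lazy initialization: the expensive setup runs once, on first use, guarded by a flag. The stripped-down class below is a hypothetical illustration of that pattern only. The flag checked in `__call__` must be the same one that `setup()` sets, otherwise setup repeats on every call.

```python
class LazyTool:
    """Minimal illustration of deferring expensive setup until the first call."""

    def __init__(self):
        self.is_initialized = False
        self.setup_calls = 0  # counter kept only to demonstrate the behavior

    def setup(self):
        # Expensive work (downloads, model loading) would go here
        self.setup_calls += 1
        self.is_initialized = True

    def __call__(self, prompt):
        if not self.is_initialized:
            self.setup()
        return prompt.upper()


lazy_tool = LazyTool()
first = lazy_tool("a mountain")
second = lazy_tool("a happy dog")
print(first, second, lazy_tool.setup_calls)
```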

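Running the Tool

With the tool set up, we can now test it with an actual agent! For testing, we used the OpenAiAgent with the text-davinci-003 model. When asked to draw a picture of a mountain, we got the following result: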

from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="your_api_key")

agent.run("Draw me a picture of a mountain.")

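As you can see, the picture looks decent. However, text-to-image prompting is something of an art.

To use our new tool, we only need to replace the existing image-generation tool: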

from transformers import load_tool
prompt_assistant = load_tool(
    "llamaindex/text2image_prompt_assistant",
    openai_api_key="your_api_key",
    model_name='text-davinci-003',
    temperature=0.3,  # increase or decrease this to control variation
    verbose=True
)

from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="your_api_key")

# replace the existing tool
agent.toolbox['image_generator'] = prompt_assistant

agent.run("Draw me a picture of a mountain.")