Adam Hofmann • 2023-07-17

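Building Better Tools for LLM Agents

Over the past month, I have been diving into the world of Large Language Model (LLM) agents and building out LlamaIndex's library of agent tools. Last week, I helped lead the LlamaHub Tools effort as part of the broader Data Agents launch.

While building LlamaHub Tools, I collected some techniques for creating effective, easy-to-use tools, and I want to share some of my thoughts.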

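Context on LlamaHub Tools

LlamaHub Tools allow LLMs like ChatGPT to connect to APIs and act on a user's behalf to create, read, update, and delete data. Examples of tools we have put together include drafting and sending emails, reading and creating Google Calendar invites, and searching Wikipedia. These are just a few of the 15 tools we released at launch.

Overview of the tool abstraction

So how exactly do LlamaHub Tools work? The LlamaHub tool abstraction lets you easily write Python functions that an agent can understand and call. For example, instead of trying to get an agent to do complicated math itself, we can give the agent a tool that calls Wolfram Alpha and hands the result back to the agent.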

import urllib.parse
from typing import Optional

import requests
from llama_index.tools.base import BaseToolSpec

QUERY_URL_TMPL = "http://api.wolframalpha.com/v1/result?appid={app_id}&i={query}"

# Inherit from the LlamaIndex BaseToolSpec abstraction
class WolframAlphaToolSpec(BaseToolSpec):

    # Define the functions that we export to the LLM
    spec_functions = ["wolfram_alpha_query"]

    # Initialize with our Wolfram Alpha API key
    def __init__(self, app_id: Optional[str] = None) -> None:
        """Initialize with parameters."""
        self.token = app_id

    # Our function to be called by the agent
    def wolfram_alpha_query(self, query: str):
        """
        Make a query to wolfram alpha about a mathematical or scientific problem.

        Example inputs:
            "(7 * 12 ^ 10) / 321"
            "How many calories are there in a pound of strawberries"

        Args:
            query (str): The query to be passed to wolfram alpha.

        """
        # URL-encode the query so spaces and symbols survive the request
        response = requests.get(QUERY_URL_TMPL.format(app_id=self.token, query=urllib.parse.quote_plus(query)))
        return response.text

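The code above is all it takes to define a LlamaIndex Tool that lets an agent query Wolfram Alpha. No more wrong guesses at math problems! We can initialize an instance of the Tool Spec like this: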

# Initialize an instance of the Tool
wolfram_spec = WolframAlphaToolSpec(app_id="your-key")
# Convert the Tool Spec to a list of tools. In this case we just have one tool.
tools = wolfram_spec.to_tool_list()
# Convert the tool to an OpenAI function and inspect
print(tools[0].metadata.to_openai_function())

Here is the cleaned-up output of the print statement:

{
  'description': '
    Make a query to wolfram alpha about a mathematical or scientific problem.
  
          Example inputs:
              "(7 * 12 ^ 10) / 321"
              "How many calories are there in a pound of strawberries"
  
          Args:
              query (str): The query to be passed to wolfram alpha.',
  'name': 'wolfram_alpha_query',
  'parameters': {
    'properties': {'query': {'title': 'Query', 'type': 'string'}},
    'title': 'wolfram_alpha_query',
    'type': 'object'
  }
}

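We can see that the docstring describing how to use the tool is passed to the agent. The parameters, type information, and function name are passed along as well, so the agent gets a clear picture of how to use the function. All of this information essentially acts as the prompt through which the agent understands the tool.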
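To make the mechanism more concrete, here is a minimal sketch of how a docstring and type hints can be turned into a function description like the one printed above. It relies only on the standard library `inspect` module. The names `describe_function` and `_JSON_TYPES` are hypothetical, chosen for illustration, and this is not LlamaIndex's actual implementation.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function description from a plain Python function."""
    signature = inspect.signature(fn)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        if name == "self":
            continue
        # Unknown or missing annotations fall back to "string" in this sketch.
        json_type = _JSON_TYPES.get(param.annotation, "string")
        properties[name] = {"title": name.replace("_", " ").title(), "type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def wolfram_alpha_query(query: str) -> str:
    """Make a query to wolfram alpha about a mathematical or scientific problem."""
    raise NotImplementedError("placeholder body; only the metadata matters here")


schema = describe_function(wolfram_alpha_query)
print(schema["name"])
print(schema["parameters"]["properties"])
```

The point of the sketch is that everything the agent learns about a tool comes from the function name, the signature, and the docstring, so those are the places worth polishing.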

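Inheriting from the BaseToolSpec class makes it very simple to write tools for agents. In fact, the tool definition above is only 9 lines of code, not counting whitespace, imports, and comments. We can get a function ready for agents without any heavy boilerplate or modifications. Let's look at how to load the tool into an OpenAI agent: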

from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(tools, verbose=True)
agent.chat('What is (7 * 12 ^ 10) / 14')
""" OUTPUT:
=== Calling Function ===
Calling function: wolfram_alpha_query with args: {
  "query": "(7 * 12 ^ 10) / 14"
}
Got output: 30958682112
========================
Response(response='The result of the expression (7 * 12 ^ 10) / 14 is 30,958,682,112.', source_nodes=[], metadata=None)
"""

For comparison, we can pass the same query to ChatGPT without the tool:

> 'What is (7 * 12 ^ 10) / 14'
"""
To calculate the expression (7 * 12^10) / 14, you need to follow the order of operations, which is parentheses, exponents, multiplication, and division (from left to right).

Step 1: Calculate the exponent 12^10.
12^10 = 619,173,642,24.

Step 2: Multiply 7 by the result from Step 1.
7 * 619,173,642,24 = 4,333,215,496,68.

Step 3: Divide the result from Step 2 by 14.
4,333,215,496,68 / 14 = 309,515,392,62.

Therefore, the result of the expression (7 * 12^10) / 14 is 309,515,392,62.
"""

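This example should show how easy it is to write new tools for agents. For the rest of this blog post, I will discuss the tips and tricks I have found for writing more functional and effective tools. Hopefully, by the end of the post, you will be excited to write and contribute some tools of your own!

Techniques for building better tools

Below are some strategies for writing more usable and functional tools that minimize friction when interacting with an agent. Not every strategy applies to every tool, but usually at least a few of the techniques below will prove valuable.

Writing useful tool prompts

Here is an example of the function signature and docstring for a tool the agent can call to create a draft email.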

def create_draft(
    self,
    to: List[str],
    subject: str,
    message: str
) -> str:
    """Create and insert a draft email.
       Print the returned draft's message and id.
       Returns: Draft object, including draft id and message meta data.

    Args:
        to (List[str]): The email addresses to send the message to, e.g. ['adam@example.com']
        subject (str): The subject for the email
        message (str): The message for the email
    """

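This prompt uses a few different patterns to make sure the agent can use the tool effectively:

Give a concise description of the function and its purpose
Tell the agent what data the function will return
List the arguments the function accepts, with descriptions and type information
Give example values for arguments that have a specific format, e.g. adam@example.com

Tool prompts should be concise so they do not take up too much of the context length, but informative enough that the agent can use the tool correctly without making mistakes.

Making tools tolerant of partial inputs

One way to help agents make fewer mistakes is to write tools that are more tolerant of their inputs, for example by making an input optional when its value can be inferred from somewhere else. Take drafting an email again, but this time consider a tool that updates a draft email: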

def update_draft(
    self,
    draft_id: str,
    to: Optional[List[str]] = None,
    subject: Optional[str] = None,
    message: Optional[str] = None,
) -> str:
    """Update a draft email.
       Print the returned draft's message and id.
       This function is required to be passed a draft_id that is obtained when creating messages
       Returns: Draft object, including draft id and message meta data.

    Args:
        draft_id (str): the id of the draft to be updated
        to (Optional[List[str]]): The email addresses to send the message to
        subject (Optional[str]): The subject for the email
        message (Optional[str]): The message for the email
    """

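The Gmail API requires all of the above values when updating a draft. However, with just the draft_id we can fetch the draft's current content and use the existing values as defaults for anything the agent did not provide.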

def update_draft(...):
    ...
    draft = self.get_draft(draft_id)
    headers = draft['message']['payload']['headers']
    # Fall back to the draft's existing values for any field the agent left out
    for header in headers:
        if header['name'] == 'To' and not to:
            to = header['value']
        elif header['name'] == 'Subject' and not subject:
            subject = header['value']
        elif header['name'] == 'Message' and not message:
            message = header['value']
    ...

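With this logic in the update_draft function, the agent can call it with only one of the fields (plus the draft_id), and we can still update the draft the way the user expects. This means the agent can complete the task in more situations, instead of returning an error or having to ask for more information.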

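Validating input and agent error handling

Despite our best efforts at prompting and input tolerance, there will be cases where the agent calls a tool in a way that cannot complete the task at hand. However, we can detect this and prompt the agent to recover from the error on its own.

For example, in the update_draft example above, what should we do if the agent calls the function without a draft_id? We could simply pass the null value along and return the error from the Gmail API library, but we can also detect that a null draft_id will inevitably cause an error and return a prompt for the agent instead.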

def update_draft(...):
  if draft_id is None:
    return "You did not provide a draft id when calling this function. If you previously created or retrieved the draft, the id is available in context"

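Now, if the agent calls update_draft without a draft_id, it is told exactly what mistake it made and given instructions on how to correct it.

In my experience with this tool, an agent that receives this prompt will usually call update_draft again right away in the correct way. If no draft_id is available, it tells the user about the problem and asks for one. Either outcome is far better than crashing the program or returning an opaque library error to the user.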

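Providing simple functions related to the tool

Agents can struggle with computations that would be trivial for a computer. For example, when building a tool for creating events in Google Calendar, a user might send the agent a prompt like this:

Create an event on my calendar to chat with adam@example.com about the Tools PR at 4pm tomorrow

Can you see the problem? If we try asking ChatGPT what day it is: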

agent.chat('what day is it?')
# > I apologize for the confusion. As an AI language model, I don't have real-time data or access to the current date. My responses are based on the information I was last trained on, which is up until September 2021. To find out the current day, I recommend checking your device's clock, referring to a calendar, or checking an online source for the current date.

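The agent does not know the current date, so it may call the function incorrectly by passing a string like tomorrow as the date, hallucinate a date in the past based on its training data, or push the burden of supplying the date back onto the user. All of these behaviors cause friction and frustration for the user.

Instead, in the Google Calendar Tool Spec we provide a simple deterministic function that the agent can call whenever it needs the date: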

def get_date(self):
    """
    A function to return today's date.
    Call this before any other functions if you are unaware of the current date
    """
    return datetime.date.today()

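Now, when the agent handles the prompt above, it can first call the function to get the date and then create the event as the user requested, inferring the date for "tomorrow" or "a week from now". No errors, no guessing, and no further user interaction needed!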
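To show the date arithmetic that becomes possible once today's date is known, here is a short sketch. The agent would normally do this reasoning itself after calling the date function. The `days_from` helper is hypothetical and exists only to make the calculation explicit, and returning an ISO string instead of a `date` object is an assumption made here so the value reads cleanly as text.

```python
import datetime


def get_date() -> str:
    """Return today's date in ISO format (YYYY-MM-DD)."""
    return datetime.date.today().isoformat()


def days_from(date_iso: str, days: int) -> str:
    """Hypothetical helper: shift an ISO date by a number of days."""
    start = datetime.date.fromisoformat(date_iso)
    return (start + datetime.timedelta(days=days)).isoformat()


today = get_date()
tomorrow = days_from(today, 1)
next_week = days_from(today, 7)
print(today, tomorrow, next_week)
```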

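Returning prompts from functions that perform mutations

Some functions mutate data, and after the mutation it is not obvious what useful data the function should return to the agent. For example, in the Google Calendar tool, once an event is created successfully there is little point in returning the event's content to the agent, because the agent just passed in all of that information and therefore already has it in context.

In general, for functions focused on mutations (create, update, delete), we can use the return value to prompt the agent further and help it better understand its actions. For example, the Google Calendar create_event tool could do the following: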

def create_event(...):
  ...
  return 'Event created successfully! You can move on to the next step.'

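This helps the agent register that the action succeeded and encourages it to finish what it was prompted to do, especially when creating the Google Calendar event is only one step in a multi-step instruction. We can still return the ID as part of these prompts: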

def create_event(...):
  ...
  event = service.events().insert(...).execute()
  return f"Event created with id {event['id']}! You can move on to the next step."

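Storing large responses in indices for the agent to read

One consideration when building tools that has already come up is the size of the agent's context window. Currently, LLMs typically have context windows of 4k to 16k tokens, though they can be larger or smaller. If a tool returns more data than fits in the context window, the agent will be unable to process the data and will error out.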

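LoadAndSearchTool is meant for this case: it wraps a tool so that the output is stored in an index the agent can query, instead of being returned into context all at once. The only consideration when creating tools that may need to be wrapped by LoadAndSearchTool is that they must return a list of LlamaIndex documents. For a tool that returns a string, the only change needed to make it compatible with LoadAndSearchTool is to wrap the string in a Document and then in a list.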

from llama_index.readers.schema.base import Document

# Not compatible
def large_text_response_function():
  ...
  return result

# LoadAndSearch compatible
def large_text_response_function():
  ...
  return [Document(text=result)]
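For intuition about the load-then-search pattern, here is a toy, standard-library sketch. It is not how LlamaIndex implements LoadAndSearchTool. The `ToyLoadAndSearch` class, its word-based chunking, and its keyword-overlap scoring are all assumptions made for illustration, standing in for a real index.

```python
from typing import List


class ToyLoadAndSearch:
    """Toy stand-in: stores a large response in chunks and searches them by keyword."""

    def __init__(self, chunk_words: int = 10) -> None:
        self.chunk_words = chunk_words
        self.chunks: List[str] = []

    def load(self, text: str) -> str:
        """Split the text into word chunks and return a short hint instead of the text."""
        words = text.split()
        self.chunks = [
            " ".join(words[i:i + self.chunk_words])
            for i in range(0, len(words), self.chunk_words)
        ]
        return f"Loaded {len(self.chunks)} chunks. Call search(query) to read them."

    def search(self, query: str, top_k: int = 2) -> List[str]:
        """Return up to top_k chunks that share at least one word with the query."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(c.lower().split())), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:top_k] if score > 0]


store = ToyLoadAndSearch(chunk_words=10)
large_response = "alpha " * 20 + "the meeting is at 4pm tomorrow " + "beta " * 20
hint = store.load(large_response)
hits = store.search("meeting tomorrow")
print(hint)
print(hits)
```

The agent only ever sees the short hint and the few matching chunks, so the full response never has to fit in the context window.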

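Verifying how the agent understands the tool

A useful technique for debugging tools during development is to ask the agent about its own tools: which tools it has available, what arguments they accept, what those arguments represent, and what each tool is for. The agent's answers help pinpoint where your prompts may be lacking, or why the agent is failing to use the tool you are developing.

Here is an example conversation for debugging the Google Calendar Tool Spec, assuming you have set up an agent as in the example notebook: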

agent.chat('what tools do you have available')
# I have the following tools available
#   1. `load_data`: This tool allows me to load data from your calendar. It can retrieve a specified number of events starting from a given date.
#   2. `create_event`: This tool allows me to create an event on your calendar. You can provide details such as the title, description, location, start datetime, end datetime, and attendees.
#   3. `get_date`: This tool returns today's date. It can be used to ensure that the correct date is used when working with other tools.
# Please let me know if you would like more information about any of these tools or if you have any specific requests.
agent.chat('what arguments does the load_data tool take')
# The `load_data` tool takes the following arguments:
#   1. `number_of_results` (optional, default: 100): Specifies the number of events to return from your calendar.
#   2. `start_date` (optional): Specifies the start date from which to retrieve events. It can be provided as a string in ISO format (e.g., "2023-07-05") or as a `datetime.date` object.
# You can use these arguments to customize the number of events and the start date for retrieving data from your calendar.