
[platformer.news] How Google taught AI to doubt itself

Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up.

From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive — making probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on.
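
To make that "probabilistic guess" idea concrete, here is a toy sketch in Python. The words and scores are invented for illustration and have no connection to any real model; the point is only that the output is sampled from a probability distribution over words, with no database of facts anywhere in the loop.

```python
import math
import random

# Invented scores for illustration only -- a real model derives these from
# billions of learned parameters, not a hand-written table.
next_word_scores = {"Paris": 4.2, "Lyon": 1.3, "Tokyo": 0.4}

def sample_next_word(scores: dict[str, float]) -> str:
    # Softmax: convert raw scores into a probability distribution.
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    probs = {word: e / total for word, e in exps.items()}
    # Sample in proportion to probability. A word that merely sounds
    # plausible can win even when it is factually wrong.
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Pretend the prompt so far is "The capital of France is"
print(sample_next_word(next_word_scores))
```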

As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT — not realizing that every single case had been fabricated out of whole cloth.

This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true — often defeating the purpose of using them at all.

When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden for determining what is true and false squarely on you.

Starting today, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” your response. Here’s how the company explained it in a blog post:

When you click on the “G” icon, Bard will read the response and evaluate whether there is content across the web to substantiate it. When a statement can be evaluated, you can click the highlighted phrases and learn more about supporting or contradicting information found by Search.

Double-checking a query will turn many of the sentences within the response green or brown. Green-highlighted responses are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted responses indicate that Bard doesn’t know where the information came from, highlighting a likely mistake.
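
As a rough sketch of the general idea (and emphatically not Google's implementation), a verifier like this splits a response into sentences, looks for corroborating text on the web, and colors each sentence accordingly. The search and entailment steps below are stubs; a real system would call a search backend and a trained entailment model, and would distinguish contradicted claims from merely unsourced ones.

```python
# A rough sketch of the idea behind "double check" -- not Google's actual
# implementation. search_snippets() and supports() are stubs standing in
# for a real search API and an entailment model.

def search_snippets(claim: str) -> list[str]:
    # Stub: pretend these snippets came back from a web search.
    return [
        "Radiohead have won six Grammy Awards.",
        "Radiohead have never won a Brit Award.",
    ]

def supports(snippet: str, claim: str) -> bool:
    # Stub entailment check: naive word overlap stands in for a model.
    return all(word.lower() in snippet.lower() for word in claim.split())

def double_check(response: str) -> list[tuple[str, str]]:
    verdicts = []
    for sentence in response.split(". "):
        snippets = search_snippets(sentence)
        # Green if any search result corroborates the sentence, brown otherwise.
        color = "green" if any(supports(s, sentence) for s in snippets) else "brown"
        verdicts.append((sentence.strip(". "), color))
    return verdicts

for sentence, color in double_check(
    "Radiohead have won six Grammy Awards. Radiohead have won nine Brit Awards."
):
    print(f"{color}: {sentence}")
```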

When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words showed that Google’s search had shown contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them.

“I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week.

Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it, and then double-checked the results to separate fact from fiction. It turns out that cleaning the kitchen thoroughly would not fix the problem, as the chatbot had originally stated. But placing bowls of baking soda around the house might help.

If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so did I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.)

And while double-checking represents a clear step forward, it does still often require you to pull up all those citations and make sure Bard is interpreting those search results correctly. At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours.

Still, it’s a welcome development.

“We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry.

Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time.
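
Under the hood, features like this typically follow a tool-calling pattern: the model emits a structured request, the host application runs it against the user's data, and the result is fed back into the conversation. The sketch below shows that generic pattern with a stubbed-in mail search; none of the names correspond to Google's actual API.

```python
import json

def gmail_search(query: str) -> list[dict]:
    # Stub standing in for a real call to the user's mailbox.
    return [{"subject": "Trip photos", "from": "alice@example.com", "date": "2003-07-14"}]

# Registry mapping tool names the model may emit to concrete functions.
TOOLS = {"gmail.search": gmail_search}

def handle_tool_call(model_output: str) -> str:
    # The model emits JSON like {"tool": "gmail.search", "query": "..."};
    # the host app executes it and returns the result to the conversation.
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](call["query"])
    return json.dumps(result)

print(handle_tool_call('{"tool": "gmail.search", "query": "oldest email from Alice"}'))
```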

For now, it’s limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web — it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.)

But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend who I’ve been exchanging messages with in Gmail for 20 years now, Bard showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.”

It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase.

Eventually, I imagine that extensions will come to Bard just as they previously have to ChatGPT. (They’re called plug-ins over there.) The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so.

The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves — and without us always having to ask for it.