Stochastic parrot

In machine learning, "stochastic parrot" is a metaphor for the claim that large language models, although able to generate plausible text, do not understand the language they process.^[1]^[2] The term was coined by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell (writing as Shmargaret Shmitchell)^[2]^[3] in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜".^[4]

Origin and definition

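The term first appeared in the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜".^[4] Its authors argued that the harms of large language models include environmental and financial costs, hard-to-detect biases in the training data, and the potential to mislead the public and scientists. They also argued that such models cannot understand the concepts underlying what they learn.^[5]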

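The word "stochastic" derives from the ancient Greek word "stokhastikos", meaning "based on guesswork" or "randomly determined".^[6] The word also appears in probability theory, as in "stochastic process". "Parrot" refers to the idea that large language models merely repeat words without understanding what the sentences mean.^[6]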

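In the paper, Bender et al. argue that large language models merely stitch words and sentences together according to probability, without regard to meaning, and that these models are therefore just "stochastic parrots".^[4]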
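The idea of joining words purely by probability can be illustrated with a toy bigram model. This is a minimal sketch, not the architecture of any real large language model: the corpus, the function names `train_bigram` and `generate`, and the choice of a bigram table are all assumptions made for illustration. The generator picks each next word in proportion to how often it followed the previous word in the training text, and it never consults meaning.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample up to `length` words, each drawn in proportion to how often
    it followed the previous word during training."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out

# Hypothetical toy corpus, invented for this sketch.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 8, random.Random(0))))
```

Every adjacent word pair in the output already occurs somewhere in the corpus, so the text can look locally fluent while the generator has no notion of whether it is true or sensible.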

According to the machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the metaphor highlights two important limitations:^[1]^[7]

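Large language models are limited by the data they are trained on and merely repeat the contents of their datasets according to random probabilities.
Because large language models produce text based only on their training data, they do not know whether what they output is incorrect or inappropriate.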

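Lindholm et al. note that, with poor-quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong".^[1]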

Subsequent usage

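In July 2021, the Alan Turing Institute hosted a keynote and panel discussion on the paper by Bender et al.^[8] As of May 2023, the paper had been cited in 1,529 publications.^[9] The term has appeared in publications in the fields of law,^[10] grammar,^[11] narrative,^[12] and the humanities.^[13] Bender et al. remain concerned about the dangers of chatbots based on large language models, such as GPT-4.^[14]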

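"Stochastic parrot" is now a neologism that critics of artificial intelligence use to say that machines do not understand the meaning of their outputs, which makes it resemble a "slur against AI".^[6] The term spread widely after Sam Altman, CEO of OpenAI, used it ironically in a tweet: "i am a stochastic parrot, and so r u".^[6] The American Dialect Society later named it the 2023 AI-related Word of the Year, ahead of "ChatGPT" and "LLM".^[6]^[15]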

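Some researchers use the term to describe large language models as pattern-matching machines that, thanks to vast amounts of training data, can produce plausible text that superficially resembles human language but are merely parroting in a stochastic fashion. Other researchers, however, argue that large language models are in fact able to understand language.^[16]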

Debate

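Some large language models, such as ChatGPT, can interact with users in convincingly human-like conversations.^[16] As these new systems have been developed, the extent to which large language models are merely "parroting" has become an increasingly prominent subject of discussion.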

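In the human mind, words and language correspond to experiences.^[17] In the algorithms of large language models, by contrast, words may correspond only to other words and to patterns of usage in the training data.^[18]^[19]^[4] Proponents of the stochastic parrot view therefore conclude that large language models cannot truly understand language.^[18]^[4]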

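Proponents cite as evidence the tendency of large language models to present false information as fact, known as hallucination.^[17] Large language models sometimes synthesize information that fits some pattern but does not match reality.^[18]^[19]^[17] Because large language models cannot distinguish truth from falsehood, proponents claim that they cannot connect words to an understanding of the world, as language should.^[18]^[17] In addition, large language models often fail to decipher complex or ambiguous grammar that depends on understanding context.^[18]^[19] An example appears in the paper by Saba et al.:^[18]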

The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore.

Can I replace 'my favorite newspaper' by 'the wet newspaper that fell down off the table' in the second sentence?

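The large language model answered that it could, failing to recognize that "newspaper" has a different meaning in the two contexts: in the first it is the physical paper, and in the second it is the publishing organization.^[18] On the basis of such failures, some AI professionals conclude that these models are no more than stochastic parrots.^[18]^[17]^[4]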

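However, there is also an argument that large language models are more than stochastic parrots. They can pass many tests of understanding, including the Super General Language Understanding Evaluation (SuperGLUE).^[19]^[20] Many large language models produce fluent responses, and such tests lend further support to this view. In a 2022 survey, 51% of AI professionals believed that these models could truly understand language given enough data.^[19]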

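Mechanistic interpretability, a technique that reverse-engineers a model to analyze how its neural network processes information, can also be used to investigate whether large language models are capable of understanding. One example is Othello-GPT, a small Transformer model trained to predict legal moves in Othello. Its network contains an internal representation of the Othello board, and modifying this representation changes the predicted legal moves in the correct way. This has been cited as evidence that large language models have a "world model" and do not rely only on surface statistics.^[21]^[22]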
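The two steps described above, reading a state out of a network's activations and then editing that state, can be sketched on synthetic data. This is a toy illustration only, and it is not Othello-GPT or the probing method of the cited work. The "activations" are invented vectors that encode a hidden binary state along a hypothetical direction, the probe is a plain perceptron, and every name (`make_activations`, `train_linear_probe`, `decode`, `flip_state`) is an assumption made for this sketch.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_activations(n, rng):
    """Synthetic stand-in for hidden activations: each vector encodes a
    binary state (+1 or -1) along a fixed direction, plus small noise."""
    direction = [1.0, -2.0, 0.5]  # hypothetical encoding direction
    data = []
    for _ in range(n):
        state = rng.choice([-1, 1])
        h = [state * d + rng.uniform(-0.3, 0.3) for d in direction]
        data.append((h, state))
    return data

def train_linear_probe(data, max_epochs=100):
    """Perceptron probe: learn weights w so that sign(w . h) recovers the state."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for h, state in data:
            if state * dot(w, h) <= 0:  # misclassified: nudge w toward the sample
                w = [wi + state * hi for wi, hi in zip(w, h)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def decode(w, h):
    """Read the state out of an activation vector with the probe."""
    return 1 if dot(w, h) > 0 else -1

def flip_state(w, h):
    """Intervention: reflect h across the probe's decision boundary."""
    scale = 2 * dot(w, h) / dot(w, w)
    return [hi - scale * wi for hi, wi in zip(h, w)]

data = make_activations(50, random.Random(0))
probe = train_linear_probe(data)
h, state = data[0]
print(state, decode(probe, h), decode(probe, flip_state(probe, h)))
```

In the cited experiments the interesting question is whether such an edit also changes the model's downstream predictions in the expected way. This sketch stops at the probe's own readout and does not model that step.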

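In another example, a small Transformer model was trained on programs written in the Karel programming language. Like Othello-GPT, this model develops an internal representation of Karel program semantics, and modifying that representation changes the output in the correct way. The model also generates correct programs that are, on average, shorter than those in the training set.^[23]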

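However, when tests designed to assess human language comprehension are applied to large language models, they sometimes yield false positives caused by spurious correlations in the text data.^[24] Models sometimes engage in shortcut learning, in which a system draws unrelated correlations from the data instead of using human-like understanding.^[25] One experiment tested the argument reasoning skills of Google's BERT large language model. The model had to choose, from two statements, the one more consistent with an argument. The following is an example of one such argument:^[19]^[26]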

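Argument: Felons should be allowed to vote. A person who stole a car at 17 should not be barred for life from the full rights of an ordinary citizen.
Statement A: Grand theft auto is a felony.
Statement B: Grand theft auto is not a felony.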

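The researchers found that specific words such as "not" cued the model toward the correct answer. With such words present, the model achieved near-perfect scores, and without them it tended to choose at random.^[19]^[26] Because of this problem, together with known difficulties in defining intelligence, some argue that all benchmarks that probe understanding in large language models are flawed and leave the models shortcuts to fake understanding.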
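The cue-word effect can be reproduced in miniature. The following is a toy sketch, not the BERT experiment: the items are invented, the "model" is a hand-written rule that looks only for the word "not", and the names `cue_only_choice`, `accuracy`, and `mirror` are hypothetical. In the biased set the correct statement always contains the cue, so the rule scores perfectly without reading any argument. Once each item is paired with a copy whose label is flipped, the cue carries no information and the rule drops to chance.

```python
# Hypothetical toy items: (statement_a, statement_b, index_of_correct_statement).
# In this invented set the correct statement always contains "not".
biased_items = [
    ("the bridge is safe", "the bridge is not safe", 1),
    ("the tax is fair", "the tax is not fair", 1),
    ("rain is likely", "rain is not likely", 1),
    ("the road is open", "the road is not open", 1),
]

def cue_only_choice(statement_a, statement_b, cue="not"):
    """Pick a statement by looking only for the cue word, ignoring the argument."""
    a_has = cue in statement_a.split()
    b_has = cue in statement_b.split()
    if a_has == b_has:
        return 0  # the cue gives no signal: fall back to a fixed guess
    return 0 if a_has else 1

def accuracy(items):
    correct = sum(cue_only_choice(a, b) == label for a, b, label in items)
    return correct / len(items)

def mirror(items):
    """Pair each item with a label-flipped copy so the cue becomes uninformative."""
    return items + [(a, b, 1 - label) for a, b, label in items]

print(accuracy(biased_items))          # 1.0
print(accuracy(mirror(biased_items)))  # 0.5
```

A benchmark built like the biased set rewards the shortcut, which is why a high score alone does not show that the argument was understood.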

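Without a reliable benchmark, it is difficult for researchers to distinguish stochastic parrots from models that truly understand. In one experiment with ChatGPT-3, a scientist argued that the model was at times capable of human-like understanding and at times a stochastic parrot.^[16] He found that the model gave coherent and informative answers when predicting future events from the information in a prompt.^[16] ChatGPT-3 was also frequently able to parse implicit information from text prompts. However, it often failed to answer questions about logic and reasoning correctly, especially those involving spatial awareness.^[16] The uneven quality of the model's responses suggests that large language models may have some form of "understanding" for certain categories of questions while acting as stochastic parrots for others.^[16]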

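In addition, because animal research shows that parrots do more than mimic human speech and can sometimes understand language, some scientists who study parrots may find the term "stochastic parrot" offensive.^[27]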

References

[1] Lindholm et al. 2022, pp. 322–3.
[2] Uddin, Muhammad Saad. Stochastic Parrots: A Novel Look at Large Language Models and Their Limitations. Towards AI. 2023-04-20 [2023-05-12].
[3] Weil, Elizabeth. You Are Not a Parrot. New York. 2023-03-01 [2023-05-12].
[4] Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21. New York, NY, USA: Association for Computing Machinery. 2021-03-01: 610–623. ISBN 978-1-4503-8309-7. S2CID 232040593. doi:10.1145/3442188.3445922.
[5] Hao, Karen. We read the paper that forced Timnit Gebru out of Google. Here's what it says. MIT Technology Review. 2020-12-04 [2022-01-19]. Archived from the original on 2021-10-06.
[6] Zimmer, Ben. 'Stochastic Parrot': A Name for AI That Sounds a Bit Less Intelligent. WSJ. [2024-04-01].
[7] Uddin, Muhammad Saad. Stochastic Parrots: A Novel Look at Large Language Models and Their Limitations. Towards AI. 2023-04-20 [2023-05-12].
[8] Weller (2021).
[9] Bender: On the Dangers of Stochastic Parrots. Google Scholar. [2023-05-12].
[10] Arnaudo, Luca. Artificial Intelligence, Capabilities, Liabilities: Interactions in the Shadows of Regulation, Antitrust – And Family Law. SSRN. 2023-04-20. S2CID 258636427. doi:10.2139/ssrn.4424363.
[11] Bleackley, Pete; BLOOM. In the Cage with the Stochastic Parrot. Speculative Grammarian. 2023, CXCII (3) [2023-05-13].
[12] Gáti, Daniella. Theorizing Mathematical Narrative through Machine Learning. Journal of Narrative Theory (Project MUSE). 2023, 53 (1): 139–165. S2CID 257207529. doi:10.1353/jnt.2023.0003.
[13] Rees, Tobias. Non-Human Words: On GPT-3 as a Philosophical Laboratory. Daedalus. 2022, 151 (2): 168–82. JSTOR 48662034. S2CID 248377889. doi:10.1162/daed_a_01908.
[14] Goldman, Sharon. With GPT-4, dangers of 'Stochastic Parrots' remain, say researchers. No wonder OpenAI CEO is a 'bit scared'. VentureBeat. 2023-03-20 [2023-05-09].
[15] Corbin, Sam. Among Linguists, the Word of the Year Is More of a Vibe. The New York Times. 2024-01-15 [2024-04-01]. ISSN 0362-4331.
[16] Arkoudas, Konstantine. ChatGPT is no Stochastic Parrot. But it also Claims that 1 is Greater than 1. Philosophy & Technology. 2023-08-21, 36 (3): 54. ISSN 2210-5441. doi:10.1007/s13347-023-00619-6.
[17] Fayyad, Usama M. From Stochastic Parrots to Intelligent Assistants—The Secrets of Data and Human Interventions. IEEE Intelligent Systems. 2023-05-26, 38 (3): 63–67. ISSN 1541-1672. doi:10.1109/MIS.2023.3268723.
[18] Saba, Walid S. Stochastic LLMS do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMS. Almeida, João Paulo A.; Borbinha, José; Guizzardi, Giancarlo; Link, Sebastian; Zdravkovic, Jelena (eds.). Conceptual Modeling. Lecture Notes in Computer Science 14320. Cham: Springer Nature Switzerland. 2023: 3–19. ISBN 978-3-031-47262-6. arXiv:2309.05918. doi:10.1007/978-3-031-47262-6_1.
[19] Mitchell, Melanie; Krakauer, David C. The debate over understanding in AI's large language models. Proceedings of the National Academy of Sciences. 2023-03-28, 120 (13): e2215907120. Bibcode:2023PNAS..12015907M. ISSN 0027-8424. PMC 10068812. PMID 36943882. arXiv:2210.13966. doi:10.1073/pnas.2215907120.
[20] Wang, Alex; Pruksachatkun, Yada; Nangia, Nikita; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. 2019-05-02. arXiv:1905.00537 [cs.CL].
[21] Li, Kenneth; Hopkins, Aspen K.; Bau, David; Viégas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. 2023-02-27. arXiv:2210.13382.
[22] Li, Kenneth. Large Language Model: world models or surface statistics?. The Gradient. 2023-01-21 [2024-04-04].
[23] Jin, Charles; Rinard, Martin. Evidence of Meaning in Language Models Trained on Programs. 2023-05-24. arXiv:2305.11169.
[24] Choudhury, Sagnik Ray; Rogers, Anna; Augenstein, Isabelle. Machine Reading, Fast and Slow: When Do Models "Understand" Language?. 2022-09-15. arXiv:2209.07430.
[25] Geirhos, Robert; Jacobsen, Jörn-Henrik; Michaelis, Claudio; Zemel, Richard; Brendel, Wieland; Bethge, Matthias; Wichmann, Felix A. Shortcut learning in deep neural networks. Nature Machine Intelligence. 2020-11-10, 2 (11): 665–673. ISSN 2522-5839. arXiv:2004.07780. doi:10.1038/s42256-020-00257-z.
[26] Niven, Timothy; Kao, Hung-Yu. Probing Neural Network Comprehension of Natural Language Arguments. 2019-09-16. arXiv:1907.07355.
[27] Shute, Nancy. What a parrot knows, and what a chatbot doesn't (paper). Science News. Vol. 205 no. 2. 2024-01-27.

Works cited

Lindholm, A.; Wahlström, N.; Lindsten, F.; Schön, T. B. Machine Learning: A First Course for Engineers and Scientists. Cambridge University Press. 2022. ISBN 978-1108843607.
Weller, Adrian. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (video). Alan Turing Institute. 2021-07-13. Keynote by Emily Bender. The presentation was followed by a panel discussion.

Further reading

Bogost, Ian. ChatGPT Is Dumber Than You Think: Treat it like a toy, not a tool. The Atlantic. 2022-12-07 [2024-01-17].
Chomsky, Noam. The False Promise of ChatGPT. The New York Times. 2023-03-08 [2024-01-17].
Glenberg, Arthur; Jones, Cameron Robert. It takes a body to understand the world – why ChatGPT and other language AIs don't know what they're saying. The Conversation. 2023-04-06 [2024-01-17].
McQuillan, D. Resisting AI: An Anti-fascist Approach to Artificial Intelligence. Bristol University Press. 2022. ISBN 978-1-5292-1350-8.
Thompson, E. Escape from Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do about It. Basic Books. 2022. ISBN 978-1-5416-0098-0.
Zhong, Qihuang; Ding, Liang; Liu, Juhua; Du, Bo; Tao, Dacheng. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT. 2023. arXiv:2302.10198  [cs.CL].

External links

"On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" at Wikimedia Commons
