使用者:月心雪/沙盒

BERT

變換器的雙向編碼器表徵技術(BERT)是通過Google進行的自然語言處理(NLP)的一種預訓練技術^[1]^[2]。2018年，雅各布·德夫林和他來自谷歌的同事創建並發布了BERT。谷歌正在利用BERT來更好地理解用戶的搜索含義。^[3] 原始的英語語言 BERT 模型有兩種預先訓練的一般類型:^[1](1)BERTBASE模型，一個12層，768隱藏，12頭，110M的參數神經網絡結構，(2) BERTLARGE模型，一個24層，1024隱藏，16頭，340M的參數神經網絡結構; 兩者都是在有800M單詞的{[BooksCorpus]]^[4]以及一個擁有2500M單詞的英文版維基百科上訓練的。

性能

當 BERT 出版時，它在一些自然語言的理解任務上表現得最為先進：^[1]

GLUE (通用語言理解評估)任務集(包括9個任務)。
SQuAD (Stanford Question Answering Dataset) v1.1和 v2.0。
SWAG (對抗生成的情境)

分析

BERT 在這些自然語言理解任務上表現出最先進水平的原因還沒有得到很好的解釋。^[5]^[6]目前的研究主要集中在精心選擇的輸入序列背後的 BERT 輸出關係，^[7]^[8]通過探測分類器分析內部向量表示，^[9]^[10]以及注意力權重表示的關係。^[5]^[6]

歷史

BERT起源於訓練前的語境表示，包括半監督序列學習,^[11]生成預訓練，ELMo，^[12]和ULMFit. ^[13]與以前的模型不同，BERT是一種深度雙向的、無監督的語言表達，僅使用純文本語料庫進行預訓練。上下文無關模型(如 word2vec 或 GloVe)為詞彙表中的每個單詞生成一個單詞嵌入表示法，其中BERT考慮給定單詞每次出現的上下文。例如，儘管」跑步」的矢量在「他在經營一家公司」和」他在跑馬拉松」兩句中的出現具有相同的word2vec矢量表示，但BERT將提供一種上下文嵌入，可以根據句子表達的不同而不同。 2019年10月25日，Google搜尋宣布他們已經開始在美國國內的英語搜索查詢中應用BERT模型。^[14]2019年12月9日，據報道，Google搜尋已經採用了BERT，涵蓋了70多種語言。^[15]

獲獎情況

在2019年美國計算機語言學協會北美分會年會上，BERT獲得了最佳長篇論文獎。^[16]

參見

參考文獻

^ ^1.0 ^1.1 ^1.2 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2  [cs.CL].
^ Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. [2019-11-27] （英語）.
^ Understanding searches better than ever before. Google. 2019-10-25 [2019-11-27] （英語）.
^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015. arXiv:1506.06724  [cs.CV]. cite arXiv模板填寫了不支持的參數 (幫助)
^ ^5.0 ^5.1 Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373. S2CID 201645145. doi:10.18653/v1/D19-1445 （美國英語）.
^ ^6.0 ^6.1 Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286. doi:10.18653/v1/w19-4828 .
^ Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. S2CID 21700944. arXiv:1805.04623 . doi:10.18653/v1/p18-1027.
^ Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. S2CID 4460159. arXiv:1803.11138 . doi:10.18653/v1/n18-1108.
^ Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. S2CID 52090220. arXiv:1808.08079 . doi:10.18653/v1/w18-5426.
^ Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448 .
^ Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 4 November 2015. arXiv:1511.01432  [cs.LG].
^ Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Luke, Zettlemoyer. Deep contextualized word representations. 15 February 2018. arXiv:1802.05365v2  [cs.CL].
^ Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 18 January 2018. arXiv:1801.06146v5  [cs.CL].
^ Nayak, Pandu. Understanding searches better than ever before. Google Blog. 25 October 2019 [10 December 2019].
^ Montti, Roger. Google's BERT Rolls Out Worldwide. Search Engine Journal. Search Engine Journal. 10 December 2019 [10 December 2019].
^ Best Paper Awards. NAACL. 2019 [Mar 28, 2020].

外部連結

Official GitHub repository

[:0-1] 1.0 ^1.1 ^1.2 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2  [cs.CL].

[2] Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. [2019-11-27] （英語）.

[3] Understanding searches better than ever before. Google. 2019-10-25 [2019-11-27] （英語）.

[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015. arXiv:1506.06724  [cs.CV]. cite arXiv模板填寫了不支持的參數 (幫助)

[:1-5] 5.0 ^5.1 Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373. S2CID 201645145. doi:10.18653/v1/D19-1445 （美國英語）.

[:2-6] 6.0 ^6.1 Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286. doi:10.18653/v1/w19-4828 .

[7] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. S2CID 21700944. arXiv:1805.04623 . doi:10.18653/v1/p18-1027.

[8] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. S2CID 4460159. arXiv:1803.11138 . doi:10.18653/v1/n18-1108.

[9] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. S2CID 52090220. arXiv:1808.08079 . doi:10.18653/v1/w18-5426.

[10] Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448 .

[11] Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 4 November 2015. arXiv:1511.01432  [cs.LG].

[12] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Luke, Zettlemoyer. Deep contextualized word representations. 15 February 2018. arXiv:1802.05365v2  [cs.CL].

[13] Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 18 January 2018. arXiv:1801.06146v5  [cs.CL].

[14] Nayak, Pandu. Understanding searches better than ever before. Google Blog. 25 October 2019 [10 December 2019].

[15] Montti, Roger. Google's BERT Rolls Out Worldwide. Search Engine Journal. Search Engine Journal. 10 December 2019 [10 December 2019].

[16] Best Paper Awards. NAACL. 2019 [Mar 28, 2020].

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

閱論編自然語言處理
一般術語	語料庫口語語料庫停用詞詞袋完全人工智慧（英語：AI-complete） n元語法（雙字母組、三元語法（英語：Trigrams））
文本挖掘	文本分割詞性標註（英語：Part-of-speech tagging）拆句處理（英語：Shallow parsing）複合詞處理（英語：Compound term processing）搭配提取（英語：Collocation extraction）詞幹提取詞形還原命名實體識別指代文本情感分析概念挖掘（英語：Concept mining）語法分析詞義消歧術語提取（英語：Terminology extraction）真實大小寫處理（英語：Truecasing）
自動摘要（英語：Automatic summarization）	多文檔摘要（英語：Multi-document summarization）句子抽取（英語：Sentence extraction）文本簡化（英語：Text simplification）
分佈語義（英語：Distributional semantics）模型	潛在語義學 Seq2Seq模型 Word2vec 語言模型大型語言模型基礎模型 LLaMA ChatGPT GPT-4 文心一言詞嵌入
機器翻譯	電腦輔助翻譯基於實例（英語：Example-based machine translation）基於規則（英語：Rule-based machine translation）
自動識別與數據採集	語音識別語音合成光學字符識別自然語言生成提示工程
主題模型	彈珠分布（英語：Pachinko allocation）隱含狄利克雷分布潛在語義索引
計算機輔助審查（英語：Computer-assisted reviewing）	自動作文評分（英語：Automated essay scoring）語料庫檢索工具（英語：Concordancer）文法檢查器（英語：Grammar checker）預測文本（英語：Predictive text）拼寫檢查語法猜測（英語：Syntax guessing）
自然語言用戶界面（英語：Natural language user interface）	自動在線助手聊天機器人文字冒險遊戲問答系統