受限玻爾茲曼機

受限玻爾茲曼機（英語：restricted Boltzmann machine, RBM）是一種可通過輸入數據集學習概率分布的隨機生成神經網絡。RBM最初由發明者保羅·斯模稜斯基（英語：Paul Smolensky）於1986年命名為簧風琴（Harmonium）^[1]，但直到傑弗里·辛頓及其合作者在2000年代中葉發明快速學習算法後，受限玻茲曼機才變得知名。受限玻茲曼機在降維^[2]、分類^[3]、協同過濾^[4]、特徵學習^[5]和主題建模^[6]中得到了應用。根據任務的不同，受限玻茲曼機可以使用監督學習或無監督學習的方法進行訓練。

正如名字所提示的那樣，受限玻茲曼機是一種玻茲曼機的變體，但限定模型必須為二分圖。模型中包含對應輸入參數的輸入（可見）單元和對應訓練結果的隱單元，圖中的每條邊必須連接一個可見單元和一個隱單元。（與此相對，「無限制」玻茲曼機包含隱單元間的邊，使之成為循環神經網絡。）這一限定使得相比一般玻茲曼機更高效的訓練算法成為可能，特別是基於梯度的對比分歧（contrastive divergence）算法^[7]。

受限玻茲曼機也可被用於深度學習網絡。具體地，深度信念網絡可使用多個RBM堆疊而成，並可使用梯度下降法和反向傳播算法進行調優^[8]。

結構

標準的受限玻爾茲曼機由二值（布爾/伯努利）隱層和可見層單元組成。權重矩陣 $W=(w_{i,j})$ 中的每個元素指定了隱層單元 $h_{i}$ 和可見層單元 $v_{j}$ 之間邊的權重。此外對於每個可見層單元 $v_{i}$ 有偏置 $a_{i}$ ，對每個隱層單元 $h_{j}$ 有偏置 $b_{j}$ 。在這些定義下，一種受限玻爾茲曼機配置（即給定每個單元取值）的「能量」 $(v, h)$ 被定義為

E(v,h)=-\sum _{i}a_{i}v_{i}-\sum _{j}b_{j}h_{j}-\sum _{i}\sum _{j}h_{j}w_{i,j}v_{i}

或者用矩陣的形式表示如下：

E(v,h)=-a^{\mathrm {T} }v-b^{\mathrm {T} }h-h^{\mathrm {T} }Wv

這一能量函數的形式與霍普菲爾德神經網絡相似。在一般的玻爾茲曼機中，隱層和可見層之間的聯合概率分布由能量函數給出：^[9]

P(v,h)={\frac {1}{Z}}e^{-E(v,h)}

其中， $Z$ 為配分函數，定義為在節點的所有可能取值下 $e^{-E(v,h)}$ 的和（亦即使得概率分布和為1的歸一化常數）。類似地，可見層取值的邊緣分布可通過對所有隱層配置求和得到：^[9]

P(v)={\frac {1}{Z}}\sum _{h}e^{-E(v,h)}

由於RBM為一個二分圖，層內沒有邊相連，因而隱層是否激活在給定可見層節點取值的情況下是條件獨立的。類似地，可見層節點的激活狀態在給定隱層取值的情況下也條件獨立^[7]。亦即，對 $m$ 個可見層節點和 $n$ 個隱層節點，可見層的配置 $v$ 對於隱層配置 $h$ 的條件概率如下：

P(v|h)=\prod _{i=1}^{m}P(v_{i}|h)

.

類似地， $h$ 對於 $v$ 的條件概率為

P(h|v)=\prod _{j=1}^{n}P(h_{j}|v)

.

其中，單個節點的激活概率為

P(h_{j}=1|v)=\sigma \left(b_{j}+\sum _{i=1}^{m}w_{i,j}v_{i}\right)\,

和

\,P(v_{i}=1|h)=\sigma \left(a_{i}+\sum _{j=1}^{n}w_{i,j}h_{j}\right)

其中 $\sigma$ 代表邏輯函數。

與其他模型的關係

受限玻爾茲曼機是玻爾茲曼機和馬爾科夫隨機場的一種特例^[10]^[11]。這些概率圖模型可以對應到因子分析^[12]。

訓練算法

受限玻爾茲曼機的訓練目標是針對某一訓練集 $V$ ，最大化概率的乘積。其中， $V$ 被視為一矩陣，每個行向量作為一個可見單元向量解析失败 (SVG（MathML可通过浏览器插件启用）：从服务器“http://localhost:6011/zh.wikipedia.org/v1/”返回无效的响应（“Math extension cannot connect to Restbase.”）：): {\displaystyle v} ：

\arg \max _{W}\prod _{v\in V}P(v)

或者，等價地，最大化 $V$ 的對數概率期望：^[10]^[11]

\arg \max _{W}\mathbb {E} \left[\sum _{v\in V}\log P(v)\right]

訓練受限玻爾茲曼機，即最優化權重矩陣 $W$ ，最常用的算法是傑弗里·辛頓提出的對比分歧（contrastive divergence，CD）算法。這一算法最早被用於訓練辛頓提出的「專家積」模型^[13]。這一算法在梯度下降的過程中使用吉布斯採樣完成對權重的更新，與訓練前饋神經網絡中利用反向傳播算法類似。

基本的針對一個樣本的單步對比分歧（CD-1）步驟可被總結如下：

取一個訓練樣本 $v$ ，計算隱層節點的概率，在此基礎上從這一概率分布中獲取一個隱層節點激活向量的樣本 $h$ ；
計算 $v$ 和 $h$ 的外積，稱為「正梯度」；
從 $h$ 獲取一個重構的可見層節點的激活向量樣本 $v'$ ，此後從 $v'$ 再次獲得一個隱層節點的激活向量樣本 $h'$ ；
計算 $v'$ 和 $h'$ 的外積，稱為「負梯度」；
使用正梯度和負梯度的差以一定的學習率更新權重 $w_{i,j}$ ： $\Delta w_{i,j}=\epsilon (vh^{\mathsf {T}}-v'h'^{\mathsf {T}})$ 。

偏置 $a$ 和 $b$ 也可以使用類似的方法更新。

參見

參考資料

^ Smolensky, Paulgengxin. Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory. Rumelhart, David E.; McLelland, James L. (編). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (PDF). MIT Press. 1986: 194–281 [2014-09-16]. ISBN 0-262-68053-X. （原始內容 (PDF)存檔於2013-06-13）.
^ G. E. Hinton, R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science. 2006-07-28, 313 (5786): 504–507 [2018-04-02]. ISSN 0036-8075. doi:10.1126/science.1127647. （原始內容存檔於2018-03-20）（英語）.
^ Hugo Larochelle, Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. ACM: 536–543. 2008-07-05 [2018-04-02]. ISBN 9781605582054. doi:10.1145/1390156.1390224.
^ Salakhutdinov, R.; Mnih, A.; Hinton, G. Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning - ICML '07: 791. 2007. ISBN 9781595937933. doi:10.1145/1273496.1273596.
^ Coates, Adam; Lee, Honglak; Ng, Andrew Y. An analysis of single-layer networks in unsupervised feature learning (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). 2011 [2014-09-16]. （原始內容 (PDF)存檔於2013-05-10）.
^ Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model （頁面存檔備份，存於網際網路檔案館）. Neural Information Processing Systems 23.
^ ^7.0 ^7.1 Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics.
^ Hinton, G. Deep belief networks. Scholarpedia. 2009, 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
^ ^9.0 ^9.1 Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines （頁面存檔備份，存於網際網路檔案館）. UTML TR 2010–003, University of Toronto.
^ ^10.0 ^10.1 Sutskever, Ilya; Tieleman, Tijmen. On the convergence properties of contrastive divergence (PDF). Proc. 13th Int'l Conf. on AI and Statistics (AISTATS). 2010 [2014-09-16]. （原始內容 (PDF)存檔於2015-06-10）.
^ ^11.0 ^11.1 Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction （頁面存檔備份，存於網際網路檔案館）. Pattern Recognition 47, pp. 25-39, 2014
^ María Angélica Cueto; Jason Morton; Bernd Sturmfels. Geometry of the restricted Boltzmann machine (PDF). Algebraic Methods in Statistics and Probability (American Mathematical Society). 2010, 516. arXiv:0908.4425 . ^{[永久失效連結]}
^ Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation. 2006-03-30, 14 (8): 1771–1800 [2018-04-02]. doi:10.1162/089976602760128018. （原始內容存檔於2018-11-27）（英語）.

外部連結

Introduction to Restricted Boltzmann Machines（頁面存檔備份，存於網際網路檔案館）. Edwin Chen's blog, July 18, 2011.

[1] Smolensky, Paulgengxin. Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory. Rumelhart, David E.; McLelland, James L. (編). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (PDF). MIT Press. 1986: 194–281 [2014-09-16]. ISBN 0-262-68053-X. （原始內容 (PDF)存檔於2013-06-13）.

[2] G. E. Hinton, R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science. 2006-07-28, 313 (5786): 504–507 [2018-04-02]. ISSN 0036-8075. doi:10.1126/science.1127647. （原始內容存檔於2018-03-20）（英語）.

[3] Hugo Larochelle, Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. ACM: 536–543. 2008-07-05 [2018-04-02]. ISBN 9781605582054. doi:10.1145/1390156.1390224.

[softCF-4] Salakhutdinov, R.; Mnih, A.; Hinton, G. Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning - ICML '07: 791. 2007. ISBN 9781595937933. doi:10.1145/1273496.1273596.

[coates2011-5] Coates, Adam; Lee, Honglak; Ng, Andrew Y. An analysis of single-layer networks in unsupervised feature learning (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). 2011 [2014-09-16]. （原始內容 (PDF)存檔於2013-05-10）.

[6] Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model （頁面存檔備份，存於網際網路檔案館）. Neural Information Processing Systems 23.

[oncd-7] 7.0 ^7.1 Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics.

[8] Hinton, G. Deep belief networks. Scholarpedia. 2009, 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.

[guide-9] 9.0 ^9.1 Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines （頁面存檔備份，存於網際網路檔案館）. UTML TR 2010–003, University of Toronto.

[cdconvergence-10] 10.0 ^10.1 Sutskever, Ilya; Tieleman, Tijmen. On the convergence properties of contrastive divergence (PDF). Proc. 13th Int'l Conf. on AI and Statistics (AISTATS). 2010 [2014-09-16]. （原始內容 (PDF)存檔於2015-06-10）.

[RBMTutorial-11] 11.0 ^11.1 Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction （頁面存檔備份，存於網際網路檔案館）. Pattern Recognition 47, pp. 25-39, 2014

[12] María Angélica Cueto; Jason Morton; Bernd Sturmfels. Geometry of the restricted Boltzmann machine (PDF). Algebraic Methods in Statistics and Probability (American Mathematical Society). 2010, 516. arXiv:0908.4425 . ^{[永久失效連結]}

[13] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation. 2006-03-30, 14 (8): 1771–1800 [2018-04-02]. doi:10.1162/089976602760128018. （原始內容存檔於2018-11-27）（英語）.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]