




Theano Learning Guide 3 (translation): The Multilayer Perceptron

The architecture we will build with Theano in this section is a multilayer perceptron (MLP) with a single hidden layer. An MLP can be viewed as a logistic regression classifier in which the input is first transformed by a learned non-linear mapping \Phi, so that the data is projected into a space where it becomes linearly separable. This intermediate layer is called the hidden layer. A single hidden layer is already enough to make the MLP a universal approximator; however, we will see later that there are substantial benefits to using many hidden layers, for instance in deep learning. This section mainly covers the implementation of the MLP and gives little background on neural networks; interested readers can consult the corresponding tutorials (translator's note).

The MLP model

An MLP can be depicted graphically as follows:

[Figure: a single-hidden-layer MLP, with an input layer, one hidden layer and an output layer]

A single-hidden-layer MLP defines a mapping

    f: R^D -> R^L,

where D and L are the sizes of the input vector x and of the output vector f(x). In matrix notation, f(x) is given by

    f(x) = G(b^{(2)} + W^{(2)} (s(b^{(1)} + W^{(1)} x))),

where b^{(1)}, b^{(2)} are bias vectors, W^{(1)}, W^{(2)} are weight matrices, and G and s are activation functions.

The vector h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x) constitutes the hidden layer. W^{(1)} \in R^{D \times D_h} is the weight matrix connecting the input vector to the hidden layer; each of its columns holds the weights between the input units and one hidden unit. Classical choices for s are the hyperbolic tangent, tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a}), or the logistic sigmoid function, sigmoid(a) = 1 / (1 + e^{-a}).

The output vector of the model is

    o(x) = G(b^{(2)} + W^{(2)} h(x)).

The reader should recognize this form from the previous section: as before, if G is chosen to be the softmax function, the outputs are class membership probabilities.
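To make the notation concrete, here is a minimal NumPy sketch of this forward pass. It is not part of the original tutorial; the sizes D = 3, D_h = 4, L = 2 and the softmax helper are chosen purely for illustration, and the weights are filled with Gaussian noise rather than a proper initialization scheme.

```python
import numpy

def softmax(a):
    # numerically stable softmax of a 1-D array
    e = numpy.exp(a - a.max())
    return e / e.sum()

D, D_h, L = 3, 4, 2                      # illustrative sizes: input, hidden, output
rng = numpy.random.RandomState(0)

W1 = rng.randn(D, D_h)                   # hidden-layer weights, W^(1) in R^{D x D_h}
b1 = numpy.zeros(D_h)
W2 = rng.randn(D_h, L)                   # output-layer weights, W^(2) in R^{D_h x L}
b2 = numpy.zeros(L)

x = rng.randn(D)                         # one input vector
h = numpy.tanh(x.dot(W1) + b1)           # hidden layer: h(x) = s(b^(1) + W^(1) x), s = tanh
o = softmax(h.dot(W2) + b2)              # output: o(x) = G(b^(2) + W^(2) h(x)), G = softmax

print o, o.sum()                         # class membership probabilities, summing to 1
```

The weight layout follows the (n_in, n_out) convention used by the Theano code later in this section, so x.dot(W1) plays the role of W^{(1)} x.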
To train the MLP we learn all of its parameters with stochastic gradient descent; the parameter set is \theta = \{W^{(2)}, b^{(2)}, W^{(1)}, b^{(1)}\}. The gradients \partial\ell/\partial\theta can be computed with the backpropagation algorithm. Fortunately, Theano performs this differentiation automatically, so once again we do not need to worry about the details.

From logistic regression to the multilayer perceptron

In this section we concentrate on the single-hidden-layer MLP. We start by implementing a class that represents a hidden layer; to build the MLP we then only need to put a logistic regression layer on top of it.

```python
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape
        (n_in, n_out) and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input, W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden layer
        """
        self.input = input
```

The initial values of the hidden-layer weights should be sampled uniformly from a symmetric interval that depends on the activation function. For tanh the interval should be [-\sqrt{6 / (fan_{in} + fan_{out})}, \sqrt{6 / (fan_{in} + fan_{out})}] [Xavier10], where fan_{in} and fan_{out} are the numbers of units in the (i-1)-th and i-th layers, respectively. For the sigmoid function the interval is [-4\sqrt{6 / (fan_{in} + fan_{out})}, 4\sqrt{6 / (fan_{in} + fan_{out})}]. This initialization ensures that, early in training, each unit operates in the regime of its activation function where information is easily propagated both upward (activations) and downward (gradients).

```python
        W_values = numpy.asarray(rng.uniform(
                low=-numpy.sqrt(6. / (n_in + n_out)),
                high=numpy.sqrt(6. / (n_in + n_out)),
                size=(n_in, n_out)), dtype=theano.config.floatX)
        if activation == theano.tensor.nnet.sigmoid:
            W_values *= 4

        self.W = theano.shared(value=W_values, name='W')

        b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, name='b')
```

Note that the activation of the hidden layer is a non-linear function. It defaults to tanh, but in many cases we may want to use a different one. The output of the layer is then computed as:

```python
        self.output = activation(T.dot(input, self.W) + self.b)
        # parameters of the model
        self.params = [self.W, self.b]
```

Relating this back to the theory, this line computes exactly the hidden-layer output h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x).
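As a quick sanity check, a HiddenLayer can be wired to a symbolic matrix and compiled into an ordinary callable. The snippet below is not part of the original tutorial; the names hidden_fn and batch, and the sizes 784, 500 and 20 (which merely anticipate the MNIST-sized values used later), are illustrative, and it assumes numpy, theano, T and the HiddenLayer class above are in scope.

```python
rng = numpy.random.RandomState(1234)
x = T.matrix('x')                                   # symbolic minibatch of inputs

layer = HiddenLayer(rng=rng, input=x, n_in=784, n_out=500, activation=T.tanh)

# compile a function mapping raw inputs to hidden activations h(x)
hidden_fn = theano.function(inputs=[x], outputs=layer.output)

batch = numpy.asarray(rng.uniform(size=(20, 784)), dtype=theano.config.floatX)
print hidden_fn(batch).shape                        # expected: (20, 500)
```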
If this output is used as the input of the LogisticRegression class, we recover exactly the logistic regression classifier of the previous section, and its output is then the output of the MLP. A simple MLP implementation therefore looks as follows:

```python
class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class) while the top
    layer is a softmax layer (defined here by a ``LogisticRegression`` class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie
        """

        # Since we are dealing with a one hidden layer MLP, this will
        # translate into a HiddenLayer connected to the LogisticRegression
        # layer
        self.hiddenLayer = HiddenLayer(rng=rng, input=input,
                                       n_in=n_in, n_out=n_hidden,
                                       activation=T.tanh)

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out)
```

In this section we still use L1 and L2 regularization, so we also need the L1 norm and the squared L2 norm of the weight matrices of the two layers:

```python
        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = abs(self.hiddenLayer.W).sum() \
                + abs(self.logRegressionLayer.W).sum()

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() \
                    + (self.logRegressionLayer.W ** 2).sum()

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers
        # it is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
```

As before, we train the model with minibatch stochastic gradient descent. The difference is that the cost function now also contains the regularization terms; L1_reg and L2_reg are hyperparameters that control the weight of the regularization terms within the overall cost. The cost is computed as follows:

```python
    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = classifier.negative_log_likelihood(y) \
         + L1_reg * classifier.L1 \
         + L2_reg * classifier.L2_sqr
```

The model parameters are then updated using the gradients. This code is essentially the same as before, apart from the number of parameters involved:

```python
    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = []
    for param in classifier.params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs
    updates = []
    # given two lists of the same length, A = [a1, a2, a3, a4] and
    # B = [b1, b2, b3, b4], zip generates a list C of the same size, where
    # each element is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    for param, gparam in zip(classifier.params, gparams):
        updates.append((param, param - learning_rate * gparam))

    # compiling a Theano function `train_model` that returns the cost, but
    # at the same time updates the parameters of the model based on the
    # rules defined in `updates`
    train_model = theano.function(inputs=[index], outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size:(index + 1) * batch_size],
                y: train_set_y[index * batch_size:(index + 1) * batch_size]})
```
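Calling the compiled train_model returns the minibatch cost and, through `updates`, modifies the shared parameters in place. The following small check is not part of the original tutorial; it assumes classifier, train_model and numpy are defined as above and simply makes that side effect visible.

```python
# illustrative sketch: one call to train_model performs one SGD step in place
W_before = classifier.hiddenLayer.W.get_value()    # copy of the current weights
avg_cost = train_model(0)                          # process minibatch 0, applying `updates`
W_after = classifier.hiddenLayer.W.get_value()

print 'cost on minibatch 0: %f' % avg_cost
print 'mean absolute change of W^(1): %g' % numpy.abs(W_after - W_before).mean()
```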
Putting it all together

With the building blocks above, writing a complete MLP program becomes straightforward. The code below shows how everything fits together; the overall procedure is essentially the same as for the logistic regression classifier of the previous section.

```python
"""
This tutorial introduces the multilayer perceptron using Theano.

A multilayer perceptron is a logistic regressor where, instead of feeding the
input to the logistic regression, you insert an intermediate layer, called
the hidden layer, that has a nonlinear activation function (usually tanh or
sigmoid). One can use many such hidden layers, making the architecture deep.
The tutorial will also tackle the problem of MNIST digit classification.

.. math::

    f(x) = G(b^{(2)} + W^{(2)} (s(b^{(1)} + W^{(1)} x)))

References:

    - textbooks: "Pattern Recognition and Machine Learning" -
                 Christopher M. Bishop, section 5

"""
__docformat__ = 'restructedtext en'

import cPickle
import gzip
import os
import sys
import time

import numpy

import theano
import theano.tensor as T

from logistic_sgd import LogisticRegression, load_data


class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape
        (n_in, n_out) and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input, W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden layer
        """
        self.input = input

        # `W` is initialized with `W_values`, which is uniformly sampled from
        # [-sqrt(6. / (n_in + n_hidden)), sqrt(6. / (n_in + n_hidden))]
        # for the tanh activation function.
        # The output of uniform is converted using asarray to dtype
        # theano.config.floatX so that the code is runnable on GPU.
        # Note : optimal initialization of weights is dependent on the
        #        activation function used (among other things). For example,
        #        results presented in [Xavier10] suggest that you should use
        #        4 times larger initial weights for sigmoid compared to tanh.
        #        We have no info for other functions, so we use the same as
        #        tanh.
        if W is None:
            W_values = numpy.asarray(rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)), dtype=theano.config.floatX)
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (lin_output if activation is None
                       else activation(lin_output))
        # parameters of the model
        self.params = [self.W, self.b]


class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class) while the top
    layer is a softmax layer (defined here by a ``LogisticRegression``
    class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie

        """

        # Since we are dealing with a one hidden layer MLP, this will
        # translate into a HiddenLayer with a tanh activation function
        # connected to the LogisticRegression layer; the activation can be
        # replaced by sigmoid or any other nonlinear function
        self.hiddenLayer = HiddenLayer(rng=rng, input=input,
                                       n_in=n_in, n_out=n_hidden,
                                       activation=T.tanh)

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out)

        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = abs(self.hiddenLayer.W).sum() \
                + abs(self.logRegressionLayer.W).sum()

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() \
                    + (self.logRegressionLayer.W ** 2).sum()

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers
        # it is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params


def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='../data/mnist.pkl.gz', batch_size=20, n_hidden=500):
    """
    Demonstrate stochastic gradient descent optimization for a multilayer
    perceptron.

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
    gradient)

    :type L1_reg: float
    :param L1_reg: L1-norm's weight when added to the cost (see
    regularization)

    :type L2_reg: float
    :param L2_reg: L2-norm's weight when added to the cost (see
    regularization)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
        http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz

    """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')      # the data is presented as rasterized images
    y = T.ivector('y')     # the labels are presented as a 1D vector of
                           # [int] labels

    rng = numpy.random.RandomState(1234)

    # construct the MLP class
    classifier = MLP(rng=rng, input=x, n_in=28 * 28,
                     n_hidden=n_hidden, n_out=10)
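    # --- illustrative note (not part of the original tutorial) ------------
    # With these arguments the parameter shapes are:
    #   hidden layer : W^(1) is 784 x 500 (28*28 MNIST pixels -> 500 hidden
    #                  units), b^(1) has 500 entries
    #   output layer : W^(2) is 500 x 10 (500 hidden units -> 10 digit
    #                  classes), b^(2) has 10 entries
    # i.e. 784*500 + 500 + 500*10 + 10 = 397,510 trainable parameters.
    # ----------------------------------------------------------------------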
    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = classifier.negative_log_likelihood(y) \
         + L1_reg * classifier.L1 \
         + L2_reg * classifier.L2_sqr

    # compiling a Theano function that computes the mistakes that are made
    # by the model on a minibatch
    test_model = theano.function(inputs=[index],
            outputs=classifier.errors(y),
            givens={
                x: test_set_x[index * batch_size:(index + 1) * batch_size],
                y: test_set_y[index * batch_size:(index + 1) * batch_size]})

    validate_model = theano.function(inputs=[index],
            outputs=classifier.errors(y),
            givens={
                x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = []
    for param in classifier.params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs
    updates = []
    # given two lists of the same length, A = [a1, a2, a3, a4] and
    # B = [b1, b2, b3, b4], zip generates a list C of the same size, where
    # each element is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    for param, gparam in zip(classifier.params, gparams):
        updates.append((param, param - learning_rate * gparam))

    # compiling a Theano function `train_model` that returns the cost, but
    # at the same time updates the parameters of the model based on the
    # rules defined in `updates`
    train_model = theano.function(inputs=[index], outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size:(index + 1) * batch_size],
                y: train_set_y[index * batch_size:(index + 1) * batch_size]})

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    # early-stopping parameters
    patience = 10000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                   # go through this many minibatches before
                                   # checking the network on the validation
                                   # set; in this case we check every epoch

    best_params = None
    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
```
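The source document breaks off at this point; the remaining pages of the listing were not captured. As a rough guide only, here is a minimal sketch of how such an early-stopping loop is commonly structured on top of the variables defined above (train_model, validate_model, test_model, the batch counts and the patience parameters). It is an assumption-laden illustration, not the original text.

```python
    # illustrative sketch of the early-stopping loop (not the original text)
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number, counted over all minibatches seen so far
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # zero-one loss on the validation set
                validation_losses = [validate_model(i)
                                     for i in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                if this_validation_loss < best_validation_loss:
                    # raise patience if the improvement is significant enough
                    if this_validation_loss < \
                            best_validation_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    best_validation_loss = this_validation_loss
                    best_iter = iter
                    # also evaluate on the test set
                    test_losses = [test_model(i)
                                   for i in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print 'Best validation error %f %% (iter %i), test error %f %%, %.1fs' % \
          (best_validation_loss * 100., best_iter, test_score * 100.,
           end_time - start_time)
```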