




Theano Learning Guide 3 (translation): The Multilayer Perceptron

The architecture we will build with Theano in this section is a multilayer perceptron (MLP) with a single hidden layer. An MLP can be viewed as a logistic regression classifier in which the input is first transformed by a learned non-linear mapping \Phi, so that the data is projected into a space where it becomes linearly separable. This intermediate layer is called the hidden layer. A single hidden layer is already enough to make the MLP a universal approximator; however, we will see later that there are substantial benefits to using many hidden layers, for instance in deep learning. This section mainly covers the implementation of the MLP and gives little background on neural networks; interested readers can consult the corresponding tutorials (translator's note).

The MLP model

An MLP can be depicted graphically as follows:

[Figure: a single-hidden-layer MLP, with an input layer, one hidden layer and an output layer]

A single-hidden-layer MLP defines a mapping

    f: R^D -> R^L,

where D and L are the sizes of the input vector x and of the output vector f(x). In matrix notation, f(x) is given by

    f(x) = G(b^{(2)} + W^{(2)} (s(b^{(1)} + W^{(1)} x))),

where b^{(1)}, b^{(2)} are bias vectors, W^{(1)}, W^{(2)} are weight matrices, and G and s are activation functions.

The vector h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x) constitutes the hidden layer. W^{(1)} \in R^{D \times D_h} is the weight matrix connecting the input vector to the hidden layer; each of its columns holds the weights between the input units and one hidden unit. Classical choices for s are the hyperbolic tangent, tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a}), or the logistic sigmoid function, sigmoid(a) = 1 / (1 + e^{-a}).

The output vector of the model is

    o(x) = G(b^{(2)} + W^{(2)} h(x)).

The reader should recognize this form from the previous section: as before, if G is chosen to be the softmax function, the outputs are class membership probabilities.
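To make the notation concrete, here is a minimal NumPy sketch of this forward pass. It is not part of the original tutorial; the sizes D = 3, D_h = 4, L = 2 and the softmax helper are chosen purely for illustration, and the weights are filled with Gaussian noise rather than a proper initialization scheme.

```python
import numpy

def softmax(a):
    # numerically stable softmax of a 1-D array
    e = numpy.exp(a - a.max())
    return e / e.sum()

D, D_h, L = 3, 4, 2                      # illustrative sizes: input, hidden, output
rng = numpy.random.RandomState(0)

W1 = rng.randn(D, D_h)                   # hidden-layer weights, W^(1) in R^{D x D_h}
b1 = numpy.zeros(D_h)
W2 = rng.randn(D_h, L)                   # output-layer weights, W^(2) in R^{D_h x L}
b2 = numpy.zeros(L)

x = rng.randn(D)                         # one input vector
h = numpy.tanh(x.dot(W1) + b1)           # hidden layer: h(x) = s(b^(1) + W^(1) x), s = tanh
o = softmax(h.dot(W2) + b2)              # output: o(x) = G(b^(2) + W^(2) h(x)), G = softmax

print o, o.sum()                         # class membership probabilities, summing to 1
```

The weight layout follows the (n_in, n_out) convention used by the Theano code later in this section, so x.dot(W1) plays the role of W^{(1)} x.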
To train the MLP we learn all of its parameters with stochastic gradient descent; the parameter set is \theta = \{W^{(2)}, b^{(2)}, W^{(1)}, b^{(1)}\}. The gradients \partial\ell/\partial\theta can be computed with the backpropagation algorithm. Fortunately, Theano performs this differentiation automatically, so once again we do not need to worry about the details.

From logistic regression to the multilayer perceptron

In this section we concentrate on the single-hidden-layer MLP. We start by implementing a class that represents a hidden layer; to build the MLP we then only need to put a logistic regression layer on top of it.

```python
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape
        (n_in, n_out) and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input, W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden layer
        """
        self.input = input
```

The initial values of the hidden-layer weights should be sampled uniformly from a symmetric interval that depends on the activation function. For tanh the interval should be [-\sqrt{6 / (fan_{in} + fan_{out})}, \sqrt{6 / (fan_{in} + fan_{out})}] [Xavier10], where fan_{in} and fan_{out} are the numbers of units in the (i-1)-th and i-th layers, respectively. For the sigmoid function the interval is [-4\sqrt{6 / (fan_{in} + fan_{out})}, 4\sqrt{6 / (fan_{in} + fan_{out})}]. This initialization ensures that, early in training, each unit operates in the regime of its activation function where information is easily propagated both upward (activations) and downward (gradients).

```python
        W_values = numpy.asarray(rng.uniform(
                low=-numpy.sqrt(6. / (n_in + n_out)),
                high=numpy.sqrt(6. / (n_in + n_out)),
                size=(n_in, n_out)), dtype=theano.config.floatX)
        if activation == theano.tensor.nnet.sigmoid:
            W_values *= 4

        self.W = theano.shared(value=W_values, name='W')

        b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, name='b')
```

Note that the activation of the hidden layer is a non-linear function. It defaults to tanh, but in many cases we may want to use a different one. The output of the layer is then computed as:

```python
        self.output = activation(T.dot(input, self.W) + self.b)
        # parameters of the model
        self.params = [self.W, self.b]
```

Relating this back to the theory, this line computes exactly the hidden-layer output h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x).
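As a quick sanity check, a HiddenLayer can be wired to a symbolic matrix and compiled into an ordinary callable. The snippet below is not part of the original tutorial; the names hidden_fn and batch, and the sizes 784, 500 and 20 (which merely anticipate the MNIST-sized values used later), are illustrative, and it assumes numpy, theano, T and the HiddenLayer class above are in scope.

```python
rng = numpy.random.RandomState(1234)
x = T.matrix('x')                                   # symbolic minibatch of inputs

layer = HiddenLayer(rng=rng, input=x, n_in=784, n_out=500, activation=T.tanh)

# compile a function mapping raw inputs to hidden activations h(x)
hidden_fn = theano.function(inputs=[x], outputs=layer.output)

batch = numpy.asarray(rng.uniform(size=(20, 784)), dtype=theano.config.floatX)
print hidden_fn(batch).shape                        # expected: (20, 500)
```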
If this output is used as the input of the LogisticRegression class, we recover exactly the logistic regression classifier of the previous section, and its output is then the output of the MLP. A simple MLP implementation therefore looks as follows:

```python
class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class) while the top
    layer is a softmax layer (defined here by a ``LogisticRegression`` class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie
        """

        # Since we are dealing with a one hidden layer MLP, this will
        # translate into a HiddenLayer connected to the LogisticRegression
        # layer
        self.hiddenLayer = HiddenLayer(rng=rng, input=input,
                                       n_in=n_in, n_out=n_hidden,
                                       activation=T.tanh)

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out)
```

In this section we still use L1 and L2 regularization, so we also need the L1 norm and the squared L2 norm of the weight matrices of the two layers:

```python
        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = abs(self.hiddenLayer.W).sum() \
                + abs(self.logRegressionLayer.W).sum()

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() \
                    + (self.logRegressionLayer.W ** 2).sum()

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers
        # it is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
```

As before, we train the model with minibatch stochastic gradient descent. The difference is that the cost function now also contains the regularization terms; L1_reg and L2_reg are hyperparameters that control the weight of the regularization terms within the overall cost. The cost is computed as follows:

```python
    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = classifier.negative_log_likelihood(y) \
         + L1_reg * classifier.L1 \
         + L2_reg * classifier.L2_sqr
```

The model parameters are then updated using the gradients. This code is essentially the same as before, apart from the number of parameters involved:

```python
    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = []
    for param in classifier.params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs
    updates = []
    # given two lists of the same length, A = [a1, a2, a3, a4] and
    # B = [b1, b2, b3, b4], zip generates a list C of the same size, where
    # each element is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    for param, gparam in zip(classifier.params, gparams):
        updates.append((param, param - learning_rate * gparam))

    # compiling a Theano function `train_model` that returns the cost, but
    # at the same time updates the parameters of the model based on the
    # rules defined in `updates`
    train_model = theano.function(inputs=[index], outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size:(index + 1) * batch_size],
                y: train_set_y[index * batch_size:(index + 1) * batch_size]})
```
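Calling the compiled train_model returns the minibatch cost and, through `updates`, modifies the shared parameters in place. The following small check is not part of the original tutorial; it assumes classifier, train_model and numpy are defined as above and simply makes that side effect visible.

```python
# illustrative sketch: one call to train_model performs one SGD step in place
W_before = classifier.hiddenLayer.W.get_value()    # copy of the current weights
avg_cost = train_model(0)                          # process minibatch 0, applying `updates`
W_after = classifier.hiddenLayer.W.get_value()

print 'cost on minibatch 0: %f' % avg_cost
print 'mean absolute change of W^(1): %g' % numpy.abs(W_after - W_before).mean()
```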
Putting it all together

With the building blocks above, writing a complete MLP program becomes straightforward. The code below shows how everything fits together; the overall procedure is essentially the same as for the logistic regression classifier of the previous section.

```python
"""
This tutorial introduces the multilayer perceptron using Theano.

A multilayer perceptron is a logistic regressor where, instead of feeding the
input to the logistic regression, you insert an intermediate layer, called
the hidden layer, that has a nonlinear activation function (usually tanh or
sigmoid). One can use many such hidden layers, making the architecture deep.
The tutorial will also tackle the problem of MNIST digit classification.

.. math::

    f(x) = G(b^{(2)} + W^{(2)} (s(b^{(1)} + W^{(1)} x)))

References:

    - textbooks: "Pattern Recognition and Machine Learning" -
                 Christopher M. Bishop, section 5

"""
__docformat__ = 'restructedtext en'

import cPickle
import gzip
import os
import sys
import time

import numpy

import theano
import theano.tensor as T

from logistic_sgd import LogisticRegression, load_data


class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape
        (n_in, n_out) and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input, W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden layer
        """
        self.input = input

        # `W` is initialized with `W_values`, which is uniformly sampled from
        # [-sqrt(6. / (n_in + n_hidden)), sqrt(6. / (n_in + n_hidden))]
        # for the tanh activation function.
        # The output of uniform is converted using asarray to dtype
        # theano.config.floatX so that the code is runnable on GPU.
        # Note : optimal initialization of weights is dependent on the
        #        activation function used (among other things). For example,
        #        results presented in [Xavier10] suggest that you should use
        #        4 times larger initial weights for sigmoid compared to tanh.
        #        We have no info for other functions, so we use the same as
        #        tanh.
        if W is None:
            W_values = numpy.asarray(rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)), dtype=theano.config.floatX)
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (lin_output if activation is None
                       else activation(lin_output))
        # parameters of the model
        self.params = [self.W, self.b]


class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class) while the top
    layer is a softmax layer (defined here by a ``LogisticRegression``
    class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie

        """

        # Since we are dealing with a one hidden layer MLP, this will
        # translate into a HiddenLayer with a tanh activation function
        # connected to the LogisticRegression layer; the activation can be
        # replaced by sigmoid or any other nonlinear function
        self.hiddenLayer = HiddenLayer(rng=rng, input=input,
                                       n_in=n_in, n_out=n_hidden,
                                       activation=T.tanh)

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out)

        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = abs(self.hiddenLayer.W).sum() \
                + abs(self.logRegressionLayer.W).sum()

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() \
                    + (self.logRegressionLayer.W ** 2).sum()

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers
        # it is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params


def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='../data/mnist.pkl.gz', batch_size=20, n_hidden=500):
    """
    Demonstrate stochastic gradient descent optimization for a multilayer
    perceptron.

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
    gradient)

    :type L1_reg: float
    :param L1_reg: L1-norm's weight when added to the cost (see
    regularization)

    :type L2_reg: float
    :param L2_reg: L2-norm's weight when added to the cost (see
    regularization)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
        http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz

    """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')      # the data is presented as rasterized images
    y = T.ivector('y')     # the labels are presented as a 1D vector of
                           # [int] labels

    rng = numpy.random.RandomState(1234)

    # construct the MLP class
    classifier = MLP(rng=rng, input=x, n_in=28 * 28,
                     n_hidden=n_hidden, n_out=10)
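    # --- illustrative note (not part of the original tutorial) ------------
    # With these arguments the parameter shapes are:
    #   hidden layer : W^(1) is 784 x 500 (28*28 MNIST pixels -> 500 hidden
    #                  units), b^(1) has 500 entries
    #   output layer : W^(2) is 500 x 10 (500 hidden units -> 10 digit
    #                  classes), b^(2) has 10 entries
    # i.e. 784*500 + 500 + 500*10 + 10 = 397,510 trainable parameters.
    # ----------------------------------------------------------------------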
    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = classifier.negative_log_likelihood(y) \
         + L1_reg * classifier.L1 \
         + L2_reg * classifier.L2_sqr

    # compiling a Theano function that computes the mistakes that are made
    # by the model on a minibatch
    test_model = theano.function(inputs=[index],
            outputs=classifier.errors(y),
            givens={
                x: test_set_x[index * batch_size:(index + 1) * batch_size],
                y: test_set_y[index * batch_size:(index + 1) * batch_size]})

    validate_model = theano.function(inputs=[index],
            outputs=classifier.errors(y),
            givens={
                x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = []
    for param in classifier.params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs
    updates = []
    # given two lists of the same length, A = [a1, a2, a3, a4] and
    # B = [b1, b2, b3, b4], zip generates a list C of the same size, where
    # each element is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    for param, gparam in zip(classifier.params, gparams):
        updates.append((param, param - learning_rate * gparam))

    # compiling a Theano function `train_model` that returns the cost, but
    # at the same time updates the parameters of the model based on the
    # rules defined in `updates`
    train_model = theano.function(inputs=[index], outputs=cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size:(index + 1) * batch_size],
                y: train_set_y[index * batch_size:(index + 1) * batch_size]})

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    # early-stopping parameters
    patience = 10000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                   # go through this many minibatches before
                                   # checking the network on the validation
                                   # set; in this case we check every epoch

    best_params = None
    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
```
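The source document breaks off at this point; the remaining pages of the listing were not captured. As a rough guide only, here is a minimal sketch of how such an early-stopping loop is commonly structured on top of the variables defined above (train_model, validate_model, test_model, the batch counts and the patience parameters). It is an assumption-laden illustration, not the original text.

```python
    # illustrative sketch of the early-stopping loop (not the original text)
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number, counted over all minibatches seen so far
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # zero-one loss on the validation set
                validation_losses = [validate_model(i)
                                     for i in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                if this_validation_loss < best_validation_loss:
                    # raise patience if the improvement is significant enough
                    if this_validation_loss < \
                            best_validation_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    best_validation_loss = this_validation_loss
                    best_iter = iter
                    # also evaluate on the test set
                    test_losses = [test_model(i)
                                   for i in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print 'Best validation error %f %% (iter %i), test error %f %%, %.1fs' % \
          (best_validation_loss * 100., best_iter, test_score * 100.,
           end_time - start_time)
```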