Efficient and Accurate Approximations of Nonlinear Convolutional Networks

Abstract

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4× is demonstrated on a large network trained for ImageNet, while the top-5 error rate is only increased by 0.9%. Our accelerated model has a comparably fast speed as AlexNet [11], but is 4.7% more accurate.
1. Introduction

This paper addresses efficient test-time computation of deep convolutional neural networks (CNNs) [12, 11]. Since the success of CNNs [11] for large-scale image classification, the accuracy of the newly developed CNNs [24, 17, 8, 18, 19] has been continuously improving. However, the computational cost of these networks (especially the more accurate but larger models) also increases significantly. The expensive test-time evaluation of the models can make them impractical in real-world systems. For example, a cloud service needs to process thousands of new requests per second; portable devices such as phones and tablets mostly have CPUs or low-end GPUs only; some recognition tasks like object detection [4, 8, 7] are still time-consuming for processing a single image even on a high-end GPU. For these reasons and others, it is of practical importance to accelerate the test-time computation of CNNs.
There have been a few studies on approximating deep CNNs for accelerating test-time evaluation [22, 3, 10]. A commonly used assumption is that the convolutional filters are approximately low-rank along certain dimensions, so the original filters can be approximately decomposed into a series of smaller filters and the complexity is reduced. These methods have shown promising speedup ratios on a single layer [3] or a few layers [10], with some degradation of accuracy.

The algorithms and approximations in the previous work are developed for reconstructing linear filters [3, 10] and linear responses [10]. However, the nonlinearity like the Rectified Linear Units (ReLU) [14, 11] is not involved in their optimization. Ignoring the nonlinearity will impact the quality of the approximated layers. Let us consider a case where the filters are approximated by reconstructing the linear responses. Because the ReLU will follow, the model accuracy is more sensitive to the reconstruction error of the positive responses than to that of the negative responses.

Moreover, it is a challenging task to accelerate the whole network (instead of just one or a very few layers). The errors will be accumulated if several layers are approximated, especially when the model is deep. Actually, in the recent work [3, 10] the approximations are applied on a single layer of large CNN models, such as those trained on ImageNet [2, 16]. It is insufficient for practical usage to speed up one or a few layers, especially for the deeper models which have been shown very accurate [18, 19, 8].

In this paper, a method for accelerating nonlinear convolutional networks is proposed. It is based on minimizing the reconstruction error of nonlinear responses, subject to a low-rank constraint that can be used to reduce computation. To solve the challenging constrained optimization problem, we decompose it into two feasible subproblems and iteratively solve them. We further propose to minimize an asymmetric reconstruction error, which effectively reduces the accumulated error of multiple approximated layers.

We evaluate our method on a 7-convolutional-layer model trained on ImageNet. We investigate the cases of accelerating each single layer and the whole model. Experiments show that our method is more accurate than the recent method of Jaderberg et al. [10] under the same speedup ratios. A whole-model speedup ratio of 4× is demonstrated, and its degradation is merely 0.9%. When our model is accelerated to have a comparably fast speed as AlexNet [11], our accuracy is 4.7% higher.
2.1. Low-Rank Approximation of Responses

Our observation is that the response at a position of a convolutional feature map approximately lies on a low-rank subspace. A low-rank decomposition can reduce the complexity. To find the approximate low-rank subspace, we minimize the reconstruction error of the responses.

More formally, we consider a convolutional layer with a filter size of $k \times k \times c$, where $k$ is the spatial size of the filter and $c$ is the number of input channels of this layer. To compute a response, this filter is applied on a $k \times k \times c$ volume of the layer input. We use $x \in \mathbb{R}^{k^2c+1}$ to denote a vector that reshapes this volume (appending one as the last entry for the bias). A response $y \in \mathbb{R}^d$ at a position of a feature map is computed as:

$$y = Wx, \qquad (1)$$

where $W$ is a $d$-by-$(k^2c+1)$ matrix and $d$ is the number of filters. Each row of $W$ denotes the reshaped form of a $k \times k \times c$ filter (appending the bias as the last entry). We will address the nonlinear case later.

If the vector $y$ is on a low-rank subspace, we can write $y = M(y - \bar{y}) + \bar{y}$, where $M$ is a $d$-by-$d$ matrix of a rank $d' < d$ and $\bar{y}$ is the mean vector of responses. Expanding this equation, we can compute a response by:

$$y = MWx + b, \qquad (2)$$

where $b = \bar{y} - M\bar{y}$ is a new bias. The rank-$d'$ matrix $M$ can be decomposed into two $d$-by-$d'$ matrices $P$ and $Q$ such that $M = PQ^T$. Writing $W' = Q^TW$ for the resulting set of $d'$ smaller filters, Eqn.(2) becomes:

$$y = PW'x + b. \qquad (3)$$

The complexity of computing Eqn.(3) is $O(d'k^2c) + O(dd')$, compared with $O(dk^2c)$ for Eqn.(1).

Figure 1. Illustration of the approximation. (a) An original layer with complexity $O(dk^2c)$. (b) An approximated layer with complexity reduced to $O(d'k^2c) + O(dd')$.

Note that the decomposition $M = PQ^T$ can be arbitrary. It does not impact the value of $y$ computed in Eqn.(3). A simple decomposition is the Singular Value Decomposition (SVD) [5]: $M = U_{d'}S_{d'}V_{d'}^T$, where $U_{d'}$ and $V_{d'}$ are $d$-by-$d'$ column-orthogonal matrices and $S_{d'}$ is a $d' \times d'$ diagonal matrix. Then we can obtain $P = U_{d'}S_{d'}$ and $Q = V_{d'}$.
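To make the two-layer decomposition concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper; the sizes and the synthetic rank-$d'$ matrix `M` are assumptions) that splits the filters via the SVD of $M$ and verifies that Eqn.(3) reproduces Eqn.(2):

```python
import numpy as np

# Hypothetical sizes: d filters, k x k spatial size, c input channels.
d, k, c, d_prime = 256, 3, 64, 128

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k * k * c + 1))   # reshaped filters + bias column
y_bar = rng.standard_normal(d)                # mean response (from data in practice)

# A rank-d' matrix M (built synthetically here; Sec. 2.1 obtains it from PCA).
M = rng.standard_normal((d, d_prime)) @ rng.standard_normal((d_prime, d))
b = y_bar - M @ y_bar                         # new bias, b = y_bar - M y_bar

# Decompose M = P Q^T by SVD, keeping the top d' components.
U, S, Vt = np.linalg.svd(M)
P = U[:, :d_prime] * S[:d_prime]              # d-by-d'
Q = Vt[:d_prime].T                            # d-by-d'
W_prime = Q.T @ W                             # d'-by-(k^2 c + 1): the smaller filter bank

x = rng.standard_normal(k * k * c + 1)
y_full = M @ (W @ x) + b                      # Eqn. (2)
y_fast = P @ (W_prime @ x) + b                # Eqn. (3): O(d'k^2c) + O(dd')
assert np.allclose(y_full, y_fast)
```

In a real layer, `W @ x` corresponds to the convolution, so the two factors amount to a $d'$-filter convolution followed by a $1 \times 1$ convolution with $P$.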
26、rank assumption is an approximation, in Eqn.(3) is approximate. To find an subspace, we optimize the following在實踐中低階的假設是一個近似值并且Eqn.(3)In practice the and the computationapproximate low-rank problem:low-rank assumption is an approximation, in Eqn.(3) is approximate. To find an subspace, we optimize t
27、he following在實踐中低階的假設是一個近似值并且Eqn.(3)計算是近似的。找到一個低階近似子空間,我們優(yōu)化了以下 問題:UJin2 lltv. - yi -川 y - 可喙t一Hereis a response sampled from the feature maps inthe training set. This problem can be solved by SVD 5 or actually Principal Component Analysis (PCA): let Y be the d-by-nmatrix concatenating n responses wi
28、th the mean subtracted, compute the eigen-decomposition of the covariance matrixwhT = SiJiwhere U is an orthogonal matrix and S is diagonal,andHereis a response sampled from the feature maps inthe training set. This problem can be solved by SVD 5 or actually Principal Component Analysis (PCA): let Y
29、 be the d-by-nmatrix concatenating n responses with the mean subtracted, compute the eigen-decomposition of the covariance matrixwhT = SiJiwhere U is an orthogonal matrix and S is diagonal,andwhereare the first Heigenvectors. With theSVD可以解決這一問題5或者實際上主成成分分析 (PCA):使丫為d-by-n矩陣連接n響應與平均減去,計 算協(xié)方差矩陣的特征分解
30、丫丁:1:譏其中u是正交 矩陣和S是對角線,并且M = L山中的I w是第一個特征向量。以矩陣M的計算,我們可以發(fā)現(xiàn)matrix Mmatrix MHow good is the low-rank assumption of the responses?We sample the responses from a CNN model (with 7 convolutional layers, detailed in Sec. 3) trained on ImageNet 2. For the responses of a convolutional layer (from 3,000 rand
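A sketch of this PCA step (again our own code, not the paper's; `responses` plays the role of the sampled response matrix before centering):

```python
import numpy as np

def low_rank_projection(responses: np.ndarray, d_prime: int):
    """Solve Eqn. (4): find a rank-d' M that best reconstructs centered responses.

    responses: d-by-n matrix of responses sampled from the feature maps.
    Returns M (d-by-d, rank d') and the shared factor U_dp, with P = Q = U_dp.
    """
    y_bar = responses.mean(axis=1, keepdims=True)
    Y = responses - y_bar                       # centered responses
    cov = Y @ Y.T                               # d-by-d covariance (unnormalized)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    U_dp = eigvecs[:, ::-1][:, :d_prime]        # first d' eigenvectors (largest)
    M = U_dp @ U_dp.T                           # rank-d' projection onto the subspace
    return M, U_dp
```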
How good is the low-rank assumption of the responses? We sample the responses from a CNN model (with 7 convolutional layers, detailed in Sec. 3) trained on ImageNet [2]. For the responses of a convolutional layer (from 3,000 randomly sampled training images), we compute the eigenvalues of their covariance matrix and then plot the sum of the largest eigenvalues (Fig. 2). We see that substantial energy is in a small portion of the largest eigenvectors. For example, in the Conv2 layer (d = 256) the first 128 eigenvectors contribute over 99.9% energy; in the Conv7 layer (d = 512), the first 256 eigenvectors contribute over 95% energy. This indicates that we can use a fraction of the filters to precisely approximate the original filters.
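The "energy" here is the cumulative sum of eigenvalues. A small helper in the same vein as the sketches above (our own code) makes the measurement explicit:

```python
import numpy as np

def pca_energy(responses: np.ndarray) -> np.ndarray:
    """Fraction of total eigenvalue energy captured by the top-r eigenvectors, for all r.

    responses: d-by-n matrix of responses sampled from a layer's feature maps.
    """
    Y = responses - responses.mean(axis=1, keepdims=True)
    eigvals = np.linalg.eigvalsh(Y @ Y.T)[::-1]   # descending eigenvalues
    return np.cumsum(eigvals) / eigvals.sum()

# e.g., for a Conv2-like layer, energy[127] would be ~0.999 per Fig. 2.
```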
The low-rank behavior of the responses y is because of the low-rank behaviors of the filters W and the inputs x. While the low-rank assumptions of filters have been adopted in recent work [3, 10], we further adopt the low-rank assumption of the filter input x, which is a local volume and should have correlations. The responses y will have lower rank than W and x, so the approximation can be more precise. In our optimization (4), we directly address the low-rank subspace of y.

2.2. The Nonlinear Case

Next we investigate the case of using nonlinear units. We use $r(\cdot)$ to denote the nonlinear operator. In this paper we focus on the Rectified Linear Unit (ReLU) [14]: $r(\cdot) = \max(\cdot, 0)$. The nonlinear response is given by $r(Wx)$, or simply $r(y)$. We minimize the reconstruction error of the nonlinear responses:

$$\min_{M,b} \sum_i \|r(y_i) - r(My_i + b)\|_2^2, \quad \text{s.t. } \mathrm{rank}(M) \le d'. \qquad (5)$$

Here $b$ is a new bias to be optimized. This constrained nonlinear optimization problem is difficult to solve directly, so we relax it by introducing a set of auxiliary variables $\{z_i\}$ of the same size as $\{y_i\}$:

$$\min_{M,b,\{z_i\}} \sum_i \|r(y_i) - r(z_i)\|_2^2 + \lambda\|z_i - (My_i + b)\|_2^2, \quad \text{s.t. } \mathrm{rank}(M) \le d', \qquad (6)$$

where $\lambda$ is a penalty parameter. If $\lambda \to \infty$, the solution to (6) will converge to the solution to (5) [23]. We adopt an alternating solver, fixing $\{z_i\}$ and solving for $M, b$, and vice versa.
41、. In this case,are fixed. It is easy3.1. ri才決(hi) 甫.(6)其中1)是一組與有同樣大小的輔助變量。入是懲罰參數(shù)。如果入-8, (6)解決方案將趨同于解決 (5)23。我們采用交替求解,確定 用)并且求解現(xiàn)值M,b,(i)反之亦然。子問題M, b.在這種情況下,麻是固定的。很容易證明F)其中?是&的樣本平均值。to showis the sample mean of zi.代以b為目標函數(shù),獲得了包含M在內(nèi)的問題:Substituting b into the objective function, we obtain the problemin
42、volving M:71.involving M:71.乩h raj次M1 .(/.使Z是d-n連接向量矩陣連接向量 任I耳o我們重寫上 述問題:.哂|憶目 .哂|憶目 加一其中II-心是弗羅貝尼烏斯范數(shù)。 這一優(yōu)化問題是一個降Here is the Frobenius norm. This optimization problem is a秩回歸問題6, 21, 20,它可以通過一種廣義奇異向量分解來解決6,21, 20。解決辦法如下。讓Let Z be the d-by-n matrix concatenating the vectors of zi - z We rewrite the
43、above problem as:-打丁丁L GSVD勺應用N1以至于U作為一個d-by- d單位矩陣滿足匚丁1 = L其中The GSVD is applied onSias,such that U is a d-by-d orthogonal matrix satisfyingwhere Id is a d-by-d identitysatisfying-7-打丁丁L GSVD勺應用N1以至于U作為一個d-by- d單位矩陣滿足匚丁1 = L其中The GSVD is applied onSias,such that U is a d-by-d orthogonal matrix sati
44、sfyingwhere Id is a d-by-d identitysatisfying-7廠 (calledmatrix, and V is a d-by-d matrix generalized orthogonality). Thenthe solution M to (8) is given bywhere 正andare the first 鼠 columns of U and V and arethelargest singular values. We can further show that if Z = Y (so the problem in (7) becomes (
45、4), this solution degrades computing the eigen-decomposition of ,。(ii) The subproblem of zi. In this case, M and b are fixed. Thentoin this subproblem each elementV of each vector ziisL是一個d-by- d單位矩陣,并且V是一個d-by- d單位矩陣滿足”丫丫丁1 =b| (稱為廣義正交性)。那么解決方案(8)中的M由下式給出M-V/其中U4和是第一個 小行的U和V并且5是最大的d單數(shù)。我們 可以進一步發(fā)現(xiàn)如果Z
46、 - Y (因此問題 變成(4),這種解決方案分解計算特征值分解 YY o(ii) 子問題zi.在這種情況下,M和b是固定的然后在這個子問題中每個元素ZtJ的每個向量勺都是獨立于任何其他。所以我們解決一個 1維優(yōu)化問題如下:Reduced Rank Regression problem 6, 21, 20, and it can be solved by a kind of Generalized Singular Vector Decomposition (GSVD) 6, 21, 20. The solution is as follows. Letindependent of any o
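As a sketch of this step (our own code), we use the classical projection form of the reduced-rank-regression solution, which is equivalent to the GSVD construction above: fit the unconstrained least squares $\hat{M}$, then project its fitted values onto their top-$d'$ left singular vectors. The small ridge term is our own safeguard against a near-singular $YY^T$:

```python
import numpy as np

def reduced_rank_regression(Z: np.ndarray, Y: np.ndarray, d_prime: int,
                            ridge: float = 1e-8) -> np.ndarray:
    """Solve min_M ||Z - M Y||_F^2  s.t. rank(M) <= d'  (Eqn. 8).

    Z, Y: d-by-n centered matrices. Returns the rank-d' solution M.
    """
    d = Y.shape[0]
    # Unconstrained least-squares solution M_hat = Z Y^T (Y Y^T)^{-1}.
    G = Y @ Y.T + ridge * np.eye(d)
    M_hat = Z @ Y.T @ np.linalg.inv(G)
    # Project the fitted values M_hat Y onto their top-d' left singular
    # vectors; this is equivalent to the GSVD-based solution in the text.
    fitted = M_hat @ Y
    U, _, _ = np.linalg.svd(fitted, full_matrices=False)
    U_dp = U[:, :d_prime]
    return U_dp @ (U_dp.T @ M_hat)
```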
(ii) The subproblem of $\{z_i\}$. In this case, $M$ and $b$ are fixed. Then in this subproblem each element $z_{ij}$ of each vector $z_i$ is independent of any other. So we solve a 1-dimensional optimization problem as follows:

$$\min_{z_{ij}} (r(y_{ij}) - r(z_{ij}))^2 + \lambda(z_{ij} - y'_{ij})^2, \qquad (9)$$

where $y'_{ij}$ is the $j$-th entry of $My_i + b$. We can separately consider $z_{ij} \ge 0$ and $z_{ij} < 0$ and remove the ReLU operator. Then we can derive the solution as follows: let

$$z_{ij}^{(0)} = \min(0, y'_{ij}), \qquad z_{ij}^{(1)} = \max\Big(0, \frac{\lambda y'_{ij} + r(y_{ij})}{\lambda + 1}\Big); \qquad (10)$$

then $z_{ij} = z_{ij}^{(0)}$ if $z_{ij}^{(0)}$ gives a smaller value in (9) than $z_{ij}^{(1)}$, and otherwise $z_{ij} = z_{ij}^{(1)}$.

Although we focus on the ReLU, our method is applicable to other types of nonlinearities: the subproblem in (9) is a 1-dimensional nonlinear least squares problem, so it can be solved by gradient descent or simply line search. We plan to study this issue in the future.

We alternately solve (i) and (ii). The initialization is given by the solution to the linear case (4). We warm up the solver by setting the penalty parameter $\lambda = 0.01$ and run 25 iterations. Then we increase the value of $\lambda$. In theory, $\lambda$ should be gradually increased to infinity [23]. But we find that it is difficult for the iterative solver to make progress if $\lambda$ is too large. So we increase $\lambda$ to 1, run 25 more iterations, and use the resulting $M$ as our solution. Then we compute $P$ and $Q$ by SVD on $M$.
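Putting the two subproblems together, here is a minimal sketch of the alternating solver (our own code, reusing `low_rank_projection` and `reduced_rank_regression` from the sketches above; the schedule mirrors the $\lambda = 0.01 \to 1$ warm-up described in the text):

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def z_step(Yp, Y, lam):
    """Closed-form solution of subproblem (ii), elementwise (Eqns. 9-10).

    Yp: entries y'_ij of M y_i + b.   Y: original responses y_ij.
    """
    z0 = np.minimum(0.0, Yp)
    z1 = np.maximum(0.0, (lam * Yp + relu(Y)) / (lam + 1.0))
    cost = lambda z: (relu(Y) - relu(z)) ** 2 + lam * (z - Yp) ** 2
    return np.where(cost(z0) <= cost(z1), z0, z1)

def nonlinear_low_rank(Y, d_prime, lam_schedule=((0.01, 25), (1.0, 25))):
    """Alternating solver for Eqn. (6).  Y: d-by-n responses (not centered)."""
    y_bar = Y.mean(axis=1, keepdims=True)
    M, _ = low_rank_projection(Y, d_prime)      # linear initialization from Eqn. (4)
    b = y_bar - M @ y_bar
    for lam, iters in lam_schedule:
        for _ in range(iters):
            Z = z_step(M @ Y + b, Y, lam)       # (ii): update auxiliary variables
            z_bar = Z.mean(axis=1, keepdims=True)
            M = reduced_rank_regression(Z - z_bar, Y - y_bar, d_prime)  # (i)
            b = z_bar - M @ y_bar
    return M, b
```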
50、.入.In staulyl, be gradually得到如下解決方案:(IU)uh之后如果叮給出了一個較小的值在(9)比u,此外盡管我們關注的焦點在 ReLU上,我們的方法適用于 其他類型的非線性。(9)式的子問題是一個1維非線性最 小二乘問題,所以可以通過梯度下降或簡單的線性搜索 來解決。我們計劃在未來研究這一問題。increased to infinity 23. But wefind that it is difficult for theiterative solver to make progress if入 is too large. So we increase我們選擇處理(i
51、)和(ii)。初始化給出了線性情況下的解 決方案(4)。我們初始化設置懲罰參數(shù) A - 0.01求解 并且迭代25次。然后我們增加以的值。理論上,以應逐 步增加到無窮23 o但我們發(fā)現(xiàn)迭代求解器很難取得進 展如果人太大。因此我們增加到1,迭代次數(shù)大于25, 并且使用結(jié)果M作為我們的解決方案。然后我們計算P和Q用SVD過ML2.3多層非對稱重建為了加快整個網(wǎng)絡,我們在每一層按順序應用上述 方法,從淺層到深層。如果上一層近似,當?shù)诙訒r逼 近時,誤差可以累積。我們提出了一種非對稱重建方法 來解決這個問題Let us consider a layer whose input feature map
52、is not precise due to the approximation of the previous layer/layers. Wedenote the approximate input to the current layer as 忖For thetraining samples, we can still compute its non-approximate讓我們考慮一個圖層它的輸入特征的地圖不精確,這是 因為前一圖層/圖層的逼近。我們通常把近似輸入到當前 層為交。對訓練樣本,我們還可以計算其非近似響應 2.3多層非對稱重建為了加快整個網(wǎng)絡,我們在每一層按順序應用上述 方
53、法,從淺層到深層。如果上一層近似,當?shù)诙訒r逼 近時,誤差可以累積。我們提出了一種非對稱重建方法 來解決這個問題Let us consider a layer whose input feature map is not precise due to the approximation of the previous layer/layers. Wedenote the approximate input to the current layer as 忖For thetraining samples, we can still compute its non-approximate讓我們考慮
54、一個圖層它的輸入特征的地圖不精確,這是 因為前一圖層/圖層的逼近。我們通常把近似輸入到當前 層為交。對訓練樣本,我們還可以計算其非近似響應 y =Wx所以我們可以優(yōu)化不又稱版本(5):responses as y = Wx. So we can optimize an of (5):. _ ”asymmetricversionminr(Wxa) r(MWip +(12)Mrh if.to 1, run 25 more iterations, and use the resulting M as our solution. Then we compute P and Q by SVD on M
55、.Asymmetric Reconstruction for Multi-LayerTo accelerate a whole network, we apply the above method sequentially on each layer, from the shallow layers to the deeper ones. If a previous layer is approximated, its error can be accumulated when the next layer is approximated. We propose an asymmetric r
56、econstruction method to address this issue在第一項中在第一項中是非近似輸入,而在第二項輸入uin|r(WxJ r(XlVx3 + b) 112.i 是由于上一層的近似輸入。我們不需要使用 心在第一 項,因為“vv血)是真正的原始網(wǎng)絡的成果,因此更精確。 在另一方面,我們不使用匕在第是由于上一層的近似輸入。我們不需要使用 心在第一 項,因為“vv血)是真正的原始網(wǎng)絡的成果,因此更精確。 在另一方面,我們不使用匕在第二項,因為常4川 是近似層的實際運作。當逼近多層時,這種不對稱的版本可以減少累積誤差。優(yōu)化問題在(12)可以使用相同的算法(5)解決。is2.
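In implementation terms, the asymmetry only changes which responses feed the two terms of the objective. A hedged sketch (our own code, reusing the helpers above; `Y_true` stacks the exact responses $Wx_i$ and `Y_approx` stacks $W\hat{x}_i$ computed from inputs propagated through the already-approximated layers):

```python
def asymmetric_nonlinear_low_rank(Y_true, Y_approx, d_prime,
                                  lam_schedule=((0.01, 25), (1.0, 25))):
    """Asymmetric variant (Eqn. 12) of the alternating solver."""
    ya_bar = Y_approx.mean(axis=1, keepdims=True)
    M, _ = low_rank_projection(Y_true, d_prime)     # initialize as before
    b = Y_true.mean(axis=1, keepdims=True) - M @ ya_bar
    for lam, iters in lam_schedule:
        for _ in range(iters):
            # (ii): the data term targets the *true* responses r(Y_true),
            # while the penalty term uses the *approximate* inputs.
            Z = z_step(M @ Y_approx + b, Y_true, lam)
            z_bar = Z.mean(axis=1, keepdims=True)
            M = reduced_rank_regression(Z - z_bar, Y_approx - ya_bar, d_prime)
            b = z_bar - M @ ya_bar
    return M, b
```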
2.4. Rank Selection for Whole-Model Acceleration

In the above, the optimization of each layer is based on a target rank $d'$, which is the only parameter that determines the complexity of an accelerated layer. But given a desired speedup ratio of the whole model, we need to determine the proper $d'$ for each layer. Our strategy is based on the empirical observation that the PCA energy is related to the classification accuracy after approximation. To verify this observation, in Fig. 3 we show the classification accuracy (represented as the difference to no approximation) vs. the PCA energy. Each point in this figure is empirically evaluated using a value of $d'$. 100% energy means no approximation and thus no degradation of classification accuracy. Fig. 3 shows that the classification accuracy is roughly linear in the PCA energy.

Figure 3. PCA accumulative energy and the accuracy rates (top-5). Here the accuracy is evaluated using the linear solution (the nonlinear solution has a similar trend). Each layer is evaluated independently, with the other layers not approximated. The accuracy is shown as the difference to no approximation.

To simultaneously determine the rank for each layer, we further assume that the whole-model classification accuracy is roughly related to the product of the PCA energies of all layers. More formally, we consider this objective function:

$$\mathcal{E} = \prod_l \sum_{a=1}^{d'_l} \sigma_{l,a}, \qquad (13)$$

where $\sigma_{l,a}$ is the $a$-th largest eigenvalue of the layer $l$, and $\sum_{a=1}^{d'_l}\sigma_{l,a}$ is the PCA energy of the largest $d'_l$ eigenvalues of the layer $l$.
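The text in this copy ends with the objective (13). One plausible way to maximize it under a whole-model complexity budget is a greedy procedure, sketched below (our own illustration, not necessarily the authors' published algorithm): start from full ranks and repeatedly drop the eigenvalue whose relative energy loss per unit of computation saved is smallest.

```python
import numpy as np

def select_ranks(eigvals_per_layer, cost_per_rank, budget):
    """Greedily choose d'_l per layer to (approximately) maximize Eqn. (13)
    subject to sum_l d'_l * cost_per_rank[l] <= budget.

    eigvals_per_layer: one descending eigenvalue array per layer (sigma_{l,a}).
    cost_per_rank: per-layer complexity of one rank component (hypothetical
                   units, e.g. proportional to k^2*c + d for that layer).
    """
    ranks = [len(e) for e in eigvals_per_layer]
    energies = [float(e.sum()) for e in eigvals_per_layer]  # current PCA energies

    while sum(r * c for r, c in zip(ranks, cost_per_rank)) > budget:
        best_layer, best_ratio = None, np.inf
        for l, e in enumerate(eigvals_per_layer):
            if ranks[l] <= 1:
                continue
            # Dropping layer l's smallest kept eigenvalue multiplies the
            # product objective by (1 - loss); prefer small loss per cost.
            loss = e[ranks[l] - 1] / energies[l]
            ratio = loss / cost_per_rank[l]
            if ratio < best_ratio:
                best_layer, best_ratio = l, ratio
        if best_layer is None:   # cannot shrink any further
            break
        energies[best_layer] -= eigvals_per_layer[best_layer][ranks[best_layer] - 1]
        ranks[best_layer] -= 1
    return ranks
```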