perflab Lab Report Materials

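HUNAN UNIVERSITY
Course Lab Report

Course: Computer Organization and Architecture
Lab project: perflab-handout
Major and class:
Name:
Student ID:
Instructor: 科華
Completed: May 27, 2016
College of Information Science and Engineering

Lab title: perflab program performance tuning

Purpose: Understand the compiler and learn program optimization, working from two directions: optimizing the program code and improving execution speed.

Requirements: For each function, every student must write at least 3 optimized versions and analyze their performance from the results reported by driver.

Environment: Ubuntu 15.10, 32-bit system, VMware Workstation

Content and procedure:

Optimize the rotate and smooth functions in the downloaded kernels.c. rotate turns an image 90° counterclockwise. smooth replaces every pixel with the average of that pixel and its neighbours, which blurs the image. The code is optimized step by step below.

CPE test of the source code (a ? marks a value that is illegible in the source):

Rotate: Version = naive_rotate: Naive baseline implementation:
Dim             64      128     256     512     1024    Mean
Your CPEs       3.0     4.1     7.5     11.7    18.3
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         5.0     9.7     6.2     5.6     5.2     6.1

Rotate: Version = rotate: Current working version:
Dim             64      128     256     512     1024    Mean
Your CPEs       ?       ?       ?       ?       ?
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         ?       ?       ?       ?       ?       ?

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       57.2    57.3    57.3    58.4    61.2
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         12.2    12.2    12.3    12.3    11.8    12.1

Smooth: Version = naive_smooth: Naive baseline implementation:
Dim             32      64      128     256     512     Mean
Your CPEs       57.2    57.2    57.1    58.4    62.8
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         12.2    12.2    12.3    12.3    11.5    12.1

Summary of Your Best Scores:
  Rotate: 6.1 (naive_rotate: Naive baseline implementation)
  Smooth: 12.1 (smooth: Current working version)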
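How the Speedup and Mean columns relate to the two CPE rows can be sketched in a few lines of C. The sketch assumes, as the CS:APP performance lab handout describes, that each speedup is the baseline CPE divided by the measured CPE and that the mean is the geometric mean over the five image sizes. The function names are made up for illustration, and the n-th root is taken with Newton's method only so that the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Speedup of one test size: baseline CPE divided by measured CPE. */
double speedup(double baseline_cpe, double your_cpe)
{
    assert(your_cpe > 0.0);
    return baseline_cpe / your_cpe;
}

/* Geometric mean of n positive values: the n-th root of their product.
 * Newton's method on x^n = product, started from the arithmetic mean,
 * which is never below the geometric mean, so the iteration only descends. */
double geometric_mean(const double *v, size_t n)
{
    double product = 1.0, x = 0.0;
    size_t k;
    int step;

    assert(n > 0);
    for (k = 0; k < n; k++) {
        assert(v[k] > 0.0);
        product *= v[k];
        x += v[k] / (double)n;
    }
    for (step = 0; step < 60; step++) {
        double xpow = 1.0;                  /* x^(n-1) */
        for (k = 1; k < n; k++)
            xpow *= x;
        x = ((double)(n - 1) * x + product / xpow) / (double)n;
    }
    return x;
}

/* Mean speedup for the smooth "Current working version" run,
 * using the CPE values read from the driver output (dims 32..512). */
double smooth_mean_speedup(void)
{
    static const double yours[5]    = { 57.2, 57.3, 57.3, 58.4, 61.2 };
    static const double baseline[5] = { 695.0, 698.0, 702.0, 717.0, 722.0 };
    double s[5];
    size_t k;

    for (k = 0; k < 5; k++)
        s[k] = speedup(baseline[k], yours[k]);
    return geometric_mean(s, 5);
}
```

With these inputs the result lands between 12.1 and 12.2, in line with the Mean reported for that run.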

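1. Naive_rotate

1) Source code:

    char naive_rotate_descr[] = "naive_rotate: Naive baseline implementation";
    void naive_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

2) Analysis:

This code uses a double loop to move every pixel to its new row and column, which rotates the whole image by 90 degrees. A look at the code reveals a very simple optimization. In the innermost loop j changes on every iteration, so dim-1-j has to be recomputed for every assignment, and the extra arithmetic slows the loop down. A simple change of variables lets us rewrite the assignment as

    dst[RIDX(i, j, dim)] = src[RIDX(j, dim-i-1, dim)];

Now dim-i-1 depends only on i, so it no longer has to be computed on every inner iteration. As a side effect, the writes to dst walk along a row in order.

3) Optimized code 1:

    char naive_rotate_descr2[] = "naive_rotate2: only change the place of i and j";
    void naive_rotate2(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = src[RIDX(j, dim-i-1, dim)];  /* i changes less often */
    }

Results (a ? marks a value that is illegible in the source):

Rotate: Version = naive_rotate: Naive baseline implementation:
Dim             64      128     256     512     1024    Mean
Your CPEs       ?       ?       9.2     12.7    19.5
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         ?       6.9     5.1     5.2     4.8     5.4

Rotate: Version = naive_rotate2: only change the place of i and j:
Dim             64      128     256     512     1024    Mean
Your CPEs       ?       2.8     3.7     6.5     15.7
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         ?       14.4    12.4    10.1    6.0     9.1

This is the simplest possible optimization. As the output shows, the gain is limited and the result is still not very good. Looking at the source again from the standpoint of cache friendliness, the code is very inefficient. To match the cache size, pixels should be stored 32 at a time (column-wise storage). Making the code cache-friendly in this way can raise efficiency substantially.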
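One way to convince oneself that the rewritten assignment in naive_rotate2 computes the same rotation is to run both loops on a small image and compare the results. The sketch below is self-contained: the `pixel` layout and the `RIDX` macro are assumptions modelled on the usual lab definitions (three 16-bit channels, row-major indexing), and `rotations_match` is a hypothetical helper, not part of the lab's driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed definitions, modelled on the lab's defs.h. */
typedef struct { unsigned short red, green, blue; } pixel;
#define RIDX(i, j, n) ((i) * (n) + (j))

/* Baseline order: read src row by row, scatter into dst column by column. */
void rotate_src_order(int dim, const pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Rewritten order: write dst row by row, gather from src column by column. */
void rotate_dst_order(int dim, const pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(i, j, dim)] = src[RIDX(j, dim - i - 1, dim)];
}

/* Returns 1 if both loops produce identical images for a dim x dim input. */
int rotations_match(int dim)
{
    size_t n = (size_t)dim * (size_t)dim, k;
    pixel *src = malloc(n * sizeof *src);
    pixel *a = malloc(n * sizeof *a);
    pixel *b = malloc(n * sizeof *b);
    int same;

    assert(dim > 0 && src && a && b);
    for (k = 0; k < n; k++) {               /* arbitrary test pattern */
        src[k].red = (unsigned short)k;
        src[k].green = (unsigned short)(k >> 4);
        src[k].blue = (unsigned short)(3 * k + 1);
    }
    rotate_src_order(dim, src, a);
    rotate_dst_order(dim, src, b);
    same = memcmp(a, b, n * sizeof *a) == 0;
    free(src);
    free(a);
    free(b);
    return same;
}
```

The two loops visit the same set of (source, destination) pairs in a different order, which is why the outputs agree while the memory access pattern, and with it the cache behaviour, changes.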

4) Optimized code 2:

    char rotate_descr2[] = "rotate2: version2 break into 4*4 blocks";
    void rotate2(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;

        /* dim is assumed to be a multiple of 4 */
        for (ii = 0; ii < dim; ii += 4)
            for (jj = 0; jj < dim; jj += 4)
                for (i = ii; i < ii+4; i++)
                    for (j = jj; j < jj+4; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

Results (a ? marks a value that is illegible in the source):

Rotate: Version = rotate: Current working version:
Dim             64      128     256     512     1024    Mean
Your CPEs       4.6     ?       7.0     11.5    17.8
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         3.2     9.9     6.7     5.7     5.3     5.8

Rotate: Version = rotate2: version2 break into 4*4 blocks:
Dim             64      128     256     512     1024    Mean
Your CPEs       3.2     3.8     ?       6.9     10.4
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         4.5     10.5    8.8     9.5     9.1     8.2

This version optimizes by blocking: the image is processed in small 4*4 blocks, which improves spatial locality.

5) Optimized code 3:

    char rotate_descr3[] = "rotate3: version3 break into 32*32 blocks";
    void rotate3(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;

        /* dim is assumed to be a multiple of 32 */
        for (ii = 0; ii < dim; ii += 32)
            for (jj = 0; jj < dim; jj += 32)
                for (i = ii; i < ii+32; i++)
                    for (j = jj; j < jj+32; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

Results:

Rotate: Version = rotate3: version3 break into 32*32 blocks:
Dim             64      128     256     512     1024    Mean
Your CPEs       3.1     2.9     4.1     9.4     15.3
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         4.7     13.9    11.4    7.0     6.2     8.6

Here the image is processed in 32*32 blocks, again to improve spatial locality.
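The two blocked versions follow the same pattern and differ only in the tile size, so the idea can be sketched once with the tile edge as a parameter. This is an illustrative variant, not one of the measured versions: `rotate_blocked` and `blocked_matches_untiled` are hypothetical names, the `pixel` type and `RIDX` macro are assumed to look like the lab's, and the code assumes dim is a multiple of the block size (true for the lab's image sizes with blocks of 4 or 32).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed definitions, modelled on the lab's defs.h. */
typedef struct { unsigned short red, green, blue; } pixel;
#define RIDX(i, j, n) ((i) * (n) + (j))

/* Rotate tile by tile, so the source rows and destination rows touched by
 * one tile are reused while they are still likely to be in the cache. */
void rotate_blocked(int dim, int block, const pixel *src, pixel *dst)
{
    int i, j, ii, jj;

    assert(block > 0 && dim % block == 0);
    for (ii = 0; ii < dim; ii += block)
        for (jj = 0; jj < dim; jj += block)
            for (i = ii; i < ii + block; i++)
                for (j = jj; j < jj + block; j++)
                    dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Returns 1 if tiling with the given block size yields the same image as
 * a single tile covering the whole image (the plain double loop). */
int blocked_matches_untiled(int dim, int block)
{
    size_t n = (size_t)dim * (size_t)dim, k;
    pixel *src = malloc(n * sizeof *src);
    pixel *ref = malloc(n * sizeof *ref);
    pixel *out = malloc(n * sizeof *out);
    int same;

    assert(dim > 0 && src && ref && out);
    for (k = 0; k < n; k++) {               /* arbitrary test pattern */
        src[k].red = (unsigned short)k;
        src[k].green = (unsigned short)(k / 7);
        src[k].blue = (unsigned short)(k % 251);
    }
    rotate_blocked(dim, dim, src, ref);
    rotate_blocked(dim, block, src, out);
    same = memcmp(ref, out, n * sizeof *ref) == 0;
    free(src);
    free(ref);
    free(out);
    return same;
}
```

Keeping the block size a parameter makes it easy to time several tile sizes with the driver before settling on one.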

6) Optimized code 4:

    char rotate_descr4[] = "rotate4: Current working version, using pointer rather than computing address";
    void rotate4(int dim, pixel *src, pixel *dst)
    {
        int i;
        int j;
        int tmp1 = dim * dim;
        int tmp2 = dim * 31;       /* 31 source rows */
        int tmp3 = tmp1 - dim;     /* offset of the last row of dst */
        int tmp4 = tmp1 + 32;
        int tmp5 = dim + 31;

        dst += tmp3;               /* start at dst[dim-1][0] */
        for (i = 0; i < dim; i += 32) {
            for (j = 0; j < dim; j++) {
                /* copy 32 pixels of source column j into one row of dst */
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;  *dst = *src; dst++; src += dim;
                *dst = *src; dst++; src += dim;
                *dst = *src;
                src++;             /* next source column */
                src -= tmp2;       /* back up 31 source rows */
                dst -= tmp5;       /* previous dst row, start of the strip */
            }
            src += tmp2;           /* next strip of 32 source rows */
            dst += tmp4;           /* last dst row, 32 columns further right */
        }
    }

Results:

Rotate: Version = rotate4: Current working version, using pointer rather than computing address:
Dim             64      128     256     512     1024    Mean
Your CPEs       2.3     2.4     2.4     4.4     7.9
Baseline CPEs   14.7    40.1    46.4    65.9    94.5
Speedup         6.3     17.0    19.0    14.9    12.0    12.9

This version uses loop unrolling: the inner loop body is written out 32 ways, so 32 pixels are copied per iteration.
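The same strip-wise traversal can be written more compactly by leaving the 32 copies to a short inner loop with a fixed trip count, which an optimizing compiler may unroll on its own. The sketch below is an illustrative variant rather than the measured rotate4: `STRIP`, `rotate_strips` and `strips_match_reference` are hypothetical names, `pixel` and `RIDX` are assumed to match the lab's definitions, and indexing replaces the running pointers so that no pointer is ever moved outside the arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed definitions, modelled on the lab's defs.h. */
typedef struct { unsigned short red, green, blue; } pixel;
#define RIDX(i, j, n) ((i) * (n) + (j))
#define STRIP 32    /* pixels copied per inner loop, as in the 32-way version */

/* For each strip of STRIP source rows and each source column j, copy the
 * STRIP pixels of that column into consecutive slots of one dst row. */
void rotate_strips(int dim, const pixel *src, pixel *dst)
{
    int i, j, k;

    assert(dim > 0 && dim % STRIP == 0);
    for (i = 0; i < dim; i += STRIP) {
        for (j = 0; j < dim; j++) {
            const pixel *s = src + RIDX(i, j, dim);          /* src[i][j] */
            pixel *d = dst + RIDX(dim - 1 - j, i, dim);      /* dst[dim-1-j][i] */
            for (k = 0; k < STRIP; k++)
                d[k] = s[k * dim];      /* sequential writes, strided reads */
        }
    }
}

/* Plain double loop used as the reference result. */
void rotate_reference(int dim, const pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Returns 1 if the strip version reproduces the reference rotation. */
int strips_match_reference(int dim)
{
    size_t n = (size_t)dim * (size_t)dim, k;
    pixel *src = malloc(n * sizeof *src);
    pixel *ref = malloc(n * sizeof *ref);
    pixel *out = malloc(n * sizeof *out);
    int same;

    assert(src && ref && out);
    for (k = 0; k < n; k++) {               /* arbitrary test pattern */
        src[k].red = (unsigned short)k;
        src[k].green = (unsigned short)(k / 3);
        src[k].blue = (unsigned short)(k % 509);
    }
    rotate_reference(dim, src, ref);
    rotate_strips(dim, src, out);
    same = memcmp(ref, out, n * sizeof *ref) == 0;
    free(src);
    free(ref);
    free(out);
    return same;
}
```

Whether the compiler actually unrolls the inner loop depends on the optimization flags, so this variant would need to be timed with the driver before drawing conclusions about its CPE.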

2. Naive_smooth

1) Source code:

    char naive_smooth_descr[] = "naive_smooth: Naive baseline implementation";
    void naive_smooth(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    }

CPE performance (a ? marks a value that is illegible in the source):

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       57.9    57.8    58.0    64.3    64.0
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         ?       12.1    12.1    11.1    11.3    11.7

Smooth: Version = naive_smooth: Naive baseline implementation:
Dim             32      64      128     256     512     Mean
Your CPEs       65.1    57.8    57.8    61.9    64.8
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         10.7    12.1    12.1    11.6    11.1    11.5

Summary of Your Best Scores:
  Rotate: 12.4 (rotate2: Current working version, using pointer rather than computing address)
  Smooth: 11.7 (smooth: Current working version)

2) Analysis:

This code calls avg a very large number of times. avg in turn makes frequent calls to initialize_pixel_sum, accumulate_sum and assign_sum_to_pixel, and it contains a two-level for loop. Eliminating the function-call overhead costs the program some modularity, but the resulting code runs much faster. The code therefore needs to be rewritten so that it no longer calls avg.
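To get a feel for how many calls are involved, the sketch below rebuilds a helper-based avg and counts every call to avg and its three helpers while smoothing one image. The helper bodies are assumptions reconstructed for illustration from the function names mentioned above, not a copy of the lab's kernels.c. The `calls` counter, `min_int`, `max_int` and `count_calls` are hypothetical additions, and calls to the min/max helpers are not counted.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definitions, modelled on the lab's defs.h and kernels.c. */
typedef struct { unsigned short red, green, blue; } pixel;
typedef struct { int red, green, blue, num; } pixel_sum;
#define RIDX(i, j, n) ((i) * (n) + (j))

static long calls;      /* calls to avg and its three helpers */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

static void initialize_pixel_sum(pixel_sum *sum)
{
    calls++;
    sum->red = sum->green = sum->blue = 0;
    sum->num = 0;
}

static void accumulate_sum(pixel_sum *sum, pixel p)
{
    calls++;
    sum->red += p.red;
    sum->green += p.green;
    sum->blue += p.blue;
    sum->num++;
}

static void assign_sum_to_pixel(pixel *out, pixel_sum sum)
{
    calls++;
    out->red = (unsigned short)(sum.red / sum.num);
    out->green = (unsigned short)(sum.green / sum.num);
    out->blue = (unsigned short)(sum.blue / sum.num);
}

/* Average of the 3x3 window around (i, j), clipped to the image. */
static pixel avg(int dim, int i, int j, const pixel *src)
{
    int ii, jj;
    pixel_sum sum;
    pixel result;

    calls++;
    initialize_pixel_sum(&sum);
    for (ii = max_int(i - 1, 0); ii <= min_int(i + 1, dim - 1); ii++)
        for (jj = max_int(j - 1, 0); jj <= min_int(j + 1, dim - 1); jj++)
            accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);
    assign_sum_to_pixel(&result, sum);
    return result;
}

/* Smooth a dim x dim image the naive way and return the number of calls. */
long count_calls(int dim)
{
    size_t n = (size_t)dim * (size_t)dim;
    pixel *src = calloc(n, sizeof *src);
    pixel *dst = calloc(n, sizeof *dst);
    int i, j;

    assert(dim >= 1 && src && dst);
    calls = 0;
    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    free(src);
    free(dst);
    return calls;
}
```

Under these assumptions a 4*4 image already needs 148 calls: 16 each to avg, initialize_pixel_sum and assign_sum_to_pixel, plus 100 to accumulate_sum. Unless the compiler inlines them, each call carries argument passing and return overhead, which is the cost the analysis proposes to remove.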

3) Optimized code 1:

    char smooth_descr1[] = "smooth1: with less func call and grossly simplified calculation for central parts";
    void smooth1(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;
        pixel_sum sum;
        pixel current_pixel, cp;

        /* top and bottom rows: still handled by avg() */
        for (j = 0; j < dim; j++) {
            dst[RIDX(0, j, dim)] = avg(dim, 0, j, src);
            dst[RIDX(dim-1, j, dim)] = avg(dim, dim-1, j, src);
        }
        /* left and right columns: still handled by avg() */
        for (i = 0; i < dim; i++) {
            dst[RIDX(i, 0, dim)] = avg(dim, i, 0, src);
            dst[RIDX(i, dim-1, dim)] = avg(dim, i, dim-1, src);
        }
        /* central part: inlined sum over the full 3*3 window */
        for (i = 1; i < dim-1; i++) {
            for (j = 1; j < dim-1; j++) {
                sum.red = sum.green = sum.blue = 0;
                for (ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++) {
                    for (jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++) {
                        cp = src[RIDX(ii, jj, dim)];
                        sum.red += cp.red;
                        sum.green += cp.green;
                        sum.blue += cp.blue;
                    }
                }
                current_pixel.red = sum.red / 9;
                current_pixel.green = sum.green / 9;
                current_pixel.blue = sum.blue / 9;
                dst[RIDX(i, j, dim)] = current_pixel;
            }
        }
    }

Results:

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       75.0    67.4    66.6    67.9    71.6
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         9.3     10.4    10.5    10.6    10.1    10.1

Smooth: Version = smooth1: with less func call and grossly simplified calculation for central parts:
Dim             32      64      128     256     512     Mean
Your CPEs       49.0    47.3    46.8    49.3    50.0
Baseline CPEs   695.0   698.0   702.0   717.0   722.0
Speedup         14.2    14.8    15.0    14.5    14.4    14.6
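smooth1 can divide by the constant 9 in the central part because every interior pixel has a complete 3*3 window, while the border pixels it leaves to avg have fewer neighbours. A minimal sketch of that count, with the hypothetical helper name `window_size`:

```c
#include <assert.h>

/* Number of pixels in the 3x3 window around (i, j) after clipping it to a
 * dim x dim image: 4 at a corner, 6 on an edge, 9 in the interior. */
int window_size(int dim, int i, int j)
{
    int rows, cols;

    assert(dim >= 2 && i >= 0 && i < dim && j >= 0 && j < dim);
    rows = (i == 0 || i == dim - 1) ? 2 : 3;   /* rows covered by the window */
    cols = (j == 0 || j == dim - 1) ? 2 : 3;   /* columns covered by the window */
    return rows * cols;
}
```

For the interior pixels the max and min clamps in the inner loops of smooth1 never take effect, so the window bounds there reduce to i-1..i+1 and j-1..j+1.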

4) Optimized code 2:

    char smooth_descr2[] = "smooth2: test version";
    void smooth2(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        /* no using avg() */

        /* corners: 4 pixels each, so divide with a shift by 2 */
        dst[RIDX(0,0,dim)].red = (src[RIDX(0,0,dim)].red + src[RIDX(1,0,dim)].red
            + src[RIDX(0,1,dim)].red + src[RIDX(1,1,dim)].red) >> 2;
        dst[RIDX(0,0,dim)].blue = (src[RIDX(0,0,dim)].blue + src[RIDX(1,0,dim)].blue
            + src[RIDX(0,1,dim)].blue + src[RIDX(1,1,dim)].blue) >> 2;
        dst[RIDX(0,0,dim)].green = (src[RIDX(0,0,dim)].green + src[RIDX(1,0,dim)].green
            + src[RIDX(0,1,dim)].green + src[RIDX(1,1,dim)].green) >> 2;

        dst[RIDX(0,dim-1,dim)].red = (src[RIDX(0,dim-1,dim)].red + src[RIDX(1,dim-1,dim)].red
            + src[RIDX(0,dim-2,dim)].red + src[RIDX(1,dim-2,dim)].red) >> 2;
        dst[RIDX(0,dim-1,dim)].blue = (src[RIDX(0,dim-1,dim)].blue + src[RIDX(1,dim-1,dim)].blue
            + src[RIDX(0,dim-2,dim)].blue + src[RIDX(1,dim-2,dim)].blue) >> 2;
        dst[RIDX(0,dim-1,dim)].green = (src[RIDX(0,dim-1,dim)].green + src[RIDX(1,dim-1,dim)].green
            + src[RIDX(0,dim-2,dim)].green + src[RIDX(1,dim-2,dim)].green) >> 2;

        dst[RIDX(dim-1,0,dim)].red = (src[RIDX(dim-1,0,dim)].red + src[RIDX(dim-2,0,dim)].red
            + src[RIDX(dim-1,1,dim)].red + src[RIDX(dim-2,1,dim)].red) >> 2;
        dst[RIDX(dim-1,0,dim)].blue = (src[RIDX(dim-1,0,dim)].blue + src[RIDX(dim-2,0,dim)].blue
            + src[RIDX(dim-1,1,dim)].blue + src[RIDX(dim-2,1,dim)].blue) >> 2;
        dst[RIDX(dim-1,0,dim)].green = (src[RIDX(dim-1,0,dim)].green + src[RIDX(dim-2,0,dim)].green
            + src[RIDX(dim-1,1,dim)].green + src[RIDX(dim-2,1,dim)].green) >> 2;

        dst[RIDX(dim-1,dim-1,dim)].red = (src[RIDX(dim-1,dim-1,dim)].red + src[RIDX(dim-1,dim-2,dim)].red
            + src[RIDX(dim-2,dim-1,dim)].red + src[RIDX(dim-2,dim-2,dim)].red) >> 2;
        dst[RIDX(dim-1,dim-1,dim)].blue = (src[RIDX(dim-1,dim-1,dim)].blue + src[RIDX(dim-1,dim-2,dim)].blue
            + src[RIDX(dim-2,dim-1,dim)].blue + src[RIDX(dim-2,dim-2,dim)].blue) >> 2;
        dst[RIDX(dim-1,dim-1,dim)].green = (src[RIDX(dim-1,dim-1,dim)].green + src[RIDX(dim-1,dim-2,dim)].green
            + src[RIDX(dim-2,dim-1,dim)].green + src[RIDX(dim-2,dim-2,dim)].green) >> 2;

        /* border: 6 pixels each */

        /* left column */
        for (i = 1; i < dim-1; i++) {
            dst[RIDX(i,0,dim)].red = (src[RIDX(i,0,dim)].red + src[RIDX(i-1,0,dim)].red
                + src[RIDX(i-1,1,dim)].red + src[RIDX(i,1,dim)].red
                + src[RIDX(i+1,0,dim)].red + src[RIDX(i+1,1,dim)].red) / 6;
            dst[RIDX(i,0,dim)].blue = (src[RIDX(i,0,dim)].blue + src[RIDX(i-1,0,dim)].blue
                + src[RIDX(i-1,1,dim)].blue + src[RIDX(i,1,dim)].blue
                + src[RIDX(i+1,0,dim)].blue + src[RIDX(i+1,1,dim)].blue) / 6;
            dst[RIDX(i,0,dim)].green = (src[RIDX(i,0,dim)].green + src[RIDX(i-1,0,dim)].green
                + src[RIDX(i-1,1,dim)].green + src[RIDX(i,1,dim)].green
                + src[RIDX(i+1,0,dim)].green + src[RIDX(i+1,1,dim)].green) / 6;
        }

        /* right column */
        for (i = 1; i < dim-1; i++) {
            dst[RIDX(i,dim-1,dim)].red = (src[RIDX(i,dim-1,dim)].red + src[RIDX(i-1,dim-1,dim)].red
                + src[RIDX(i-1,dim-2,dim)].red + src[RIDX(i,dim-2,dim)].red
                + src[RIDX(i+1,dim-1,dim)].red + src[RIDX(i+1,dim-2,dim)].red) / 6;
            dst[RIDX(i,dim-1,dim)].blue = (src[RIDX(i,dim-1,dim)].blue + src[RIDX(i-1,dim-1,dim)].blue
                + src[RIDX(i-1,dim-2,dim)].blue + src[RIDX(i,dim-2,dim)].blue
                + src[RIDX(i+1,dim-1,dim)].blue + src[RIDX(i+1,dim-2,dim)].blue) / 6;
            dst[RIDX(i,dim-1,dim)].green = (src[RIDX(i,dim-1,dim)].green + src[RIDX(i-1,dim-1,dim)].green
                + src[RIDX(i-1,dim-2,dim)].green + src[RIDX(i,dim-2,dim)].green
                + src[RIDX(i+1,dim-1,dim)].green + src[RIDX(i+1,dim-2,dim)].green) / 6;
        }

        /* top row */
        for (j = 1; j < dim-1; j++) {
            dst[RIDX(0,j,dim)].red = (src[RIDX(0,j,dim)].red + src[RIDX(0,j-1,dim)].red
                + src[RIDX(1,j-1,dim)].red + src[RIDX(1,j,dim)].red
                + src[RIDX(0,j+1,dim)].red + src[RIDX(1,j+1,dim)].red) / 6;
            dst[RIDX(0,j,dim)].blue = (src[RIDX(0,j,dim)].blue + src[RIDX(0,j-1,dim)].blue
                + src[RIDX(1,j-1,dim)].blue + src[RIDX(1,j,dim)].blue
                + src[RIDX(0,j+1,dim)].blue + src[RIDX(1,j+1,dim)].blue) / 6;
            dst[RIDX(0,j,dim)].green = (src[RIDX(0,j,dim)].green + src[RIDX(0,j-1,dim)].green
                + src[RIDX(1,j-1,dim)].green + src[RIDX(1,j,dim)].green
                + src[RIDX(0,j+1,dim)].green + src[RIDX(1,j+1,dim)].green) / 6;
        }

        /* bottom row */
        for (j = 1; j < dim-1; j++) {
            dst[RIDX(dim-1,j,dim)].red = (src[RIDX(dim-1,j,dim)].red + src[RIDX(dim-1,j+1,dim)].red
                + src[RIDX(dim-1,j-1,dim)].red + src[RIDX(dim-2,j,dim)].red
                + src[RIDX(dim-2,j+1,dim)].red + src[RIDX(dim-2,j-1,dim)].red) / 6;
            dst[RIDX(dim-1,j,dim)].blue = (src[RIDX(dim-1,j,dim)].blue + src[RIDX(dim-1,j+1,dim)].blue
                + src[RIDX(dim-1,j-1,dim)].blue + src[RIDX(dim-2,j,dim)].blue
                + src[RIDX(dim-2,j+1,dim)].blue + src[RIDX(dim-2,j-1,dim)].blue) / 6;
            dst[RIDX(dim-1,j,dim)].green = (src[RIDX(dim-1,j,dim)].green + src[RIDX(dim-1,j+1,dim)].green
                + src[RIDX(dim-1,j-1,dim)].green + src[RIDX(dim-2,j,dim)].green
                + src[RIDX(dim-2,j+1,dim)].green + src[RIDX(dim-2,j-1,dim)].green) / 6;
        }

        /* common (interior) pixels: 9 pixels each */
        for (i = 1; i < dim-1; i++) {
            for (j = 1; j < dim-1; j++) {
                dst[RIDX(i,j,dim)].red = (src[RIDX(i,j,dim)].red + src[RIDX(i+1,j,dim)].red
                    + src[RIDX(i-1,j,dim)].red + src[RIDX(i,j-1,dim)].red
                    + src[RIDX(i+1,j-1,dim)].red + src[RIDX(i-1,j-1,dim)].red
                    + src[RIDX(i,j+1,dim)].red + src[RIDX(i+1,j+1,dim)].red
                    + src[RIDX(i-1,j+1,dim)].red) / 9;
                dst[RIDX(i,j,dim)].blue = (src[RIDX(i,j,dim)].blue + src[RIDX(i+1,j,dim)].blue
                    + src[RIDX(i-1,j,dim)].blue + src[RIDX(i,j-1,dim)].blue
                    + src[RIDX(i+1,j-1,dim)].blue + src[RIDX(i-1,j-1,dim)].blue
                    + src[RIDX(i,j+1,dim)].blue + ...

(The source text breaks off at this point.)
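The case split used by smooth2 (a fixed divisor of 4, 6 or 9 depending on where the pixel sits) can be checked against a generic clipped-window average. The sketch below does this for a single channel and for three representative cases only: the top-left corner, the top row and the interior. All function names are hypothetical, the image is a plain array of `unsigned short` instead of the lab's `pixel` struct, and `RIDX` is assumed to be row-major as in the lab.

```c
#include <assert.h>
#include <stdlib.h>

#define RIDX(i, j, n) ((i) * (n) + (j))

/* Reference: average over the 3x3 window clipped to the image. */
unsigned short smooth_generic(int dim, int i, int j, const unsigned short *src)
{
    int ii, jj, sum = 0, num = 0;

    for (ii = i - 1; ii <= i + 1; ii++)
        for (jj = j - 1; jj <= j + 1; jj++)
            if (ii >= 0 && ii < dim && jj >= 0 && jj < dim) {
                sum += src[RIDX(ii, jj, dim)];
                num++;
            }
    return (unsigned short)(sum / num);
}

/* Top-left corner: 4 pixels, so the division is a shift by 2. */
unsigned short smooth_corner00(int dim, const unsigned short *src)
{
    return (unsigned short)((src[RIDX(0, 0, dim)] + src[RIDX(1, 0, dim)] +
                             src[RIDX(0, 1, dim)] + src[RIDX(1, 1, dim)]) >> 2);
}

/* Top row, 0 < j < dim-1: 6 pixels. */
unsigned short smooth_top(int dim, int j, const unsigned short *src)
{
    return (unsigned short)((src[RIDX(0, j - 1, dim)] + src[RIDX(0, j, dim)] +
                             src[RIDX(0, j + 1, dim)] + src[RIDX(1, j - 1, dim)] +
                             src[RIDX(1, j, dim)] + src[RIDX(1, j + 1, dim)]) / 6);
}

/* Interior pixel: the full 3x3 window, 9 pixels. */
unsigned short smooth_interior(int dim, int i, int j, const unsigned short *src)
{
    int ii, jj, sum = 0;

    for (ii = i - 1; ii <= i + 1; ii++)
        for (jj = j - 1; jj <= j + 1; jj++)
            sum += src[RIDX(ii, jj, dim)];
    return (unsigned short)(sum / 9);
}

/* Returns 1 if the three special cases agree with the generic average. */
int cases_match_generic(int dim)
{
    size_t n = (size_t)dim * (size_t)dim, k;
    unsigned short *src = malloc(n * sizeof *src);
    int i, j, ok;

    assert(dim >= 3 && src);
    for (k = 0; k < n; k++)                 /* arbitrary test pattern */
        src[k] = (unsigned short)((k * 37 + 11) % 1000);
    ok = smooth_corner00(dim, src) == smooth_generic(dim, 0, 0, src);
    for (j = 1; j < dim - 1; j++)
        ok = ok && smooth_top(dim, j, src) == smooth_generic(dim, 0, j, src);
    for (i = 1; i < dim - 1; i++)
        for (j = 1; j < dim - 1; j++)
            ok = ok && smooth_interior(dim, i, j, src) == smooth_generic(dim, i, j, src);
    free(src);
    return ok;
}
```

The shift in the corner case is safe because the sum of unsigned 16-bit channel values is a non-negative int, for which `>> 2` and `/ 4` give the same result. A check of this kind is also a cheap way to catch index slips in long hand-expanded expressions.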
