全局布局優(yōu)化的一種增廣拉格朗日方法.doc

上傳人：伐*** IP屬地：寧夏上傳時(shí)間：2019-07-12 格式：DOC 頁(yè)數(shù)：12 大?。?68.50KB 積分：12 舉報(bào) 版權(quán)申訴

已閱讀5頁(yè)，還剩7頁(yè)未讀，繼續(xù)免費(fèi)閱讀

版權(quán)說(shuō)明：本文檔由用戶提供并上傳，收益歸屬內(nèi)容提供方，若內(nèi)容存在侵權(quán)，請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

精品論文全局布局優(yōu)化的一種增廣拉格朗日方法李維國(guó)1, 陳建利2, 朱文興1,21 福州大學(xué) 離散數(shù)學(xué)與理論計(jì)算機(jī)研究中心，福州 3500022 福州大學(xué) 數(shù)學(xué)與計(jì)算機(jī)科學(xué)學(xué)院，福州 350108 摘要：如果忽略單元重疊，全局布局要計(jì)算出單元的最佳位置以最優(yōu)化特定度量標(biāo)準(zhǔn) (如, 線長(zhǎng), 密度溢出)。全局布局影響到一個(gè)電路芯片的可布線性, 性能, 能耗等，是大規(guī)模集成電路物理設(shè)計(jì)的一個(gè)關(guān)鍵環(huán)節(jié)。本文提出了一種增廣拉格朗日方法來(lái)優(yōu)化集成電路全局布局問(wèn)題。該方法采用了動(dòng)態(tài)的密度懲罰因子放大策略來(lái)平衡線長(zhǎng)與單元密度。將此方法與 ntuplace3 的全局布局框架結(jié)合，并在 ibm 的混合單元測(cè)試?yán)舆M(jìn)行試驗(yàn)。結(jié)果表明該方法能在合理時(shí)間內(nèi)得到質(zhì)量更好的布局。關(guān)鍵詞：非線性優(yōu)化；大規(guī)模集成電路；全局布局；增廣拉格朗日方法中圖分類號(hào)： tn47; tp301.6an augmented lagrangian method for vlsi global placement optimizationli wei-guo1, chen jian-li2, zhu wen-xing1,21 center for discrete mathematics and theoretical computer science, fuzhou university, fuzhou 3500022 college of mathematics and computer science, fuzhou university, fuzhou 350108abstract: ignoring some cell overlaps, global placement computes the best position for each cell to minimize some cost metric (e.g., total wirelength, density overow). it is a crucial step in very large scale integration(vlsi) physical design, since it aects routability, performance, and power consumption of a circuit. in this paper, we propose an augmented lagrangian method to solve the vlsi global placement. in this method, a cautious dynamic density weight increasing strategy is used to balance the wirelength and density constraint. we incorporated our method into ntuplace3s global placement framework, and tested it on the ibm mixed-size benchmark circuits. experimental results show that it obtains high-quality results in a reasonable running time.key words: nonlinear optimization; vlsi; global placement; augmented lagrangian method基金項(xiàng)目： national natural science foundation of china (61170308); national key basic research special foundation of china (2011cb808000)作者簡(jiǎn)介： li wei-guo（1987-），male，graduate student，major research direction：vlsi global placement algorithm.chen jian-li（1985-）, male，lecturer，major research direction：vlsi lorplanning/placement algorithm. correspondence author：zhu wen-xing（1968-），male，professor，major research direction：optimization theory and application, algorithm in vlsi physical design automation.- 5 -0 introductionthe vlsi placement problem involves placing a set of cells into a xed die for a given netlist, such that there are no overlaps among objects and some cost metric (e.g., wirelength, density overow) is optimized 1 . since the problem is np-complete 2, 3 , and designs with millions of cells are now common, it is a challenge to design ecient algorithms for producing high quality placement solutions 4 . since circuit performance heavily depends on placement results, the placement problem has attracted much attention recently, and many new academic placers were invented in recent years 516. those placers can be classied into three major categories 17 : meta-heuristic methods 5, 6, min-cut 79, and analytical algorithms 1016. among those methods, the analytical placers have shown their superior eciency and quality.the analytical approach placers consist of three major steps: global placement, legalization and detailed placement 1, 17, 18 . ignoring some circuit elements overlaps, global placement com- putes the best position for each circuit element. then, legalization removes all cell overlaps for each region. finally, detailed placement renes the placement solution 17 . global placement is generally considered as the most important step, due to its crucial impact on the overall placement quality 4, 19 . state-of-the-art algorithms for global placement form two families: (i) force-directed placers, such as kraftwerk2 10, fastplace3 11, rql 12 and simpl 13, and (ii) non-linear optimization techniques, such as aplace2 14, ntuplace3 15 and mpl6 16. algorithms in both categories are directly used in the industry or closely resemble those in industry placers 13 . tools based on non-linear optimization techniques achieve the best results reported for academic implementations 20 and eda vendor tools 13 . among these tools, n- tuplace3 15 obtained the best hpwl (half perimeter wirelength) in ispd (international symposium on physical design) 2006 placement contest 20, and took the 1st place at the dac (design automation conference) 2012 routability-driven placement contest 21.this paper is focused on nonlinear optimization in global placement which has been shown to be the most critical step in the placement ow. with the ntuplace3 15 placement model, we develop an augmented lagrangian method to solve the vlsi global placement optimization problem. the remainder of this paper is organized as follows. section 1 gives the problem statement and overviews the global placement method of ntuplace3. augmented lagrangian method and modications to ntuplace3 are explained in section 2. section 3 reports the experimental results of both methods. finally, the conclusions are given in section 4.1 preliminaries1.1 problem statementin the vlsi global placement problem, all cells in the chip are considered as vertices; all nets are considered as hyperedges 17 . given a design with n cells and m nets, the netlist can be formulated as a hypergraph h (v, e), where v is the set of vertices, v = v1 , v2 , . . . , vn, and e is the set of hyperedges, e = e1 , e2 , . . . , em. the coordinate (xi, yi) denotes the center point of the cell vi, and wi and hi denotes the width and height of the cell vi respectively. let ai be the area of the cell vi, i = 1, 2, , n. the design region is a rectangle denoted by (0, 0) and (w, h ). we intend to pack all the cells into the design region and determine their optimal positions so that the total wirelength is minimized and there is a little overlaps among cells 17 .to evenly distribute the blocks, the placement region was divided into uniform non- overlapping bin grids. then, the global placement problem can be formulated as a constrained minimization problem as follows 15, 17 :min w (x, y), s.t. db(x, y) mb, f or each bin b,(1)where the wirelength w (x, y) is dened as the total half perimeter wirelength (hpwl) 17,| |w (x, y) = ( maxvi ,vj exixj + maxvi ,vj e|yi yj |). (2)eedb(x, y) is the potential function that is the total area of movable blocks in bin b, and mb is the maximum allowable area of movable blocks in bin b. mb can be computed by mb = tdensity (wbhb pb), where tdensity is a user-specied target density value for each bin, wb(hb) is the width (height) of bin b, and pb is the base potential equal to the pre-placed block area in bin b. note that mb is a xed value as long as all pre-placed block positions are given and the bin size is determined 15, 17 .given v and e, the objective of the placement problem is to nd a placement such that the cost function w (x, y) is minimized and the constraints are satised.1.2 nonlinear method in ntuplace3s global placementthe global placement of ntuplace3 is based on the multilevel framework which applies a two-stage technique of bottom-up coarsening followed by top-down uncoarsening 15 . the coars- ening stage iteratively clusters blocks based on connectivity/block size to reduce the problem size until the problem size is below a given threshold. then, an initial placement is computed through quadratic programming. in the uncoarsening stage, it iteratively declusters the blocks and rene the block positions to reduce the wirelength. the declustering process continues untilthe nal placement is found. during the uncoarsening stage, ntuplace3 15 applies the ana- lytical model for the global placement. since the hpwl is not dierentiable, ntuplace3 15 uses the log-sum-exp wirelength w (x, y)lse 22, to smooth the hpwl wirelength function.circuit density is controlled mainly by cell spreading during global placement and cellsliding during detailed placement. the density function db(x, y) is dened asdb(x, y) = px(b, v)py (b, v), (3)vvwhere px and py are the overlap functions of bin b and block v along the x and y directions. since db(x, y) is neither smooth nor dierentiable, the bell-shaped potential function 14, 15 is used to smooth px and py . by doing so, the non-smooth function db(x, y) can be replaced by a smooth one,d b(x, y) = cv px(b, v)py (b, v), (4)vvwhere cv is a normalization factor so that the total potential of a block equals its area 15 .in ntuplace3 15, the quadratic penalty method is used to solve a sequence of uncon- strained minimization problems,min w (x, y)lse + (db(x, y) mb)2(5)bwith increasing s. note that w (x, y)lse and db(x, y) are smoothed wirelength and density functions respectively. the solution of the previous problem is used as the initial solution for the next one. ntuplace3 15 solve the unconstrained problems by a conjugate gradient (cg) algorithm. the step size in cg is computed by a novel ecient method. after computing the cg direction dk , the step size k is calculated by k = swb/|dk |2 , where s is a user-specied scaling factor, and wb is the bin width. by doing so, the step size of block spreading canbe controlled by s when the total quadratic euclidean movement is xed, vv (xi + yi ) =22i |k dk |2 = s2 w2 , where xand yare the amount of the movement along the x and y directions2biifor the block vi in each iteration. the value of s aects the precision of objective minimization;smaller s values lead to better results and longer runtime. in our implementation, we set s to0.2 as its the default value in ntuplace3(v12.03.26).2 proposed algorithmnon-convex constraint always limits the solution quality in optimization problems. to improve the quality of global placement, we need a better framework to balance the wirelength and non-convex density functions. in this section, we rst propose a new form of augment- ed lagrangian multiplier method which could degenerate to penalty method by a reductionparameter. second, a more cautious density penalty increasing strategy is dened, aims to balance the wirelength and density constraint.2.1 reduced-augmented lagrangian multiplier methodthe conjugate gradient method in ntuplace3 15 is very ecient. we rst implemented the following traditional augmented lagrangian multiplier(alm) algorithm,klse k bb b b01. solve min w (x, y) + (dm + k )2 ,02. k+1 = k + k (d b mb), b = 1, 2, 3, ., n2 ,bbb03. k+1 = k ,04. set k k + 1 and go to step 01.and used ntuplace3s cg method as subproblem solver. however, the result comes dramat- ically worse than ntuplace3s penalty method, especially on larger-size circuits. notice that conjugate gradient method in ntuplace3 uses a user-specied scaling factor s to control total quadratic euclidean movement, rather than make an exact line search. solutions of subprob- lems obtained in this way should not be or approximate enough to their stationary points. thus multipliers updated by these solutions could be misleading.to preserve the eciency of subproblem solver, we modied the above alm algorithm asfollows:klse k bb b b 01. solve min w (x, y) + (dm + k )2 ;02. k+1 = k + k (d b mb), b = 1, 2, 3, ., n2 ,bbb03. k+1 = k ,04. set k k + 1 and go to step 01.kthe dierence is in line 01 and line 02. the multiplier term in line 01 is replaced by b ,kand we adopt a damping factor in line 02. it is more stable than only , or is used. clearlythat if subproblems are solved exactly, then it is not necessary to dene at all, or equivalently, set to 1. thus the reduction factor is a subproblem solver precision dependence parameter. particularly, if tends to innite, this alm form degenerates back to the penalty method.2.2 dynamic density weight increasing strategyexperiments revealed that the penalty increasing factor, , plays an important role in solution quality and running time. among certain range, the algorithm tends to obtain better results with smaller , but consumes more running time. further, too small of also leads to quality reduction.we involved a cautious strategy for increasing the penalty parameter among subproblems. when the blocks are separate enough to some criterion, we believe that more optimization should be made in wirelength there after. at this situation, we adopt a minor increasing factor- 5 -1 , to pay more attention in wirelength optimization. if density overow is well decreased than last iteration, we might also use a increasing factor 2 , small enough, to slow down the penalty weight increasing rate, to prevent the function from dominated by the density term. otherwise, we set to a relatively larger one, 3 , to further distribute the cells. generally, weset 1 2 nmax )03. level+;04. hlevel = f irstchoiceclustering(hlevel1 );05. initialize block positions by solveqp (hlevel );06. for currentlevel = level to 007. initialize bin grid size nb = blockn umber(hcurrentlevel );08. initialize base potential for each bin; k = 0;09. initialize lagrangian multipliers k = 0, b = 1, 2, 3, ., n2 ;bb10. initialize last_overf low_ratio = overf low_ratio;- 6 -11. initialize 0 = min(max( w (x,y)b d b (x,y), min ), max );12.do k 2k13. solve min w (x, y)lse + k b (d b mb + b ) ;14. k+;15. k = k1 + k1 (d m ), b = 1, 2, 3, ., n2 ;bb bbb16. compute overf low_ratio;17. if (overf low_ratio 5%)18. k = 1 k1 ;19. else if (overf low_ratio 0.5 last_overf low_ratio)20. k = 2 k1 ;21. else22. k = 3 k1 ;23. last_overf low_ratio = overf low_ratio;24. if (currentlevel = 0 & overf low_ratio 10%)25. call lookaheadlegalization() and save the best result;26. until (spreading enough or no further reduction in overow ratio)27. if (currentlevel = 0)28. restore the best look-ahead result and return;29. else30. decluster and update block positions.圖 1: outline of our placement ow.3 implementation & experimentour global placement ow based on ntuplace3 15 is shown in fig.1. we set the parame- ters described above as follows: =0.85, =3.9, 1 =1.6, 2 =1.9, 3 =2.2. and the safeguardingparameters for initial density constraint weight (line 11 in fig.1) are min=2.0e-8, max=2.0e+8. these parameters are chosen by some experiments on ibm01, 03 and 05. for example, fig.2 shows the placement quality contour with respect to and . optimal results are located in dark blue regions. it shows that the algorithm obtains better results with moderate and ,and placements get worse when and both tend to 1.- 11 -5.554.543.532.521.5ibm010.20.40.60.815.554.543.532.521.5ibm030.20.40.60.815.554.543.532.521.5ibm050.20.40.60.811.11.081.061.041.0210.98圖 2: placement quality isoline with respect to and to fair compare our global placement optimization method to ntuplace3s 15, our global placement results were fed into ntuplace3s(v12.03.26) 23 detail placement algorithm. and nal results are compared to ntuplace3(v12.03.26). thus the detail placement algorithms are the same. we conducted experiments based on iccad04 mix-size placement benchmarks 24, and no parameter tuning to specic benchmarks was employed. all experiments were performed on the same linux pc with two intel i5-2410m 2.3ghz cpus and 2 gb memory.3.1 hpwl- aware placement comparisontable 1 shows the benchmark statistics and our placement results compared to the state- of-the-art 2d placer ntuplace3(v12.03.26). in the table, the column “di%” is the relative dierence of our results compared to ntuplace3, and “gp+dp cpu” represents the running time of global placement plus detail placement. the results show that our method obtains better hpwl on sixteen out of eighteen benchmarks than ntuplace3, with only 85% running time. our approach are 2.6% better in hpwl on average, and up to 4.3% better for 6 circuits.3.2 whitespace reservation comparisonthe most commonly used optimization objective in placement/lorplan is total wirelength. however, a placement with small wirelength may be unroutable due to high placement density in some subregions 4. density target is a constraint that forces a placer to preserve specied white space in any subregions of placement area. the density target is a oating number that is larger than or equal to chip density. for example, if density target is 0.7, any local region should be less than or equal to 70% occupied. placement that not meet this constraint would表 1: the hpwl (e5 ) and total runtime (seconds) comparisons with ntuplace3(v12.03.26)on iccad04 ibm mixed-size benchmarkscircuitcellmacrohpwl(e5)gp+dp cpu(sec)np3oursdi(%)np3oursratioibm01 ibm02 ibm03 ibm04 ibm05 ibm06 ibm07 ibm08 ibm09 ibm10 ibm11 ibm12 ibm13 ibm14 ibm15 ibm16 ibm17 ibm181226019071225632692528146321544534850722528576789969779697888328514647416079418252218

人人文庫(kù)> 全部分類> 專業(yè)文獻(xiàn) > 工程機(jī)械

溫馨提示

1. 本站所有資源如無(wú)特殊說(shuō)明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
2. 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
3. 本站RAR壓縮包中若帶圖紙，網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽，若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間，僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理，對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯，并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容，請(qǐng)與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

全局布局優(yōu)化的一種增廣拉格朗日方法.doc

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

全局布局優(yōu)化的一種增廣拉格朗日方法.doc

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

相關(guān)文檔