并行程序設(shè)計 中文課件 09 并行程序通信開銷_第1頁
并行程序設(shè)計 中文課件 09 并行程序通信開銷_第2頁
并行程序設(shè)計 中文課件 09 并行程序通信開銷_第3頁
并行程序設(shè)計 中文課件 09 并行程序通信開銷_第4頁
并行程序設(shè)計 中文課件 09 并行程序通信開銷_第5頁
已閱讀5頁,還剩63頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

ParallelProgrammingInstructor:ZhangWeizhe(張偉哲)ComputerNetworkandInformationSecurityTechniqueResearchCenter,SchoolofComputerScienceandTechnology,HarbinInstituteofTechnologyCommunicationCostsinParallelComputers31.CommunicationCostModel2.

CommunicationCostsofDifferentTopology

2.1Point-to-PointCommunication2.2One-to-AllBroadcast&All-to-OneReduction2.3All-to-AllBroadcastandReduction

2.4ThePrefix-SumOperation2.5ScatterandGather2.6All-to-AllPersonalizedCommunication2.7CircularShift3.

SummaryOutline41.溝通成本模型2.不同拓?fù)涞耐ㄐ懦杀?.1點對點通信2.2一對多廣播和多對一減少2.3全方位廣播與減少2.4前綴和操作2.5散點和收集2.6全對所有個性化溝通2.7循環(huán)3.總結(jié)OutlineCommunicationCostsAlongwithidlingandcontention,communicationisamajoroverheadinparallelprograms.Thecostofcommunicationisdependentonavarietyoffeaturesincludingtheprogrammingmodelsemantics,thenetworktopology,datahandlingandrouting,andassociatedsoftwareprotocols.5通信開銷隨著空轉(zhuǎn)和爭用,通信是并行程序中的一大開銷。通信的成本取決于各種功能,包括編程模型語義,網(wǎng)絡(luò)拓?fù)?,?shù)據(jù)處理和路由以及相關(guān)的軟件協(xié)議。6Thetotaltimetotransferamessageoveranetworkcomprisesofthefollowing:Startuptime(ts):Timespentatsendingandreceivingnodes(executingtheroutingalgorithm,programmingrouters,etc.).Per-hoptime(th):Thistimeisafunctionofnumberofhopsandincludesfactorssuchasswitchlatencies,networkdelays,etc.Per-wordtransfertime

(tw):Thistimeincludesalloverheadsthataredeterminedbythelengthofthemessage.Thisincludesbandwidthoflinks,errorcheckingandcorrection,etc.MessagePassingCosts7通過網(wǎng)絡(luò)傳輸消息的總時間包括以下內(nèi)容:啟動時間(ts):發(fā)送和接收節(jié)點所花費的時間(執(zhí)行路由算法,編程路由器等)。每跳時間(th):此時間是跳數(shù)的函數(shù),包括切換延遲,網(wǎng)絡(luò)延遲等因素。每字傳輸時間(tw):此時間包括由消息長度確定的所有開銷。這包括鏈路的帶寬,錯誤檢查和糾正等。消息傳遞成本8Store-and-ForwardRoutingAmessagetraversingmultiplehopsiscompletelyreceivedatanintermediatehopbeforebeingforwardedtothenexthop.ThetotalcommunicationcostforamessageofsizemwordstotraverselcommunicationlinksisInmostplatforms,thissmallandtheaboveexpressioncanbeapproximatedby

9存儲轉(zhuǎn)發(fā)路由在被轉(zhuǎn)發(fā)到下一個跳轉(zhuǎn)之前,在一個中間的跳轉(zhuǎn)中,一個消息遍歷多個跳轉(zhuǎn)。一個長度為m字的消息通過l的通信鏈接的總通信成本是在大多數(shù)平臺中,th是很小的,上面的表達(dá)式可以被近似10Cut-ThroughRoutingStore-and-forwardmakespooruseofcommunicationresources.Packetroutingbreaksmessagesintopacketsandpipelinesthemthroughthenetwork.Atracermessagefirstprogramsallintermediaterouters.Allflitsthentakethesameroute.Thetotalcommunicationtimeforcut-throughroutingisapproximatedby:11直通路由存儲轉(zhuǎn)發(fā)不利于通信資源。分組路由將消息分解成分組,并通過網(wǎng)絡(luò)將它們進(jìn)行管道傳輸。示蹤消息首先對所有中間路由器進(jìn)行編程。所有的fl子然后采取相同的路線。直通路由的總通信時間近似為:12SimplifiedCostModelThecostofcommunicatingamessagebetweentwonodeslhopsawayusingcut-throughroutingisgivenbyInthisexpression,thistypicallysmallerthantsandtw.Forthisreason,thesecondtermintheRHSdoesnotshow,particularly,whenmislarge.Forthesereasons,wecanapproximatethecostofmessagetransferby13簡化成本模型在兩個節(jié)點之間傳遞消息的成本l由跳過路由跳過在這個表達(dá)式中,th通常小于ts和tw。因此,RHS中的第二項并不顯示,特別是當(dāng)m大時。由于這些原因,我們可以近似消息傳輸?shù)某杀?4151.CommunicationCostModel2.CommunicationCostsofDifferentTopology

2.1Point-to-PointCommunication2.2One-to-AllBroadcast&All-to-OneReduction2.3All-to-AllBroadcastandReduction

2.4ThePrefix-SumOperation2.5ScatterandGather2.6All-to-AllPersonalizedCommunication2.7CircularShift3.

SummaryOutline

DataCommunicationbetweenTwoNetworkProcessors

16

拓?fù)銼tore-and-ForwardRouting

Cutting-ThroughRingts+mtwGrid—torusts+mtwHypercubets+mtwBasicCommunicationOperations:IntroductionGroupcommunicationoperationsarebuiltusingpoint-to-pointmessagingprimitives.Recallfromourdiscussionofarchitecturesthatcommunicatingamessageofsizemoveranuncongestednetworktakestimets+tmw.Weusethisasthebasisforouranalyses.Wherenecessary,wetakecongestionintoaccountexplicitlybyscalingthetwterm.Weassumethatthenetworkisbidirectionalandthatcommunicationissingle-ported.17基本溝通操作:介紹組通信操作是使用點對點消息傳遞原語構(gòu)建的。從我們對架構(gòu)的討論回想一下,在一個不成功的網(wǎng)絡(luò)上傳送大小為m的消息需要時間ts+tmw。我們用它作為我們分析的基礎(chǔ)。如有必要,我們通過縮放tw項目來明確擁擠。我們假設(shè)網(wǎng)絡(luò)是雙向的,通信是單端口的。18One-to-AllBroadcastandAll-to-OneReductionOneprocessorhasapieceofdata(ofsizem)itneedstosendtoeveryone.Thedualofone-to-allbroadcastisall-to-onereduction.Inall-to-onereduction,eachprocessorhasmunitsofdata.Thesedataitemsmustbecombinedpiece-wise(usingsomeassociativeoperator,suchasadditionormin),andtheresultmadeavailableatatargetprocessor19One-to-AllBroadcastandAll-to-OneReduction一個處理器有一個數(shù)據(jù)(大小m)需要發(fā)送給每個人。一對多廣播的雙重縮減是多對一的。在多對一的減少中,每個處理器都有m個數(shù)據(jù)單元。這些數(shù)據(jù)項必須以分段方式組合(使用一些關(guān)聯(lián)運算符,例如加法或最?。?,并將結(jié)果提供給目標(biāo)處理器20Simplestwayistosendp-1messagesfromthesourcetotheotherp-1processors-thisisnotveryefficient.Userecursivedoubling:sourcesendsamessagetoaselectedprocessor.Wenowhavetwoindependentproblemsderinedoverhalvesofmachines.Reductioncanbeperformedinanidenticalfashionbyinvertingtheprocess.21最簡單的方法是將p-1消息從源發(fā)送到其他p-1處理器-這不是非常有效。使用遞歸加倍:源向所選處理器發(fā)送消息。我們現(xiàn)在有兩個獨立的問題是機(jī)器的一半??梢酝ㄟ^反轉(zhuǎn)該過程以相同的方式執(zhí)行減少。22One-to-AllBroadcast23

All-to-OneReductionReductiononaneight-noderingwithnode

0asthedestinationofthereduction.One-to-AllBroadcast

25One-to-AllBroadcast26

All-to-AllBroadcastandReductionGeneralizationofbroadcastinwhicheachprocessoristhesourceaswellasdestination.廣播的廣播,其中每個處理器是源和目的地。Aprocesssendsthesamem-wordmessagetoeveryotherprocess,butdifferentprocessesmaybroadcastdifferentmessages.一個進(jìn)程向每個其他進(jìn)程發(fā)送相同的m-word消息,但是不同的進(jìn)程可以廣播不同的消息。27All-to-AllBroadcastandReductiononaRingSimplestapproach:performpone-to-allbroadcasts.Thisisnotthemostefficientway,though.Eachnodefirstsendstooneofitsneighborsthedataitneedstobroadcast.Insubsequentsteps,itforwardsthedatareceivedfromoneofitsneighborstoitsotherneighbor.Thealgorithmterminatesinp-1steps.28All-to-AllBroadcastandReductiononaRing最簡單的方法:執(zhí)行一對多廣播。這不是最有效的方式。每個節(jié)點首先向其鄰居中的一個發(fā)送需要廣播的數(shù)據(jù)。在隨后的步驟中,它將從其鄰居之一接收的數(shù)據(jù)轉(zhuǎn)發(fā)到其他鄰居。該算法以p-1步驟終止。2930tcomm=(ts+twm)(p-1)

31tcomm=2ts(√p–1)+twm(p-1)All-to-allbroadcastonaHypercubeThePrefix-SumOperationGivenpnumbersn0,n1,…,np-1(oneoneachnode),theproblemistocomputethesumssk=∑ik=0niforallkbetween0andp-1.Initially,nkresidesonthenodelabeledk,andattheendoftheprocedure,thesamenodeholdsSk.ThePrefix-SumOperation

ScatterandGatherInthescatteroperation,asinglenodesendsauniquemessageofsizemtoeveryothernode(alsocalledaone-to-allpersonalizedcommunication).Inthegatheroperation,asinglenodecollectsauniquemessagefromeachnode.Whilethescatteroperationisfundamentallydifferentfrombroadcast,thealgorithmicstructureissimilar,exceptfordifferencesinmessagesizes(messagesgetsmallerinscatterandstayconstantinbroadcast).Thegatheroperationisexactlytheinverseofthescatteroperationandcanbeexecutedassuch.分散和收集在分散操作中,單個節(jié)點向每個其他節(jié)點(也稱為一對一個性化通信)發(fā)送大小為m的唯一消息。在收集操作中,單個節(jié)點從每個節(jié)點收集唯一的消息。雖然散射操作與廣播基本不同,但除了消息大小的差異(消息在分散中變得更小并且在廣播中保持不變)之外,算法結(jié)構(gòu)是類似的。收集操作完全是散射操作的倒數(shù),可以這樣執(zhí)行。GatherandScatterOperationsScatterandgatheroperations.ExampleoftheScatterOperationThescatteroperationonaneight-nodehypercube.CostofScatterandGatherTherearelogpsteps,ineachstep,themachinesizehalvesandthedatasizehalves.有l(wèi)ogp步驟,在每一步中,機(jī)器大小一半和數(shù)據(jù)大小一半。Wehavethetimeforthisoperationtobe:Thistimeholdsforalineararrayaswellasa2-Dmesh.這個時間適用于線性陣列以及2-D網(wǎng)格。Thesetimesareasymptoticallyoptimalinmessagesize.這些時間在消息大小上是漸近最優(yōu)的。All-to-AllPersonalizedCommunicationEachnodehasadistinctmessageofsizemforeveryothernode.每個節(jié)點對于每個其他節(jié)點具有大小為m的不同消息。Thisisunlikeall-to-allbroadcast,inwhicheachnodesendsthesamemessagetoallothernodes.這與多對多廣播不同,其中每個節(jié)點向所有其他節(jié)點發(fā)送相同的消息。All-to-allpersonalizedcommunicationisalsoknownastotalexchange.多對多的個性化溝通也稱為總交換。All-to-AllPersonalizedCommunicationAll-to-allpersonalizedcommunication.All-to-AllPersonalizedCommunicationonaRingEachnodesendsallpiecesofdataasoneconsolidatedmessageofsizem(p–1)tooneofitsneighbors.每個節(jié)點將所有數(shù)據(jù)片段作為一個大小為m(p-1)的統(tǒng)一消息發(fā)送給其鄰居之一。Eachnodeextractstheinformationmeantforitfromthedatareceived,andforwardstheremaining(p–2)piecesofsizemeachtothenextnode.每個節(jié)點從接收到的數(shù)據(jù)中提取其意圖的信息,并將每個大小為m的剩余(p-2)個片段轉(zhuǎn)發(fā)到下一個節(jié)點。Thealgorithmterminatesinp–1steps.該算法以p-1步驟終止。Thesizeofthemessagereducesbymateachstep.每個步驟消息的大小減少m。

All-to-AllPersonalizedCommunicationonaRingAll-to-AllPersonalizedCommunicationonaRing:CostWehavep–1stepsinall.Instepi,themessagesizeism(p–i).Thetotaltimeisgivenby:All-to-AllPersonalizedCommunicationonaMeshEachnodefirstgroupsitspmessagesaccordingtothecolumnsoftheirdestinationnodes.All-to-allpersonalizedcommunicationisperformedindependentlyineachrowwithclusteredmessagesofsizem√p.Messagesineachnodearesortedagain,thistimeaccordingtotherowsoftheirdestinationnodes.All-to-allpersonalizedcommunicationisperformedindependentlyineachcolumnwithclusteredmessagesofsizem√p.All-to-AllPersonalizedCommunicationonaMesh每個節(jié)點首先根據(jù)目的節(jié)點的列對其p個消息進(jìn)行分組。在每一行中,使用大小為m√p的群集消息獨立執(zhí)行多對多通信。每個節(jié)點中的消息將重新排序,此時根據(jù)目標(biāo)節(jié)點的行。每個列中都會獨立執(zhí)行多對多的個性化通信,其中集群消息的大小為m√p。All-to-AllPersonalizedCommunicationonaMeshAll-to-AllPersonalizedCommunicationonaMesh:CostTimeforthefirstphaseisidenticaltothatinaringwith√pprocessors,第一階段的時間與具有√p處理器的環(huán)相同,i.e.,(ts+twmp/2)(√p–1).Timeinthesecondphaseisidenticaltothefirstphase.第二階段的時間與第一階段相同。Therefore,totaltimeistwiceofthistime,i.e.,All-to-AllPersonalizedCommunicationonaHypercubeGeneralizethemeshalgorithmtologpsteps.Atanystageinall-to-allpersonalizedcommunication,everynodeholdsppacketsofsizemeach.Whilecommunicatinginaparticulardimension,everynodesendsp/2ofthesepackets(consolidatedasonemessage).Anodemustrearrangeitsmessageslocallybeforeeachofthelogpcommunicationsteps.All-to-AllPersonalizedCommunicationonaHypercube廣義網(wǎng)格算法記錄p步。在所有個人通信的任何階段,每個節(jié)點都保存每個大小為m的p個數(shù)據(jù)包。在特定維度進(jìn)行通信時,每個節(jié)點都會發(fā)送這些數(shù)據(jù)包的p/2(統(tǒng)一為一個消息)。節(jié)點必須在每個日志p通信步驟之前在本地重新排列其消息。All-to-AllPersonalizedCommunicationonaHypercubeAll-to-AllPersonalizedCommunicationonaHypercube:CostWehavelogpiterationsandmp/2wordsarecommunicatedineachiteration.Therefore,thecostis:Thisisnotoptimal!All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmEachnodesimplyperformsp–1communicationsteps,exchangingmwordsofdatawithadifferentnodeineverystep.Anodemustchooseitscommunicationpartnerineachstepsothatthehypercubelinksdonotsuffercongestion.Inthejthcommunicationstep,nodeiexchangesdatawithnode(iXORj).Inthisschedule,allpathsineverycommunicationsteparecongestion-free,andnoneofthebidirectionallinkscarrymorethanonemessageinthesamedirection.All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithm每個節(jié)點簡單地執(zhí)行p-1通信步驟,在每個步驟中與不同節(jié)點交換m個數(shù)據(jù)字。節(jié)點必須在每個步驟中選擇其通信伙伴,以便超立方體鏈路不會擁塞。在第j個通信步驟中,節(jié)點i與節(jié)點(iXORj)交換數(shù)據(jù)。在該時間表中,每個通信步驟中的所有路徑都是無擁塞的,并且沒有一個雙向鏈路在同一個方向上攜帶多個消息。All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmAll-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmAproceduretoperformall-to-allpersonalizedcommunicationonad-dimensionalhypercube.ThemessageMi,jinitiallyresidesonnodeiandisdestinedfornodej.All-to-AllPersonalizedCommunicationonaHypercube:CostAnalysisofOptimalAlgorithmTherearep–1stepsandeachstepinvolvesnon-congestingmessagetransferofmwords.有p-1個步驟,每個步驟涉及m個詞的非擁塞消息傳遞。Wehave:CircularShiftAspecialpermutationinwhichnodeisendsadatapackettonode(i+q)modpinap-nodeensemble (0≤q≤p).節(jié)點i在p節(jié)點集合中向節(jié)點(i+q)modp發(fā)送數(shù)據(jù)分組的特殊排列(0≤q≤p)。CircularShiftonaMeshTheimplementationonaringisratherintuitive.Itcanbeperformedinmin{q,p–q}neighborcommunications

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論