![計(jì)算機(jī)系統(tǒng)基礎(chǔ):cache memory_第1頁](http://file4.renrendoc.com/view5/M00/09/17/wKhkGGZDfTiAVtI7AAB0HfxDqGY313.jpg)
![計(jì)算機(jī)系統(tǒng)基礎(chǔ):cache memory_第2頁](http://file4.renrendoc.com/view5/M00/09/17/wKhkGGZDfTiAVtI7AAB0HfxDqGY3132.jpg)
![計(jì)算機(jī)系統(tǒng)基礎(chǔ):cache memory_第3頁](http://file4.renrendoc.com/view5/M00/09/17/wKhkGGZDfTiAVtI7AAB0HfxDqGY3133.jpg)
![計(jì)算機(jī)系統(tǒng)基礎(chǔ):cache memory_第4頁](http://file4.renrendoc.com/view5/M00/09/17/wKhkGGZDfTiAVtI7AAB0HfxDqGY3134.jpg)
![計(jì)算機(jī)系統(tǒng)基礎(chǔ):cache memory_第5頁](http://file4.renrendoc.com/view5/M00/09/17/wKhkGGZDfTiAVtI7AAB0HfxDqGY3135.jpg)
版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1計(jì)算機(jī)體系結(jié)構(gòu)
cachememory
2次課2LocalityPrincipleofLocality:
ProgramstendtousedataandinstructionswithaddressesnearorequaltothosetheyhaveusedrecentlyTemporallocalityRecentlyreferenceditemsarelikely
tobereferencedagaininthenearfutureSpatiallocalityItemswithnearbyaddressestend
tobereferencedclosetogetherintime3LocalityAlllevelsofmoderncomputersystemsaredesignedtoexploitlocalityHardwareCachememory(tospeedupmainmemoryaccesses)OperatingsystemsUsemainmemorytospeedupvirtualaddressspaceaccessesUsemainmemorytospeedupdiskfileaccessesApplicationprogramsWebbrowsersexploittemporallocalitybycachingrecentlyreferenceddocumentsonalocaldisk4Localityintsumvec(intv[N]){ inti,sum=0;
for(i=0;i<N;i++) sum+=v[i]; returnsum;}Address0481216202428Contentsv0v1v2v3v4v5v6v7Accessorder123456785LocalityLocalityintheexamplesum:temporallocalityv:spatiallocalityStride-1referencepatternStride-kreferencepatternVisitingeveryk-thelementofacontiguousvectorAsthestrideincreases,thespatiallocalitydecreases6Stride-1referencepatternintsumarrayrows(inta[M][N]){ inti,j,sum=0;
for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1234567Stride-Nreferencepatternintsumarraycols(inta[M][N])(M=2,N=3){ inti,j,sum=0;
for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1352468LocalityLocalityoftheinstructionfetchSpatiallocalityInmostcases,programsareexecutedinsequentialorderTemporallocalityInstructionsinloopsmaybeexecutedmanytimes9LocalityDatareferencesReferencearrayelementsinsuccession SpatiallocalityReferencevariablesumeachiteration TemporallocalityInstructionreferencesReferenceinstructionsinsequence. SpatiallocalityCyclethroughlooprepeatedly Temporallocalitysum=0;for(i=0;i<n;i++) sum+=a[i];returnsum;10MemoryHierarchyFundamentalpropertiesofstoragetechnologyandcomputersoftwareDifferentstoragetechnologieshavewidelydifferentaccesstimesFastertechnologiescostmoreperbytethansloweronesandhavelesscapacityThegapbetweenCPUandmainmemoryspeediswideningWell-writtenprogramstendtoexhibitgoodlocality11Mainmemoryholdsdiskblocksretrievedfromlocaldisks.registerson-chipL1cache(SRAM)mainmemory(DRAM)localsecondarystorage(localdisks)Larger,slower,andcheaper(perbyte)storagedevicesremotesecondarystorage(distributedfilesystems,Webservers)Localdisksholdfilesretrievedfromdisksonremotenetworkservers.off-chipL2cache(SRAM)L1cacheholdscachelinesretrievedfromtheL2cache.CPUregistersholdwordsretrievedfromcachememory.L2cacheholdscachelinesretrievedfrommemory.L0:L1:L2:L3:L4:L5:Smaller,faster,andcostlier(perbyte)storagedevicesAnexamplememoryhierarchyCachesFundamentalideaofamemoryhierarchy:ForeachK,thefaster,smallerdeviceatlevelKservesasacacheforthelarger,slowerdeviceatlevelK+1.Whydomemoryhierarchieswork?Becauseoflocality,programstendtoaccessthedataat
levelkmoreoftenthantheyaccessthedataatlevelk+1.Thus,thestorageatlevelk+1canbeslower,andthuslargerandcheaperperbit.BigIdea:Thememoryhierarchycreatesalargepoolofstoragethatcostsasmuchasthecheapstoragenearthebottom,butthatservesdatatoprogramsattherateofthefaststoragenearthetop.GeneralCacheConcepts012345678910111213141589143CacheLarger,slower,cheapermemoryviewedaspartitionedinto“blocks”Dataiscopiedinblock-sizedtransferunitsSmaller,faster,moreexpensivememorycachesasubsetoftheblocks44410101014MemoryHierarchyBlocksAtlevelk+1ThestorageispartitionedintocontiguouschunksofdataobjectsEachblockhasauniqueaddressornameBlockscanbefixed-sizeorvariable-sizedAtlevelkThestorageispartitionedintoasmallersetofblocksTheblocksarethesamesizeastheblocksatlevelk+1Thestoragecontainscopiesofasubsetoftheblocksatlevelk+115MemoryHierarchyTransferunitsUsedtocopydatabackandforthbetweenlevelkandlevelk+1GeneralCacheConcepts:Hit012345678910111213141589143CacheDatainblockbisneededRequest:1414Blockbisincache:Hit!GeneralCacheConcepts:Miss012345678910111213141589143CacheDatainblockbisneededRequest:12Blockbisnotincache:Miss!BlockbisfetchedfrommemoryRequest:12121212BlockbisstoredincachePlacementpolicy:
determineswherebgoesReplacementpolicy:
determineswhichblock
getsevicted(victim)TypesofCacheMissesCold(compulsory)missColdmissesoccurbecausethecacheisempty.CapacitymissOccurswhenthesetofactivecacheblocks(workingset)islargerthanthecache.TypesofCacheMissesConflictmissMostcacheslimitblocksatlevelk+1toasmallsubset(sometimesasingleton)oftheblockpositionsatlevelk.e.g.Blockiatlevelk+1mustbeplacedinblock(imod4)atlevelk.Conflictmissesoccurwhenthelevelkcacheislargeenough,butmultipledataobjectsallmaptothesamelevelkblock.e.g.Referencingblocks0,8,0,8,0,8,...wouldmisseverytime.ExamplesofCachingintheHierarchyHardware0On-ChipTLBAddresstranslationsTLBWebbrowser10,000,000LocaldiskWebpagesBrowsercacheWebcacheNetworkbuffercacheBuffercacheVirtualMemoryL2cacheL1cacheRegistersCacheTypeWebpagesPartsoffilesPartsoffiles4-KBpage64-bytesblock64-bytesblock4-8byteswordsWhatisCached?Webproxyserver1,000,000,000RemoteserverdisksOS100MainmemoryHardware1On-ChipL1Hardware10On/Off-ChipL2AFS/NFSclient10,000,000LocaldiskHardware+OS100MainmemoryCompiler0CPUcoreManagedByLatency(cycles)WhereisitCached?Diskcache DisksectorsDiskcontroller100,000Diskfirmware21CacheMemorymainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememoryLiesbetweentheCPUandthemainmemoryCPUlooksfirstfordataincache,theninmainmemoryHoldfrequentlyaccessedblocksofmainmemoryincaches22CacheMemoryHistoryAtverybeginning,3levelsRegisters,mainmemory,storage10yearslater,4levelsRegister,cachememory,mainmemory,storageModernprocessor,5~6levelsRegisters,SRAML1,L2(,L3)cache,mainDRAMmemory,diskstorageCachememoriessmall,fastSRAM-basedmemoriesmanagedbyhardwareautomaticallycanbeon-chip,off-chip23InsertinganL1cachebetweentheCPUandmainmemoryabcdblock10pqrsblock21......wxyzblock30...Thebigslowmainmemoryhasroomformany8-wordblocks.ThesmallfastL1cachehasroomfortwo8-wordblocks.Thetiny,veryfastCPUregisterfilehasroomforfour4-bytewords.Thetransferunitbetweenthecacheandmainmemoryisa8-wordblock(32bytes).ThetransferunitbetweentheCPUregisterfileandthecacheisa4-byteblock.line0line124GenericCacheMemoryOrganization?
?
?B–110?
?
?B–110validvalidtagtagset0:B=2bbytespercacheblockElinespersetS=2ssetsttagbitsperline1validbitperline?
?
??
?
?B–110?
?
?B–110validvalidtagtagset1:?
?
??
?
?B–110?
?
?B–110validvalidtagtagsetS-1:?
?
??
?
?Cacheisanarrayofsets.Eachsetcontainsoneormorelines.Eachlineholdsablockofdata.25CacheMemoryFundamentalparametersParametersDescriptionsS=2sEB=2bm=log2(M)NumberofsetsNumberoflinespersetBlocksize(bytes)Numberofphysical(mainmemory)addressbits26CacheMemoryDerivedquantitiesParametersDescriptionsM=2ms=log2(S)b=log2(B)t=m-(s+b)C=BESMaximumnumberofuniquememoryaddressNumberofsetindexbitsNumberofblockoffsetbitsNumberoftagbitsCachesize(bytes)notincludingoverheadsuchasthevalidandtagbits27MemoryAccessingForamemoryaccessinginstructionmovlA%eaxAccesscachebyAdirectlyIfcachehitgetthevaluefromthecacheOtherwise,cachemisshandlinggetthevalue28Addressingcachestbitssbitsbbits0m-1<tag><setindex><blockoffset>PhysicalAddressA:0m-1Splitinto3parts:29Direct-mappedcacheSimplestkindofcacheCharacterizedbyexactlyonelineperset.validvalidvalidtagtagtag?
?
?set0:set1:setS-1:E=1linespersetcacheblockcacheblockcacheblockp63330AccessingDirect-MappedCachesThreestepsSetselectionLinematchingWordextraction31SetselectionUsethesetindexbitstodeterminethesetofinterestvalidvalidvalidtagtagtag?
?
?set0:set1:setS-1:tbitssbits000010m-1bbitstagsetindexblockoffsetselectedsetcacheblockcacheblockcacheblock32Linematching1tbitssbitsi01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(1)Thevalidbitmustbeset(2)Thetagbitsinthecachelinemustmatchthetagbitsintheaddress011030127456Findavalidlineintheselectedsetwithamatchingtag33WordExtraction1tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):blockoffsetselectsstartingbyte0110w3w0w1w23012745634SimpleMemorySystemCacheCache16lines4-bytelinesizeDirectmapped11109876543210OffsetIndexTag35SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123111150––––21B1000204083360––––4321436D8F0950D13672F01D6310––––716111C2DF0336SimpleMemorySystemCacheIdxTagValidB0B1B2B382413A00518992D0––––A2D19315DA3BB0B0––––C120––––D16104963415E13183771BD3F140––––37AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x0D
Hit?YesByte:0x3638LineReplacementonMissesCheckthecachelineofthesetindicatedbythesetindexbitsIfthecachelinevalid,itmustbeevictedRetrievetherequestedblockfromthenextlevelHowtogettheblock?Currentlineisreplacedbythenewlyfetchedline39Checkthecacheline1selectedset(i):=1?Ifvalidbitisset,evictthelineTag3012745640GettheAddressoftheStartingByteConsidermemoryaddresslookslikethefollowingClearthelastbitsandgettheaddressA0m-1<tag><setindex><blockoffset>0m-1<tag><setindex><blockoffset>xxx………xxx000………00041ReadtheBlockfromtheMemoryPutAontheBus(AisA’000)A0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory42ReadtheBlockfromtheMemoryMainmemoryreadsA’fromthememorybus,retrieves8bytesx,andplacesitonthebusX0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory43ReadtheBlockfromtheMemoryCPUreadsxfromthebusandcopiesitintothecacheline0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory44ReadtheBlockfromtheMemoryIncreaseA’by1,andcopyYinA’+1intothecacheline.0A’+1YmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory45ReadtheBlockfromtheMemoryRepeatseveraltimes(4or8)0A’+3WmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory46Cacheline,setandblockBlockAfixed-sizedpacketofinformationMovesbackandforthbetweenacacheandmainmemory(oralower-levelcache)LineAcontainerinacachethatstoresAblock,thevalidbit,thetagbitsOtherinformationSetAcollectionofoneormorelines47Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xt=1s=2b=1xxx0vtagblock0000123set48Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
missxt=1s=2b=1xxx10m[0]m[1]vtagblock000
0[0000](miss)(1)0123set49Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitxt=1s=2b=1xxx10m[0]m[1]vtagblock000
1[0001](hit)(2)0123set50Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissxt=1s=2b=1xxx10m[0]m[1]vtagblock011m[12]m[13]0
13[1101](miss)(3)0123set51Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmissxt=1s=2b=1xxx11m[8]m[9]vtagblock011m[12]m[13]0
8[1000](miss)(4)0123set52Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0
0[0000](miss)(5)0123set53Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0
0[0000](miss)(5)0123setThrashing!54ConflictMissesinDirect-MappedCaches1 floatdotprod(floatx[8],floaty[8])2 {3 floatsum=0.0;4 inti;56 for(i=0;i<8;i++)7 sum+=x[i]*y[i];8 returnsum;9 }55ConflictMissesinDirect-MappedCachesAssumptionforxandyxisloadedintothe32bytesofcontiguousmemorystartingataddress0ystartsimmediatelyafterxataddress32AssumptionforthecacheAblockis16bytesbigenoughtoholdfourfloatsThecacheconsistsoftwosetsAtotalcachesizeof32bytesThelast6bitsoftheaddressofx[0]is000000Thelast6bitsoftheaddressofx[4]is010000Thelast6bitsoftheaddressofy[0]is100000Thelast6bitsoftheaddressofy[4]is11000056ConflictMissesinDirect-MappedCachesTrashingReadx[0]willloadx[0]~x[3]intothecacheReady[0]willoverloadthecachelinebyy[0]~y[3]57ConflictMissesinDirect-MappedCachesPaddingcanavoidthrashingClaimx[12]insteadofx[8]Thelast6bitsofY[0]is11000058Whyusemiddlebitsasindex?High-OrderBitIndexingAdjacentmemorylineswouldmaptosamecachesetPooruseofspatiallocalityMiddle-OrderBitIndexingConsecutivememorylinesmaptodifferentcachelinesCanholdC-byteregionofaddressspaceincacheatonetime59Direct-mappedcachesimulationAddressbitsAddress(decimal)Tagbits(t=1)Indexbits(s=2)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000011111111000001011010111100000101101011110101010101010101001122330011223360Direct-mappedcachesimulationAddressbitsAddress(decimal)Indexbits(s=2)Tagbits(t=1)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000001010101101010101111111100110011001100110101010101010101000011112222333361Whyusemiddlebitsasindex?4-lineCacheHigh-OrderBitIndexingMiddle-OrderBitIndexing000110110000000100100011010001010110011110001001101010111100110111101111000000010010001101000101011001111000100110101011110011011110111162SetassociativecachesCharacterizedbymorethanonelinepersetvalidtagset0:E=2linespersetset1:setS-1:?
?
?cacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblock63AccessingsetassociativecachesSetselectionidenticaltodirect-mappedcachevalidvalidtagtagset0:validvalidtagtagset1:validvalidtagtagsetS-1:?
?
?tbitssbits000010m-1bbitstagsetindexblockoffsetSelectedsetcacheblockcacheblockcacheblockcacheblockcacheblockcacheblock64AccessingsetassociativecachesLinematchingandwordselectionmustcomparethetagineachvalidlineintheselectedset.(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.10110w3w0w1w211001tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.3012745665AssociativeCacheCache16lines4-bytelinesize2-waysetassociative11109876543210OffsetIndexTag66SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123110150––––11B1000204081360––––2321436D8F0920D13672F01D3310––––316111C2DF0367SimpleMemorySystemCacheIdxTagValidB0B1B2B342413A00518942D0––––52D19315DA3B50B0––––6120––––616104963415713183771BD37140––––68AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x1A
Hit?No
69LineReplacementonMissesIfallthecachelinesofthesetarevalidWhichlineisselectedtobeevicted?LFU(least-frequently-used)ReplacethelinethathasbeenreferencedthefewesttimesoversomepasttimewindowLRU(least-recently-used)ReplacethelinethatwaslastaccessedthefurthestinthepastAllofthesepoliciesrequireadditionaltimeandhardware70SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xxt=2s=1b=1xx0vtagblock0000011set71SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
miss100m[0]m[1]vtagblock000
0[0000](miss)(1)xxt=2s=1b=1xx72SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hit100m[0]m[1]vtagblock000
1[0001](hit)(2)xxt=2s=1b=1xx0011set73SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmiss100m[0]m[1]vtagblock1111m[12]m[13]0
13[1101](miss)(3)xxt=2s=1b=1xx0011set74SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss110m[8]m[9]vtagblock111m[12]m[13]10
8[1000](miss)LRU(4)xxt=2s=1b=1xx0011set75SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
miss110m[8]m[9]vtagblock100m[0]m[1]10
0[0000](miss)LRU(5)xxt=2s=1b=1xx0011set76FullyassociativecachesCharacterizedbyallofthelinesintheonlyonesetNosetindexbitsintheaddressset0:validvalidtagtagcacheblockcacheblockvalidtagcacheblock…E=C/Blinesintheoneandonlysettbitsbbitstagblockoffset77AccessingfullyassociativecachesLinematchandwordselectionmustcomparethetagineachvalidline00110w3w0w1w211001tbits10001100m-1bbitstagblockoffset=1?=?(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.30127456100110111078IssueswithWritesWritehitsWritethroughCacheupdatesitscopyImmediatelywritesthecorrespondingcacheblocktomemoryWritebackDefersthememoryupdateaslongaspossibleWritingtheupdatedblocktomemoryonlywhenitisevictedfromthecacheMaintainsadirtybitforeachcacheline79IssueswithWritesWritemissesWrite-allocateLoadsthecorrespondingmemoryblockintothecacheThenupdatesthecacheblockNo-write-allocateBypassesthecacheWritestheworddirectlytomemoryCombinationWritethrough,no-write-allocateWriteback,write-allocate(modernimplementation)80Multi-levelcachesL1d-cache,i-cache32k8-wayAccess:4cyclesL2unified-cache256k8-wayAccess:11cyclesL3unified-cache8M16-wayAccess:30~40cyclesBlocksize64bytesforallcache81CacheperformancemetricsMissRatefractionofmemoryreferencesnotfoundincache(misses/references)Typicalnumbers:3-10%forL1Canbequitesmall(<1%)forL2,dependingonsizeHitRatefractionofmemoryreferencesfoundincache(1-missrate)82CacheperformancemetricsHitTimetimetodeliveralineinthecachetotheprocessor(includestimetodeterminewhetherthelineisinthecache)Typicalnumbers:1-2clockcyclesforL1(4cyclesincorei7)5-10clockcyclesforL2(11cyclesincorei7)MissPenaltyadditionaltimerequiredbecauseofamissTypically50-200cyclesformainmemory(Trend:increasing!)83WhatdoesHitRateMean?ConsiderHitTime:2cyclesMissPenalty:200cyclesAverageaccesstime:Hitrate99%:2*0.99+200*0.01=4cyclesHitrate97%:2*0.97+200*0.03=8cyclesThisiswhy“missrate”isusedinsteadof“hitrate”84CacheperformancemetricsCachesizeHitratevs.hittimeBlocksizeSpatiallocalityvs.temporallocalityAssociativityThrashingCostSpeedMisspenaltyWritestrategySimple,readmisses,fewertransfer85WritingCache-FriendlyCodePrinciplesProgramswithbetterlocalitywilltendtohavelowermissratesProgramswithlowermissrateswilltendtorunfasterthanprogramswithhighermissrates86WritingCache-FriendlyCodeBasicapproachMakethecommoncasegofastProgramsoftenspendmostoftheirtimeinafewcorefunctions.ThesefunctionsoftenspendmostoftheirtimeinafewloopsMinimizethenumberofcachemissesineachinnerloop87WritingCache-FriendlyCode8[h]7[h]6[h]5[m]4[h]3[h]2[h]1[m]Accessorder,[h]itor[m]issi=7i=6i=5i=4i=3i=2i=1i=0v[i]Temporallocality,Thesevariablesareusuallyputinregisters(pp.650)intsumvec(intv[N]){ inti,sum=0;
for(i=0;i<N;i++) sum+=v[i]; returnsum;}Assumption:WordsarebytesCacheblocksare4wordsVisblockaligned88Writingcache-friendlycodeTemporallocalityRepeatedreferencestolocalvariablesaregoodbecausethecompilercancachethemintheregisterfile89Writingcache-friendlycodeSpatiallocalityStride-1referencespatternsaregoodbecausecachesatalllevelsofthememoryhierarchystoredataascontiguousblocksSpatiallocalityisespeciallyimportantinprogramsthatoperateonmultidimensionalarrays90Writingcache-friendlycodeExample(pp.651,M=4,N=8)intsumarrayrows(inta[M][N]){ inti,j,sum=0;
for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Assumption:Sameasthepreviousone91Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]9[m]17[m]25[m]2[h]10[h]18[h]26[h]3[h]11[h]19[h]27[h]4[h]12[h]20[h]28[h]5[m]13[m]21[m]29[m]6[h]14[h]22[h]30[h]7[h]15[h]23[h]31[h]8[h]16[h]24[h]32[h]92Writingcache-friendlycodeExample(pp.651,M=4,N=8)
intsumarraycols(inta[M][N]){ inti,j,sum=0;
for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Assumption:SameasthepreviousoneArraysizeislargerthanthecachesize93Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]2[m]3[m]4[m]5[m]6[m]7[m]8[m]9[m]10[m]11[m]12[m]13[m]14[m]15[m]16[m]17[m]18[m]19[m]20[m]21[m]22[m]23[m]24[m]25[m]26[m]27[m]28[m]29[m]30[m]31[m]32[m]94TheMemoryMountain95TheMemoryMountainReadthroughput(readbandwidth)TheratethataprogramreadsdatafromthememorysystemMemorymountainReaddatafromanarrayAtwo-dimensionalfunctionofreadbandwidthismeasuredCharacterizesthecapabilitiesofthememorysystemforeachcomputer96WorkingSetDimension97AccessStrideDimension98TheMemoryMountainDataSizeMAXBYTES(64M)bytesorMAXELEMS(8M)doublesPartiallyaccessedWorkingset:from64MBto2KBStride:from1to6499Memorymountainmainroutine/*mountain.c-Generatethememorymountain.*/#defineMINBYTES(1<<11)/*Workingsetsizerangesfrom2KB*/#defineMAXBYTES(1<<26)/*...upto64MB*/#defineMAXSTRIDE64/*Stridesrangefrom1to64*/#defineMAXELEMSMAXBYTES/sizeof(data)doubledata[MAXELEMS];/*Thearraywe'llbetraversing*/100Memorymountainmainroutineintmain(){intsize; /*Workingsetsize(inbytes)*/intstride; /*Stride(inarrayelements)*/doubleMhz; /*Clockfrequency*/init_data(data,MAXELEMS);/*Initializeeachelementindatato1*/Mhz=mhz(0); /*Estimatetheclockfrequency*/101Memorymountainmainroutinefor(size=MAXBYTES;size>=MINBYTES;size>>=1){ for(stride=1;stride<=MAXSTRIDE;stride++) printf("%.1f
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 晉中山西晉中市太谷區(qū)面向2025屆公費(fèi)師范生招聘教師18人筆試歷年參考題庫附帶答案詳解
- 2025年中國(guó)太子佛工藝品市場(chǎng)調(diào)查研究報(bào)告
- 2025至2031年中國(guó)高壓透鏡行業(yè)投資前景及策略咨詢研究報(bào)告
- 2025年藝術(shù)道閘項(xiàng)目可行性研究報(bào)告
- 2025年紅外線按摩棒項(xiàng)目可行性研究報(bào)告
- 2025年電加熱針織物呢毯預(yù)縮機(jī)項(xiàng)目可行性研究報(bào)告
- 成都四川成都天府國(guó)際競(jìng)技訓(xùn)練中心招聘運(yùn)動(dòng)員4人筆試歷年參考題庫附帶答案詳解
- 2025年曲印項(xiàng)目可行性研究報(bào)告
- 2025年揉切粉碎機(jī)項(xiàng)目可行性研究報(bào)告
- 2025年安康魚野菜串項(xiàng)目可行性研究報(bào)告
- 綜合實(shí)踐項(xiàng)目 制作水族箱飼養(yǎng)淡水魚 教學(xué)設(shè)計(jì)-2024-2025學(xué)年魯科版生物六年級(jí)上冊(cè)
- 公轉(zhuǎn)私付款合同模板
- 安徽省2024年高考語文模擬試卷及答案5
- 江西省“振興杯”信息通信網(wǎng)絡(luò)運(yùn)行管理員競(jìng)賽考試題庫-上(單選題)
- DLT 5756-2017 額定電壓35kV(Um=40.5kV)及以下冷縮式電纜附件安裝規(guī)程
- 關(guān)于餐飲合同范本
- 2023高考數(shù)學(xué)藝考生一輪復(fù)習(xí)講義(學(xué)生版)
- CHT 4019-2016 城市政務(wù)電子地圖技術(shù)規(guī)范(正式版)
- 廣西壯族自治區(qū)南寧市2024年七年級(jí)下學(xué)期語文期末試卷附答案
- 冀教版五年級(jí)數(shù)學(xué)下冊(cè)全冊(cè)課件【完整版】
- 微量注射泵安全使用和維護(hù)保養(yǎng)
評(píng)論
0/150
提交評(píng)論