版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民2006/10/30程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民1OutlineHashtablesBloomfilterInvertedindexOutlineHashtables2SearchingproblemagainForalinkedlist,->O(n)Forasortedarray,->O(logn)CanweexpectO(1)?whichmeansRegardlessofthenumberofelementsbeingsearched,theruntimeisalwaysthesameGivenakey,thepositioninthetablecanbeaccesseddirectlySearchingproblemagainForal3OperationsonhashtablesCollectionofpairs(key,element),herekeymaybeastring,number,record,etc.PairshavedifferentkeysOperationsGet(theKey)Delete(theKey)Insert(theKey,theElement)OperationsonhashtablesColle4IdealHashingUsesa1Darray(ortable)table[0…m-1].EachpositionofthisarrayisabucketAbucketcannormallyholdonlyonepairUsesahashfunctionhthatconvertseachkeykintoanindexintherange[0,m-1]h(k)isthehomebucketforkeykEverypair(key,element)isstoredinitshomebuckettable[h[key]]IdealHashingUsesa1Darray(5[0][1][2][3][4][5][6][7]IdealHashingExamplePairsare:(22,a),
(33,c),
(3,d),
(73,e),
(85,f)Hashtableis
table[0…7],m=8Hashfunctionis
h(key)=key/11
Pairsarestoredintableasbelow(85,f)(22,a)(33,c)(3,d)(73,e)
Get,
Insert,and
Deletetake
O(1)
time[0][1][2][3][4][5][6][7]Ideal6WhatCanGoWrong?Wheredoes(26,g)go?Keysthathavethesamehomebucketaresynonyms22and26aresynonymswithrespecttothehashfunctionthatisinuse.Thehomebucketfor(26,g)isalreadyoccupied.[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Wheredoes(7WhatCanGoWrong?AcollisionoccurswhenthehomebucketforanewpairisoccupiedbyapairwithadifferentkeyAnoverflowoccurswhenthereisnospaceinthehomebucketforthenewpairWhenabucketcanholdonlyonepair,collisionsandoverflowsoccurtogetherNeedamethodtohandleoverflows(85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Acollision8HashTableIssuesChoiceofhashfunctionsCollisionresolutionSizeofhashtableThesizeofbucket&numberofbucketsHashTableIssuesChoiceofhas9HashfunctionsDefinition:AhashfunctiontransformsakeyKintoanindexinthetableusedforstoringitemsofthesametypeasKIfhashfunctionhtransformsdifferentkeysintodifferentnumbers,itiscalledaperfecthashfunction
Therefore,ahashfunctionisamappingfromnitemstompositionsTotalnumberofpossiblemappingsismnIfn
m,thenumberofperfecthashfunctionsism!/(m-n)!HashfunctionsDefinition:10HashFunctionsTwopartsConvertakeyintoanintegerincasethekeyisnotanintegerh(k)Mapanintegerintoahomebucketf(k)isanintegerintherange[0,m-1],wheremisthenumberofbucketsinthetableHashFunctionsTwoparts11StringToNon-negativeIntegerEachcharacteris1bytelongAnintis4bytesAtwo-characterstringsmaybeconvertedintoaunique4bytenon-negativeintusingthecode:intanswer=s.at(0);answer=(answer<<8)+s.at(1);Stringsthatarelongerthan3charactersdonothaveauniquenon-negativeintrepresentationStringToNon-negativeInteger12StringToNonnegativeIntegerunsignedlongoperator()(conststringtheKey){ //ConverttheKeytoanonnegativeinteger.unsignedlonghashValue=0;intlength=(int)theKey.length();for(inti=0;i<length;i++)hashValue=5*hashValue+theKey.at(i);returnhashValue;}StringToNonnegativeIntegeru13MapintoahomebucketMostcommonmethodisbydivision
homeBucket=h(theKey)%divisor;divisor
equalsthenumberofbuckets
m0
homeBucket<divisor=m[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)MapintoahomebucketMostcom14UniformHashFunctionLetkeySpacebethesetofallpossiblekeysAuniformhashfunction
mapsthekeysinkeySpaceintobucketssuchthatapproximatelythesamenumberofkeysgetmappedintoeachbucket[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionLetkeySp15UniformHashFunctionEquivalently,theprobabilitythatarandomlyselectedkeyhasbucketiasitshomebucketis1/m,0
i<mAuniformhashfunctionminimizesthelikelihoodofanoverflowwhenkeysareselectedatrandom[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionEquivale16HashingByDivisionkeySpace=allintsForeverym,thenumberofintsthatgetmapped(hashed)intobucketiisapproximately232/mTherefore,thedivisionmethodresultsinauniformhashfunctionwhenkeySpace=allintsInpractice,keystendtobecorrelatedSo,thechoiceofthedivisormaffectsthedistributionofhomebucketsHashingByDivisionkeySpace=17SelectingTheDivisorBecauseofthiscorrelation,applicationstendtohaveabiastowardskeysthatmapintooddintegers(orintoevenones)Whenthedivisorisanevennumber,oddintegershashintooddhomebucketsandevenintegersintoevenhomebuckets20%14=6,30%14=2,8%14=815%14=1,3%14=3,23%14=9ThebiasinthekeysresultsinabiastowardeithertheoddorevenhomebucketsSelectingTheDivisorBecauseo18SelectingTheDivisorWhenthedivisorisanoddnumber,odd(even)integersmayhashintoanyhome20%15=5,30%15=0,8%15=815%15=0,3%15=3,23%15=8ThebiasinthekeysdoesnotresultinabiastowardeithertheoddorevenhomebucketsBetterchanceofuniformlydistributedhomebucketsSodonotuseanevendivisorSelectingTheDivisorWhenthe19SelectingTheDivisorSimilarbiaseddistributionofhomebucketsisseen,inpractice,whenthedivisorisamultipleofprimenumberssuchas3,5,7,…TheeffectofeachprimedivisorpofmdecreasesaspgetslargerIdeally,choosemsothatitisaprimenumberAlternatively,choosemsothatithasnoprimefactorsmallerthan
20SelectingTheDivisorSimilarb20HashingbyfoldingThekeyisdividedintoseveralparts,andthesepartsarecombinedorfoldedtogetherShiftfoldingUsingasimpleoperationsuchasadditiontocombinetheminacertainwayForexample,thesocialsecuritynumber(SSN),123-45-6789(123+456+789)%m (m=1000)BoundaryfoldingThepiecesofthekeysarefoldedonthebordersbetweendifferentparts,forexample(123+654+789)%m (m=1000)HashingbyfoldingThekeyisd21AboutfoldingSimpleandfast,bitpatterncanbeusedinsteadofnumericalvaluesInthecaseofstringsXor’ingthecharacterstogether,andusingtheresultfortheaddressXor’ingthechunksofstringsratherthansinglecharacters.ThelengthofachunkisequaltothenumberofbytesinanintegerTypically,theresultoffoldingprocessingaredividedmodulomAboutfoldingSimpleandfast,22HashingbyaMid-squarefunctionThekeyissquaredandthemiddleormidpartoftheresultisusedasthehashvalueEntirekeyparticipatesingeneratingthehashvalue,soabetterchancethatthedifferentvaluesaregeneratedfordifferentkeysInpractice,itismoreefficienttochooseapowerof2forthesizeofthetable,m,andextractthemidpartofthebitrepresentationofthesquareofakey,justusingamaskandashiftoperationForexample,31212=100101001010000101100001,
themidpart101000010=322HashingbyaMid-squarefuncti23HashingbyextractionOnlyapartofthekeyisusedtocomputetheaddressIfthepartisdistributeduniformly,itcanbesufficientforhashing(providedtheomittedportiondistinguishesthekeysonlyinaninsignificantway)Forexample,inthestudentID,somedigitsaresafelyomittedinahashfunctionISBNcodeissimilarHashingbyextractionOnlyapa24HashingbyradixtransformationThekeyistransformedintoanothernumberbaseForexample,Kisthedecimalnumber345,thenitsvalueinbase9is423(9)Thenitisdividedmodulominoriginalbase,theresultingnumberisusedasthehashvalueIfm=100,345(10)and245(10)arenothashedtothesamevalue,but345(10)and264(10)willbehashedtoasamevalue,23Hashingbyradixtransformatio25HashingbycryptographichashalgorithmsStronghashalgorithmsAchangeofanybitintheinputwillcausethechangeofanybitintheoutputwith0.5probabilityapproximatelyTheencryptionalgorithmhassimilarfeatureUsingcryptographicalgorithmstohashakeyForexample,computetheMD5(key)orSHA1(key)
orencryptthekeywithafixedencryptionkey,DESk(key)ThendividedmodulomHashingbycryptographichash26CollisionresolutionAvoidcollisionIncreasingthetablesizemayleadtobetterhashing,butsometimesnotThetwofactors–hashfunctionandtablesize–mayminimizethenumberofcollisions,buttheycannotcompletelyeliminatethem
(ifthesizeofkeyspaceislargerthanthetablesize)SomestrategiesOpenaddressingChainingBucketaddressingCollisionresolutionAvoidcoll27CollisionresolutionbyopeningaddressingAcollisionoccurswhenthehomebucketforanewpair(key,element)isoccupiedWemayhandlecollisionsby:Searchthehashtableinsomesystematicfashionforabucketthatisnotoccupied.LinearprobingQuadraticprobingRandomprobingEliminateoverflowsbypermittingeachbuckettokeepalistofallpairsforwhichitisthehomebucketDynamicarraylinkedlistCollisionresolutionbyopenin28OpeningaddressingIfthepositionh(K)isoccupied,thenthepositionsintheprobingsequence
norm(h(K)+p(1)),norm(h(K)+p(2)),…,norm(h(K)+p(m))
aretrieduntileitheranavailablecellisfoundorthesamepositionsaretriedrepeatedlyorthetableisfullIntheprobingsequence,Functionpisprobingfunctioniisaprobenormisanormalizationfunction,(divisionmodulotothesizeofthetable)OpeningaddressingIftheposit29LinearprobingIntheopenaddressingp(i)=c*i,
soifc=1,thenthepositiontobetriedis(h(K)+i)Inlinearprobing,Insertion:thepositiontostoreakeyisfoundbysequentiallysearchingallpositionsstartingfromthepositioncalculatedbythehashfunctionuntilanemptycellisfoundTendencytocreateclustersinthetable,theemptycellsfollowingclustershaveamuchgreaterchancetobefilledthanotherpositionsLinearprobingIntheopenaddr30LinearProbing–GetAndInsertdivisor=m
(numberofbuckets)=17Homebucket=key%170481216Insertpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45612293428112370333045LinearProbing–GetAndInser31QuadraticprobingIntheopenaddressingIngeneral,p(i)=c2*i2+c1*i,forexample,
p(i)=(-1)i-1((i+1)/2)2,sothepositiontobetriedis
h(K),h(K)+1,h(K)-1,h(K)+4,h(K)-4,…,
h(K)+(m-1)2/4,h(K)–(m-1)2/4Theexperiencevalueofm(tablesize)isaprimeintheformof4j+1Question:Whynotusep(i)=i2,0i<m
(i2-j2)=(i+j)(i-j)TheeffectofclusteringisbetterthanlinearprobingSecondaryclusterforthekeyshashedtothesamelocationQuadraticprobingIntheopena32RandomprobingpfunctionisdefinedasapseudorandomnumbergeneratorArequirementInordertofindtheprobingsequence,therandomnumbergeneratorshouldbeinitializedtothesameseedforthesamekeyAvoidsecondaryclustersThesameprobingsequenceforakeyDifferentprobingsequencesfordifferentkeyswiththesamehashvalueRandomprobingpfunctionisde33Doublehashingpfunctionisdefinedwithanotherhashfunction,hp(K)
p(i)=i*hp(K)Theprobingsequenceish(K),h(K)+hp(K),h(K)+2hp(K),h(K)+3hp(K),…,
h(K)+(m-1)hp(K)Theprobingsequencewilldependonthechoiceofhashfunctionhp(K)Thehashfunctionhp(K)
canbedefinedwiththeoriginalhashfunctionh(K),forexample,
hp(K)=i*h(K)+1,thentheprobingsequenceis
h(K)+i*(i*h(K)+1)=(i2+1)h(K)+iSecondaryclustersDoublehashingpfunctionisde34PerformanceofopeningaddressingSuccessfulsearchesvs.unsuccessfulsearches
[TheArtofComputerProgramming,Volume3]LinearprobingQuadraticsearchDoublehashingSuccessfulsearches[1+1/(1-LF)]/21-ln(1-LF)-LF/2-ln(1-LF)/LFUnsuccessfulsearches[1+1/(1-LF)2]/21/(1-LF)-LF-ln(1-LF)1/(1-LF)LF=n/m,loadfactor=(numberofelementsinthetable)/tablesizePerformanceofopeningaddress35PerformanceOfLinearProbingWorst-caseGet/InserttimeisQ(n),wherenisthenumberofkeysinthetableThishappenswhenallpairsareinthesameclusterExpectedPerformanceSn=expectednumberofbucketsexaminedinasuccessfulsearchUn=expectednumberofbucketsexaminedinaunsuccessfulsearch0481216612293428112370333045PerformanceOfLinearProbingW36ExpectedPerformanceSn~[1+1/(1-LF)]/2Un~[1+1/(1-LF)2]/2Notethat0LF1LF
0.65isrecommended.LFSnUn0.51.52.50.651.94.60.752.58.50.905.550.5ExpectedPerformanceSn~[1+137LinearProbing–DeleteDelete(0)0481216612293428112370333045048121661229342811237453330Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237453330LinearProbing–DeleteDelete(38LinearProbing–Delete(34)Searchclusterforpair(ifany)tofillvacatedbucket0481216612293428112370333045048121661229028112373330450481216612290281123733304504812166122928112370333045LinearProbing–Delete(34)Sea39LinearProbing–Delete(29)Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237033304504812166123428112370333045048121661211342823703330450481216612113428237033304504812166121134282370333045LinearProbing–Delete(29)Sea40HashTableDesignPerformancerequirementsaregiven,determinemaximumpermissibleloadfactorWewantasuccessfulsearchtomakenomorethan10compares(expected)Sn~?(1+1/(1–LF))LF18/19Wewantanunsuccessfulsearchtomakenomorethan13compares(expected)Un~?(1+1/(1–LF)2)LF4/5SoLFmin{18/19,4/5}=4/5HashTableDesignPerformancer41HashTableDesignDynamicresizingoftableWheneverloadfactorexceedsthreshold(4/5inourexample),rehashintoatableofapproximatelytwicethecurrentsizeFixedtablesizeKnowmaximumnumberofpairsNomorethan1000pairsLF
4/5=>m
5/4*1000=1250.Pick
m(equaltodivisor)tobeaprimenumberoranoddnumberwithnoprimedivisorssmallerthan20HashTableDesignDynamicresiz42Collisionresolution:chainingEachbucketkeepsalinearlistofallpairsforwhichitisthehomebucketNeveroverflowifthecapacityofthelinearlistisunlimitedThelinearlistmayormaynotbesortedbykey.ThelinearlistmaybeanarraylinearlistorachainPerformanceIncreasingthelengthofthelistscandegraderetrievalperformanceRequireadditionalspaceforpointersCollisionresolution:chaining43Putinpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45Homebucket=key%17.SortedChains[0][4][8][12][16]126342928112370333045Putinpairswhosekeysare6,44ExpectedPerformanceNotethatLF0.ExpectedchainlengthisLF.Sn~1+LF/2(inthecaseofunsortedlist)Un~LFExpectedPerformanceNotethat45Coalescedhashing(coalescedchaining)Combinelinearprobingandchainingh(K)=K%17
Insertpairswhosekeysare6,12,34,29,28,11,23,7,0Alternatively,thecollidingkeycanbeputinanoverflowarea(calledacellar)0481216612293428112370Coalescedhashing(coalescedc46BucketaddressingAssociateabucketwitheachaddress,andabucketisablockofspaceenoughtostoremultipleitemsCollisionisnottotallyavoided,ifabucketisfull,thenAnewitemhashedtothebucketcanbestoredinthenextbucket,justasdoesintheopenaddressingapproachThenewitemcanalsobestoredinanoverflowarea.ThebucketwillbemarkedwithaflagindicatingithasadditionalitemstobesearchedBucketaddressingAssociateab47PerfecthashfunctionsPerfecthashfunctionsoridealhashfunctionsIthashesakeytoitsproperposition,andnocollisionsoccurAnassumptionisthatthekeysetisknownIfthenumberofcellsinthetableisequaltothenumberofdataitems,itiscalledaminimalperfecthashfunction sonospaceiswastedExamplesReservedwordsusedbyassemblers,compilers,filesonunerasableopticaldisks,dictionaries,…ItisnoteasytoobtainaperfecthashfunctionPerfecthashfunctionsPerfect48Cichelli’smethodtoconstructaminimalperfecthashfunctionDevelopedbyRichardJ.CichelliUsedtohasharelativelysmallnumberofreservedwordsThefunctionisoftheform
h(word)=(length(word)+g(firstletter(word))
+g(lastletter(word)))modm
wheregisthefunctiontobeconstructedCichelli’smethodtoconstruct49Cichelli’salgorithmChooseavalueforMaxComputethenumberofoccurrencesofeachfirstandlastletterinthesetofallwordsOrderallwordsinaccordancetothefrequencyofoccurrenceofthefirstandthelastlettersExamples:Calliope,Clio,Erato,Euterpe,Melpomene,Polyhymnia,Terpsichore,Thalia,Urania=>E(6),A(3),C(2),O(2),T(2),M(1),P(1),U(1)Euterpe,Calliope,Erato,Terpsichore,Melpomene,Thalia,Clio,Polyhymnia,UraniaCichelli’salgorithmChooseav50Cichelli’salgorithm(cont.)Search(wordList) ifwordListisempty halt;
word=firstwordfromwordList;
wordList=wordListwiththefirstworddetached; ifthefirstandthelastlettersofwordareassignedg-values try(word,-1,-1)//-1signifies‘valuealreadyassigned’ ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifneigherthefirstnorthelastlettershasag-value foreachn,min{0,…,Max) try(word,n,m); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifeitherthefirstorthelastletterhasag-value foreachnin{0,…,Max) try(word,-1,n); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue;Cichelli’salgorithm(cont.)Se51Cichelli’salgorithm(cont.)try(word,firstLetterValue,lastLetterValue) ifh(word)hasnotbeenclaimed reserveh(word); ifnot-1(i.e.,notreserved) assignfirstLetterValueand/orlastLetterValue
asg-valuesof firstletter(word)and/orlastletter(word) returnsuccess returnfailureCichelli’salgorithm(cont.)tr52Ainvocationsofthesearchingprocedure reservedhashvaluesEuterpeE=0h=7 {7}CalliopeC=0h=8 {78} EratoO=0h=5 {578} TerpsichoreT=0 h=2 {2578} Melpomene M=0 h=0 {02578} Thalia A=0 h=6 {025678} Clio h=4 {0245678} Polyhymnia P=0 h=1 {01245678} Urania U=0 h=6* {01245678} Urania U=1 h=7* {01245678} Urania U=2 h=8* {01245678} Urania U=3 h=0* {01245678} Urania U=4 h=1* {01245678} Polyhymnia P=1 h=2* {0245678} Polyhymnia P=2 h=3 {02345678} Urania U=0 h=6* {02345678} Urania U=1 h=7* {02345678} Urania U=2 h=8* {02345678} Urania U=3 h=0* {02345678} Urania U=4 h=1 {012345678}Ainvocationsofthesearching53Aboutcichelli’salgorithmAbrute-forcesearchforg-function,sothesearchprocessisexponentialItisnotapplicabletoalargenumberofwordsItdoesnotguaranteethataperfecthashfunctioncanbefoundExtensions(1)Thesecondtolastlettersinthewordareinvolvedinthehashfunctions(2)h(word)=length(word)+g1(firstletter(word))+…+glength(word)(lastletter(word))(3)h(word)=bucketgr(word)+hgr(word)(word)Aboutcichelli’salgorithmAbr54FHCDalgorithmSearchforaminimalperfecthashfunctionoftheform
h(word)=h0(word)+g(h1(word))+g(h2(word))
h0:keyspace->[0,m)
h1:keyspace->[0,r)
h2:keyspace->[r,2r)Steps:Mapping(dependencygraph):betweenh1toh2foreachwordOrdering:firstdeterminethenodewithmorelinksSearching:assignhashvaluestokeysbychoosingappropriateg-valueforeachvertex
Foreachvertex,eitherg(h1(word))org(h2(word))isknownFHCDalgorithmSearchforamin55Extensiblehashing0001101100011
001100010110011
110111110001100
010112221b00b01b1h(k)=11001h(k)=000010001101100011
00110001011001101100
010112222b00b01b101101111100110012b1100000101001100011
000011001101100
010113322b000b01b101101111100110012b1110010111011100110
001013b001Extensiblehashing00011011000156AnexampleofhashfunctionIndistributedenvironments,howtopartitionalargenumberofjobs(orrecords)intoaclusterofmachineskeyv=h(key)AnexampleofhashfunctionIn57LinearhashingNoindexisneededThebucketissplitifnecessaryAteachlevelofsplitting,linearhashingmaintaintwohashfunctions,hlevelandhlevel+1,suchthat
hlevel=Kmode(TSize*2level)0001
001000110101
01111000
100110111100
1101LinearhashingNoindexisneed58STLhash_mapSimplyusesadivisorthatisanoddnumberThissimplifiesimplementationbecausewemustbeabletoresizethehashtableasmorepairsareputintothedictionaryArraydoubling,forexample,requiresyoutogofroma1Darraytablewhoselengthism(whichisodd)toanarraywhoselengthis2m+1(whichisalsoodd)STLhash_mapSimplyusesadivi59AbouthashtablesPerfecthashfunctions(n
m)Uniformhashfunctions(nm)PerformancerequirementsHowtodealwithcollisionsHowtodealwithoverflowsifthesizeofbucketsisfixedAbouthashtablesPerfecthash60BloomFilters–anintroductionDifferentialFilesSimplelargedatabaseFileofrecordsresidingondiskSinglekeyIndextorecordsOperationsRetrieveUpdateInsertanewrecordMakechangestoanexistingrecordDeletearecordBloomFilters–anintroductio61Na?veModeOfOperationProblemsIndexandFilechangewithtimeRecovery=>CopyMasterFile(MF)frombackupCopyMasterIndex(MI)frombackupProcessalltransactionssincelastbackupRecoverytimedependsonMF&MIsize+#transactionssincelastbackupKeyIndexFileNa?veModeOfOperationProblem62DifferentialFileMakenochangestomasterfileAlterindexandwriteupdatedrecordtoanewfilecalleddifferentialfileKeyIndexFileDFAdvantageDFissmallerthanFileandsomaybebackedupmorefrequentlyIndexneedstobebackedupwheneverDFis.So,indexshouldbenolargerthanDFRecoverytimeisreducedDifferentialFileMakenochang63DifferentialFileOperationKeyIndexFileDFDisadvantageEventuallyDFbecomeslargeandcannolongerbebackedupwithdesiredfrequencyMustintegrateFileandDFnowFollowingintegration,DFisemptyDifferentialFileOperationKey64DifferentialFileOperationKeyIndexFileDFLargeIndexIndexcannotbebackedupasfrequentlyasdesiredTimetorecovercurrentstateofindex&DFisexcessiveUseadifferentialindexMakenochangestoIndexDIisanindextoalldeletedrecordsandupdatedrecordsinDFDifferentialFileOperationKey65DifferentialFile&IndexOperationPerformancehitMostqueriessearchbothDIandIndexIncreasein#ofdiskaccesses/queryUseafiltertodecidewhetherornotDIshouldbesearchedKeyIndexFileDFDIYNYDifferentialFile&IndexOper66IdealFilterKeyIndexFileDFFilterYNYDIYY
=>thiskeyisintheDIN
=>thiskeyisnotintheDIFunctionalityofidealfilterissameasthatofDISo,afilterthateliminatesperformancehitofDIdoesn’texistIdealFilterKeyIndexFileDFFilt67BloomFilter(BF)N
=>thiskeyisnotintheDIM(maybe)
=>thiskeymaybeintheDIFiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBloomFilter(BF)N=>thiskey68BloomFilter(BF)FiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBFresidesinmemoryPerformancehitpaidonlywhenthereisafiltererrorBloomFilter(BF)FiltererrorK69BloomFilterByBurtonH.Bloomin1970Bloomfilterisaspace-efficientprobabilisticdatastructurethatisusedtotestwhetheranelementisamemberofasetFalsepositivesarepossibleFalsenegativearenotItissuitableforthesetthatsupportaddition/queryoperationsRemovinganelementisnoteasy(consideringacountingfilter)See/wiki/Bloom_filterBloomFilterByBurtonH.Bloom70BloomFilterDesignUsembitsofmemoryfortheBFLargerm=>fewerfiltererrorsWhenBFempty,allm
bits
=0Usek>0hashfunctions:f1(),f2(),…,fk()WhenkeyKeyinsertedintoBF,setbitsf1(Key),f2(Key),…,andfk(Key)to1f1(Key),f2(key),…,fk(Key)isthesignatureofkeyKeyBloomFilterDesignUsembits71Examplem=11(normally,mwouldbemuchmuchlarger)k=2(2hashfunctions)f1(Key)
=Keymodmf2(Key)
=(2Key)modmKey=15Key=17000000000001234567890111110Examplem=11(normally,mwou72ExampleBFhasKey=15andKey=17Searchfor
Keyf1(Key)
=0or
f2(Key)
=0=>
Key
notinBFf1(Key)
=1and
f2(Key)
=1=>
Key
maybeinBFKey=6
=>filtererror000000000001234567890111110ExampleBFhasKey=15andKey73BloomFilterDesignChoosem(thefiltersizeinbits)UseasmuchmemoryasisavailablePickk(thenumberofhashfunctions)ktoosmall=>probabilityofdifferentkeyshavingsamesignatureishigh.ktoolarge=>filterbecomesfilledwith1stoosoonSelectthekhashfunctionsHashfunctionsshouldberelativelyindependentBloomFilterDesignChoosem(t74TheparametersofourbloomfilterProbabilityofafiltererrordependson:Filtersize…m#ofhashfunctions…k#ofelementsinsertedbeforefilterisresetto0…nAssumethatmandnare
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 9我的戰(zhàn)友邱少云課件(共21張)
- 2025年度屋頂綠化植物種植與養(yǎng)護(hù)合同3篇
- 2025年度出租車司機(jī)職業(yè)健康保險(xiǎn)及補(bǔ)充醫(yī)療保險(xiǎn)合同3篇
- 2025年度企業(yè)市場營銷策劃合同范本2篇
- 2024露天宴會廳租賃及餐飲服務(wù)合同3篇
- 2024綠植租擺合同-企業(yè)員工福利項(xiàng)目協(xié)議3篇
- 2024跨境電商平臺運(yùn)營代理協(xié)議
- 【單元AB卷 能力提升卷】人教新起點(diǎn)英語二年級上冊單元能力提升卷-Unit 2 Boys and Girls(含答案)
- 2024陶瓷工藝創(chuàng)新研發(fā)項(xiàng)目合作協(xié)議3篇
- 2025年度LED芯片研發(fā)與采購合作協(xié)議3篇
- 保險(xiǎn)產(chǎn)品創(chuàng)新與市場定位培訓(xùn)課件
- 2022-2023學(xué)年山東省淄博四中高二(上)期末數(shù)學(xué)試卷含答案
- 《建筑賦比興》一些筆記和摘錄(上)
- (完整文本版)體檢報(bào)告單模版
- 時間管理的原則與方法
- 【A公司人力資源招聘管理問題及優(yōu)化建議分析13000字(論文)】
- 鋼結(jié)構(gòu)牛腿計(jì)算
- 泌尿外科內(nèi)鏡診療技術(shù)質(zhì)量保障措施及應(yīng)急預(yù)案
- 華北電力大學(xué)(保定)
- Unity3D游戲開發(fā)PPT完整全套教學(xué)課件
- 腎內(nèi)科學(xué)篇病例分析1
評論
0/150
提交評論