程序設(shè)計(jì)與算法基礎(chǔ)課件6_第1頁
程序設(shè)計(jì)與算法基礎(chǔ)課件6_第2頁
程序設(shè)計(jì)與算法基礎(chǔ)課件6_第3頁
程序設(shè)計(jì)與算法基礎(chǔ)課件6_第4頁
程序設(shè)計(jì)與算法基礎(chǔ)課件6_第5頁
已閱讀5頁,還剩167頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民2006/10/30程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民1OutlineHashtablesBloomfilterInvertedindexOutlineHashtables2SearchingproblemagainForalinkedlist,->O(n)Forasortedarray,->O(logn)CanweexpectO(1)?whichmeansRegardlessofthenumberofelementsbeingsearched,theruntimeisalwaysthesameGivenakey,thepositioninthetablecanbeaccesseddirectlySearchingproblemagainForal3OperationsonhashtablesCollectionofpairs(key,element),herekeymaybeastring,number,record,etc.PairshavedifferentkeysOperationsGet(theKey)Delete(theKey)Insert(theKey,theElement)OperationsonhashtablesColle4IdealHashingUsesa1Darray(ortable)table[0…m-1].EachpositionofthisarrayisabucketAbucketcannormallyholdonlyonepairUsesahashfunctionhthatconvertseachkeykintoanindexintherange[0,m-1]h(k)isthehomebucketforkeykEverypair(key,element)isstoredinitshomebuckettable[h[key]]IdealHashingUsesa1Darray(5[0][1][2][3][4][5][6][7]IdealHashingExamplePairsare:(22,a),

(33,c),

(3,d),

(73,e),

(85,f)Hashtableis

table[0…7],m=8Hashfunctionis

h(key)=key/11

Pairsarestoredintableasbelow(85,f)(22,a)(33,c)(3,d)(73,e)

Get,

Insert,and

Deletetake

O(1)

time[0][1][2][3][4][5][6][7]Ideal6WhatCanGoWrong?Wheredoes(26,g)go?Keysthathavethesamehomebucketaresynonyms22and26aresynonymswithrespecttothehashfunctionthatisinuse.Thehomebucketfor(26,g)isalreadyoccupied.[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Wheredoes(7WhatCanGoWrong?AcollisionoccurswhenthehomebucketforanewpairisoccupiedbyapairwithadifferentkeyAnoverflowoccurswhenthereisnospaceinthehomebucketforthenewpairWhenabucketcanholdonlyonepair,collisionsandoverflowsoccurtogetherNeedamethodtohandleoverflows(85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Acollision8HashTableIssuesChoiceofhashfunctionsCollisionresolutionSizeofhashtableThesizeofbucket&numberofbucketsHashTableIssuesChoiceofhas9HashfunctionsDefinition:AhashfunctiontransformsakeyKintoanindexinthetableusedforstoringitemsofthesametypeasKIfhashfunctionhtransformsdifferentkeysintodifferentnumbers,itiscalledaperfecthashfunction

Therefore,ahashfunctionisamappingfromnitemstompositionsTotalnumberofpossiblemappingsismnIfn

m,thenumberofperfecthashfunctionsism!/(m-n)!HashfunctionsDefinition:10HashFunctionsTwopartsConvertakeyintoanintegerincasethekeyisnotanintegerh(k)Mapanintegerintoahomebucketf(k)isanintegerintherange[0,m-1],wheremisthenumberofbucketsinthetableHashFunctionsTwoparts11StringToNon-negativeIntegerEachcharacteris1bytelongAnintis4bytesAtwo-characterstringsmaybeconvertedintoaunique4bytenon-negativeintusingthecode:intanswer=s.at(0);answer=(answer<<8)+s.at(1);Stringsthatarelongerthan3charactersdonothaveauniquenon-negativeintrepresentationStringToNon-negativeInteger12StringToNonnegativeIntegerunsignedlongoperator()(conststringtheKey){ //ConverttheKeytoanonnegativeinteger.unsignedlonghashValue=0;intlength=(int)theKey.length();for(inti=0;i<length;i++)hashValue=5*hashValue+theKey.at(i);returnhashValue;}StringToNonnegativeIntegeru13MapintoahomebucketMostcommonmethodisbydivision

homeBucket=h(theKey)%divisor;divisor

equalsthenumberofbuckets

m0

homeBucket<divisor=m[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)MapintoahomebucketMostcom14UniformHashFunctionLetkeySpacebethesetofallpossiblekeysAuniformhashfunction

mapsthekeysinkeySpaceintobucketssuchthatapproximatelythesamenumberofkeysgetmappedintoeachbucket[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionLetkeySp15UniformHashFunctionEquivalently,theprobabilitythatarandomlyselectedkeyhasbucketiasitshomebucketis1/m,0

i<mAuniformhashfunctionminimizesthelikelihoodofanoverflowwhenkeysareselectedatrandom[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionEquivale16HashingByDivisionkeySpace=allintsForeverym,thenumberofintsthatgetmapped(hashed)intobucketiisapproximately232/mTherefore,thedivisionmethodresultsinauniformhashfunctionwhenkeySpace=allintsInpractice,keystendtobecorrelatedSo,thechoiceofthedivisormaffectsthedistributionofhomebucketsHashingByDivisionkeySpace=17SelectingTheDivisorBecauseofthiscorrelation,applicationstendtohaveabiastowardskeysthatmapintooddintegers(orintoevenones)Whenthedivisorisanevennumber,oddintegershashintooddhomebucketsandevenintegersintoevenhomebuckets20%14=6,30%14=2,8%14=815%14=1,3%14=3,23%14=9ThebiasinthekeysresultsinabiastowardeithertheoddorevenhomebucketsSelectingTheDivisorBecauseo18SelectingTheDivisorWhenthedivisorisanoddnumber,odd(even)integersmayhashintoanyhome20%15=5,30%15=0,8%15=815%15=0,3%15=3,23%15=8ThebiasinthekeysdoesnotresultinabiastowardeithertheoddorevenhomebucketsBetterchanceofuniformlydistributedhomebucketsSodonotuseanevendivisorSelectingTheDivisorWhenthe19SelectingTheDivisorSimilarbiaseddistributionofhomebucketsisseen,inpractice,whenthedivisorisamultipleofprimenumberssuchas3,5,7,…TheeffectofeachprimedivisorpofmdecreasesaspgetslargerIdeally,choosemsothatitisaprimenumberAlternatively,choosemsothatithasnoprimefactorsmallerthan

20SelectingTheDivisorSimilarb20HashingbyfoldingThekeyisdividedintoseveralparts,andthesepartsarecombinedorfoldedtogetherShiftfoldingUsingasimpleoperationsuchasadditiontocombinetheminacertainwayForexample,thesocialsecuritynumber(SSN),123-45-6789(123+456+789)%m (m=1000)BoundaryfoldingThepiecesofthekeysarefoldedonthebordersbetweendifferentparts,forexample(123+654+789)%m (m=1000)HashingbyfoldingThekeyisd21AboutfoldingSimpleandfast,bitpatterncanbeusedinsteadofnumericalvaluesInthecaseofstringsXor’ingthecharacterstogether,andusingtheresultfortheaddressXor’ingthechunksofstringsratherthansinglecharacters.ThelengthofachunkisequaltothenumberofbytesinanintegerTypically,theresultoffoldingprocessingaredividedmodulomAboutfoldingSimpleandfast,22HashingbyaMid-squarefunctionThekeyissquaredandthemiddleormidpartoftheresultisusedasthehashvalueEntirekeyparticipatesingeneratingthehashvalue,soabetterchancethatthedifferentvaluesaregeneratedfordifferentkeysInpractice,itismoreefficienttochooseapowerof2forthesizeofthetable,m,andextractthemidpartofthebitrepresentationofthesquareofakey,justusingamaskandashiftoperationForexample,31212=100101001010000101100001,

themidpart101000010=322HashingbyaMid-squarefuncti23HashingbyextractionOnlyapartofthekeyisusedtocomputetheaddressIfthepartisdistributeduniformly,itcanbesufficientforhashing(providedtheomittedportiondistinguishesthekeysonlyinaninsignificantway)Forexample,inthestudentID,somedigitsaresafelyomittedinahashfunctionISBNcodeissimilarHashingbyextractionOnlyapa24HashingbyradixtransformationThekeyistransformedintoanothernumberbaseForexample,Kisthedecimalnumber345,thenitsvalueinbase9is423(9)Thenitisdividedmodulominoriginalbase,theresultingnumberisusedasthehashvalueIfm=100,345(10)and245(10)arenothashedtothesamevalue,but345(10)and264(10)willbehashedtoasamevalue,23Hashingbyradixtransformatio25HashingbycryptographichashalgorithmsStronghashalgorithmsAchangeofanybitintheinputwillcausethechangeofanybitintheoutputwith0.5probabilityapproximatelyTheencryptionalgorithmhassimilarfeatureUsingcryptographicalgorithmstohashakeyForexample,computetheMD5(key)orSHA1(key)

orencryptthekeywithafixedencryptionkey,DESk(key)ThendividedmodulomHashingbycryptographichash26CollisionresolutionAvoidcollisionIncreasingthetablesizemayleadtobetterhashing,butsometimesnotThetwofactors–hashfunctionandtablesize–mayminimizethenumberofcollisions,buttheycannotcompletelyeliminatethem

(ifthesizeofkeyspaceislargerthanthetablesize)SomestrategiesOpenaddressingChainingBucketaddressingCollisionresolutionAvoidcoll27CollisionresolutionbyopeningaddressingAcollisionoccurswhenthehomebucketforanewpair(key,element)isoccupiedWemayhandlecollisionsby:Searchthehashtableinsomesystematicfashionforabucketthatisnotoccupied.LinearprobingQuadraticprobingRandomprobingEliminateoverflowsbypermittingeachbuckettokeepalistofallpairsforwhichitisthehomebucketDynamicarraylinkedlistCollisionresolutionbyopenin28OpeningaddressingIfthepositionh(K)isoccupied,thenthepositionsintheprobingsequence

norm(h(K)+p(1)),norm(h(K)+p(2)),…,norm(h(K)+p(m))

aretrieduntileitheranavailablecellisfoundorthesamepositionsaretriedrepeatedlyorthetableisfullIntheprobingsequence,Functionpisprobingfunctioniisaprobenormisanormalizationfunction,(divisionmodulotothesizeofthetable)OpeningaddressingIftheposit29LinearprobingIntheopenaddressingp(i)=c*i,

soifc=1,thenthepositiontobetriedis(h(K)+i)Inlinearprobing,Insertion:thepositiontostoreakeyisfoundbysequentiallysearchingallpositionsstartingfromthepositioncalculatedbythehashfunctionuntilanemptycellisfoundTendencytocreateclustersinthetable,theemptycellsfollowingclustershaveamuchgreaterchancetobefilledthanotherpositionsLinearprobingIntheopenaddr30LinearProbing–GetAndInsertdivisor=m

(numberofbuckets)=17Homebucket=key%170481216Insertpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45612293428112370333045LinearProbing–GetAndInser31QuadraticprobingIntheopenaddressingIngeneral,p(i)=c2*i2+c1*i,forexample,

p(i)=(-1)i-1((i+1)/2)2,sothepositiontobetriedis

h(K),h(K)+1,h(K)-1,h(K)+4,h(K)-4,…,

h(K)+(m-1)2/4,h(K)–(m-1)2/4Theexperiencevalueofm(tablesize)isaprimeintheformof4j+1Question:Whynotusep(i)=i2,0i<m

(i2-j2)=(i+j)(i-j)TheeffectofclusteringisbetterthanlinearprobingSecondaryclusterforthekeyshashedtothesamelocationQuadraticprobingIntheopena32RandomprobingpfunctionisdefinedasapseudorandomnumbergeneratorArequirementInordertofindtheprobingsequence,therandomnumbergeneratorshouldbeinitializedtothesameseedforthesamekeyAvoidsecondaryclustersThesameprobingsequenceforakeyDifferentprobingsequencesfordifferentkeyswiththesamehashvalueRandomprobingpfunctionisde33Doublehashingpfunctionisdefinedwithanotherhashfunction,hp(K)

p(i)=i*hp(K)Theprobingsequenceish(K),h(K)+hp(K),h(K)+2hp(K),h(K)+3hp(K),…,

h(K)+(m-1)hp(K)Theprobingsequencewilldependonthechoiceofhashfunctionhp(K)Thehashfunctionhp(K)

canbedefinedwiththeoriginalhashfunctionh(K),forexample,

hp(K)=i*h(K)+1,thentheprobingsequenceis

h(K)+i*(i*h(K)+1)=(i2+1)h(K)+iSecondaryclustersDoublehashingpfunctionisde34PerformanceofopeningaddressingSuccessfulsearchesvs.unsuccessfulsearches

[TheArtofComputerProgramming,Volume3]LinearprobingQuadraticsearchDoublehashingSuccessfulsearches[1+1/(1-LF)]/21-ln(1-LF)-LF/2-ln(1-LF)/LFUnsuccessfulsearches[1+1/(1-LF)2]/21/(1-LF)-LF-ln(1-LF)1/(1-LF)LF=n/m,loadfactor=(numberofelementsinthetable)/tablesizePerformanceofopeningaddress35PerformanceOfLinearProbingWorst-caseGet/InserttimeisQ(n),wherenisthenumberofkeysinthetableThishappenswhenallpairsareinthesameclusterExpectedPerformanceSn=expectednumberofbucketsexaminedinasuccessfulsearchUn=expectednumberofbucketsexaminedinaunsuccessfulsearch0481216612293428112370333045PerformanceOfLinearProbingW36ExpectedPerformanceSn~[1+1/(1-LF)]/2Un~[1+1/(1-LF)2]/2Notethat0LF1LF

0.65isrecommended.LFSnUn0.51.52.50.651.94.60.752.58.50.905.550.5ExpectedPerformanceSn~[1+137LinearProbing–DeleteDelete(0)0481216612293428112370333045048121661229342811237453330Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237453330LinearProbing–DeleteDelete(38LinearProbing–Delete(34)Searchclusterforpair(ifany)tofillvacatedbucket0481216612293428112370333045048121661229028112373330450481216612290281123733304504812166122928112370333045LinearProbing–Delete(34)Sea39LinearProbing–Delete(29)Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237033304504812166123428112370333045048121661211342823703330450481216612113428237033304504812166121134282370333045LinearProbing–Delete(29)Sea40HashTableDesignPerformancerequirementsaregiven,determinemaximumpermissibleloadfactorWewantasuccessfulsearchtomakenomorethan10compares(expected)Sn~?(1+1/(1–LF))LF18/19Wewantanunsuccessfulsearchtomakenomorethan13compares(expected)Un~?(1+1/(1–LF)2)LF4/5SoLFmin{18/19,4/5}=4/5HashTableDesignPerformancer41HashTableDesignDynamicresizingoftableWheneverloadfactorexceedsthreshold(4/5inourexample),rehashintoatableofapproximatelytwicethecurrentsizeFixedtablesizeKnowmaximumnumberofpairsNomorethan1000pairsLF

4/5=>m

5/4*1000=1250.Pick

m(equaltodivisor)tobeaprimenumberoranoddnumberwithnoprimedivisorssmallerthan20HashTableDesignDynamicresiz42Collisionresolution:chainingEachbucketkeepsalinearlistofallpairsforwhichitisthehomebucketNeveroverflowifthecapacityofthelinearlistisunlimitedThelinearlistmayormaynotbesortedbykey.ThelinearlistmaybeanarraylinearlistorachainPerformanceIncreasingthelengthofthelistscandegraderetrievalperformanceRequireadditionalspaceforpointersCollisionresolution:chaining43Putinpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45Homebucket=key%17.SortedChains[0][4][8][12][16]126342928112370333045Putinpairswhosekeysare6,44ExpectedPerformanceNotethatLF0.ExpectedchainlengthisLF.Sn~1+LF/2(inthecaseofunsortedlist)Un~LFExpectedPerformanceNotethat45Coalescedhashing(coalescedchaining)Combinelinearprobingandchainingh(K)=K%17

Insertpairswhosekeysare6,12,34,29,28,11,23,7,0Alternatively,thecollidingkeycanbeputinanoverflowarea(calledacellar)0481216612293428112370Coalescedhashing(coalescedc46BucketaddressingAssociateabucketwitheachaddress,andabucketisablockofspaceenoughtostoremultipleitemsCollisionisnottotallyavoided,ifabucketisfull,thenAnewitemhashedtothebucketcanbestoredinthenextbucket,justasdoesintheopenaddressingapproachThenewitemcanalsobestoredinanoverflowarea.ThebucketwillbemarkedwithaflagindicatingithasadditionalitemstobesearchedBucketaddressingAssociateab47PerfecthashfunctionsPerfecthashfunctionsoridealhashfunctionsIthashesakeytoitsproperposition,andnocollisionsoccurAnassumptionisthatthekeysetisknownIfthenumberofcellsinthetableisequaltothenumberofdataitems,itiscalledaminimalperfecthashfunction sonospaceiswastedExamplesReservedwordsusedbyassemblers,compilers,filesonunerasableopticaldisks,dictionaries,…ItisnoteasytoobtainaperfecthashfunctionPerfecthashfunctionsPerfect48Cichelli’smethodtoconstructaminimalperfecthashfunctionDevelopedbyRichardJ.CichelliUsedtohasharelativelysmallnumberofreservedwordsThefunctionisoftheform

h(word)=(length(word)+g(firstletter(word))

+g(lastletter(word)))modm

wheregisthefunctiontobeconstructedCichelli’smethodtoconstruct49Cichelli’salgorithmChooseavalueforMaxComputethenumberofoccurrencesofeachfirstandlastletterinthesetofallwordsOrderallwordsinaccordancetothefrequencyofoccurrenceofthefirstandthelastlettersExamples:Calliope,Clio,Erato,Euterpe,Melpomene,Polyhymnia,Terpsichore,Thalia,Urania=>E(6),A(3),C(2),O(2),T(2),M(1),P(1),U(1)Euterpe,Calliope,Erato,Terpsichore,Melpomene,Thalia,Clio,Polyhymnia,UraniaCichelli’salgorithmChooseav50Cichelli’salgorithm(cont.)Search(wordList) ifwordListisempty halt;

word=firstwordfromwordList;

wordList=wordListwiththefirstworddetached; ifthefirstandthelastlettersofwordareassignedg-values try(word,-1,-1)//-1signifies‘valuealreadyassigned’ ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifneigherthefirstnorthelastlettershasag-value foreachn,min{0,…,Max) try(word,n,m); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifeitherthefirstorthelastletterhasag-value foreachnin{0,…,Max) try(word,-1,n); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue;Cichelli’salgorithm(cont.)Se51Cichelli’salgorithm(cont.)try(word,firstLetterValue,lastLetterValue) ifh(word)hasnotbeenclaimed reserveh(word); ifnot-1(i.e.,notreserved) assignfirstLetterValueand/orlastLetterValue

asg-valuesof firstletter(word)and/orlastletter(word) returnsuccess returnfailureCichelli’salgorithm(cont.)tr52Ainvocationsofthesearchingprocedure reservedhashvaluesEuterpeE=0h=7 {7}CalliopeC=0h=8 {78} EratoO=0h=5 {578} TerpsichoreT=0 h=2 {2578} Melpomene M=0 h=0 {02578} Thalia A=0 h=6 {025678} Clio h=4 {0245678} Polyhymnia P=0 h=1 {01245678} Urania U=0 h=6* {01245678} Urania U=1 h=7* {01245678} Urania U=2 h=8* {01245678} Urania U=3 h=0* {01245678} Urania U=4 h=1* {01245678} Polyhymnia P=1 h=2* {0245678} Polyhymnia P=2 h=3 {02345678} Urania U=0 h=6* {02345678} Urania U=1 h=7* {02345678} Urania U=2 h=8* {02345678} Urania U=3 h=0* {02345678} Urania U=4 h=1 {012345678}Ainvocationsofthesearching53Aboutcichelli’salgorithmAbrute-forcesearchforg-function,sothesearchprocessisexponentialItisnotapplicabletoalargenumberofwordsItdoesnotguaranteethataperfecthashfunctioncanbefoundExtensions(1)Thesecondtolastlettersinthewordareinvolvedinthehashfunctions(2)h(word)=length(word)+g1(firstletter(word))+…+glength(word)(lastletter(word))(3)h(word)=bucketgr(word)+hgr(word)(word)Aboutcichelli’salgorithmAbr54FHCDalgorithmSearchforaminimalperfecthashfunctionoftheform

h(word)=h0(word)+g(h1(word))+g(h2(word))

h0:keyspace->[0,m)

h1:keyspace->[0,r)

h2:keyspace->[r,2r)Steps:Mapping(dependencygraph):betweenh1toh2foreachwordOrdering:firstdeterminethenodewithmorelinksSearching:assignhashvaluestokeysbychoosingappropriateg-valueforeachvertex

Foreachvertex,eitherg(h1(word))org(h2(word))isknownFHCDalgorithmSearchforamin55Extensiblehashing0001101100011

001100010110011

110111110001100

010112221b00b01b1h(k)=11001h(k)=000010001101100011

00110001011001101100

010112222b00b01b101101111100110012b1100000101001100011

000011001101100

010113322b000b01b101101111100110012b1110010111011100110

001013b001Extensiblehashing00011011000156AnexampleofhashfunctionIndistributedenvironments,howtopartitionalargenumberofjobs(orrecords)intoaclusterofmachineskeyv=h(key)AnexampleofhashfunctionIn57LinearhashingNoindexisneededThebucketissplitifnecessaryAteachlevelofsplitting,linearhashingmaintaintwohashfunctions,hlevelandhlevel+1,suchthat

hlevel=Kmode(TSize*2level)0001

001000110101

01111000

100110111100

1101LinearhashingNoindexisneed58STLhash_mapSimplyusesadivisorthatisanoddnumberThissimplifiesimplementationbecausewemustbeabletoresizethehashtableasmorepairsareputintothedictionaryArraydoubling,forexample,requiresyoutogofroma1Darraytablewhoselengthism(whichisodd)toanarraywhoselengthis2m+1(whichisalsoodd)STLhash_mapSimplyusesadivi59AbouthashtablesPerfecthashfunctions(n

m)Uniformhashfunctions(nm)PerformancerequirementsHowtodealwithcollisionsHowtodealwithoverflowsifthesizeofbucketsisfixedAbouthashtablesPerfecthash60BloomFilters–anintroductionDifferentialFilesSimplelargedatabaseFileofrecordsresidingondiskSinglekeyIndextorecordsOperationsRetrieveUpdateInsertanewrecordMakechangestoanexistingrecordDeletearecordBloomFilters–anintroductio61Na?veModeOfOperationProblemsIndexandFilechangewithtimeRecovery=>CopyMasterFile(MF)frombackupCopyMasterIndex(MI)frombackupProcessalltransactionssincelastbackupRecoverytimedependsonMF&MIsize+#transactionssincelastbackupKeyIndexFileNa?veModeOfOperationProblem62DifferentialFileMakenochangestomasterfileAlterindexandwriteupdatedrecordtoanewfilecalleddifferentialfileKeyIndexFileDFAdvantageDFissmallerthanFileandsomaybebackedupmorefrequentlyIndexneedstobebackedupwheneverDFis.So,indexshouldbenolargerthanDFRecoverytimeisreducedDifferentialFileMakenochang63DifferentialFileOperationKeyIndexFileDFDisadvantageEventuallyDFbecomeslargeandcannolongerbebackedupwithdesiredfrequencyMustintegrateFileandDFnowFollowingintegration,DFisemptyDifferentialFileOperationKey64DifferentialFileOperationKeyIndexFileDFLargeIndexIndexcannotbebackedupasfrequentlyasdesiredTimetorecovercurrentstateofindex&DFisexcessiveUseadifferentialindexMakenochangestoIndexDIisanindextoalldeletedrecordsandupdatedrecordsinDFDifferentialFileOperationKey65DifferentialFile&IndexOperationPerformancehitMostqueriessearchbothDIandIndexIncreasein#ofdiskaccesses/queryUseafiltertodecidewhetherornotDIshouldbesearchedKeyIndexFileDFDIYNYDifferentialFile&IndexOper66IdealFilterKeyIndexFileDFFilterYNYDIYY

=>thiskeyisintheDIN

=>thiskeyisnotintheDIFunctionalityofidealfilterissameasthatofDISo,afilterthateliminatesperformancehitofDIdoesn’texistIdealFilterKeyIndexFileDFFilt67BloomFilter(BF)N

=>thiskeyisnotintheDIM(maybe)

=>thiskeymaybeintheDIFiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBloomFilter(BF)N=>thiskey68BloomFilter(BF)FiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBFresidesinmemoryPerformancehitpaidonlywhenthereisafiltererrorBloomFilter(BF)FiltererrorK69BloomFilterByBurtonH.Bloomin1970Bloomfilterisaspace-efficientprobabilisticdatastructurethatisusedtotestwhetheranelementisamemberofasetFalsepositivesarepossibleFalsenegativearenotItissuitableforthesetthatsupportaddition/queryoperationsRemovinganelementisnoteasy(consideringacountingfilter)See/wiki/Bloom_filterBloomFilterByBurtonH.Bloom70BloomFilterDesignUsembitsofmemoryfortheBFLargerm=>fewerfiltererrorsWhenBFempty,allm

bits

=0Usek>0hashfunctions:f1(),f2(),…,fk()WhenkeyKeyinsertedintoBF,setbitsf1(Key),f2(Key),…,andfk(Key)to1f1(Key),f2(key),…,fk(Key)isthesignatureofkeyKeyBloomFilterDesignUsembits71Examplem=11(normally,mwouldbemuchmuchlarger)k=2(2hashfunctions)f1(Key)

=Keymodmf2(Key)

=(2Key)modmKey=15Key=17000000000001234567890111110Examplem=11(normally,mwou72ExampleBFhasKey=15andKey=17Searchfor

Keyf1(Key)

=0or

f2(Key)

=0=>

Key

notinBFf1(Key)

=1and

f2(Key)

=1=>

Key

maybeinBFKey=6

=>filtererror000000000001234567890111110ExampleBFhasKey=15andKey73BloomFilterDesignChoosem(thefiltersizeinbits)UseasmuchmemoryasisavailablePickk(thenumberofhashfunctions)ktoosmall=>probabilityofdifferentkeyshavingsamesignatureishigh.ktoolarge=>filterbecomesfilledwith1stoosoonSelectthekhashfunctionsHashfunctionsshouldberelativelyindependentBloomFilterDesignChoosem(t74TheparametersofourbloomfilterProbabilityofafiltererrordependson:Filtersize…m#ofhashfunctions…k#ofelementsinsertedbeforefilterisresetto0…nAssumethatmandnare

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論