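Cloud Computing and Cloud Data Management
Jiaheng Lu (陸嘉恒), Renmin University of China
"Advanced Data Management" Frontier Workshop
2024/9/20

Outline
- Overview of cloud computing
- Google cloud computing technologies: GFS, Bigtable, and MapReduce
- Yahoo cloud computing technologies and Hadoop
- Challenges of cloud data management

The new course "Distributed Systems and Cloud Computing" at Renmin University
- Overview of distributed systems
- Survey of distributed cloud computing technologies
- Distributed cloud computing platforms
- Distributed cloud computing application development

Part I: Overview of Distributed Systems
- Chapter 1: Introduction to distributed systems
- Chapter 2: Client-server architecture
- Chapter 3: Distributed objects
- Chapter 4: Common Object Request Broker Architecture (CORBA)

Part II: Survey of Cloud Computing
- Chapter 5: Introduction to cloud computing
- Chapter 6: Cloud services
- Chapter 7: Comparison of cloud-related technologies
  - 7.1 Grid computing and cloud computing
  - 7.2 Utility computing and cloud computing
  - 7.3 Parallel and distributed computing and cloud computing
  - 7.4 Cluster computing and cloud computing

Part III: Cloud Computing Platforms
- Chapter 8: The three core technologies of the Google cloud platform
- Chapter 9: Technologies of the Yahoo cloud platform
- Chapter 10: Technologies of the Aneka cloud platform
- Chapter 11: Technologies of the Greenplum cloud platform
- Chapter 12: Technologies of the Amazon Dynamo cloud platform

Part IV: Development on Cloud Computing Platforms
- Chapter 13: Development on Hadoop
- Chapter 14: Development on HBase
- Chapter 15: Development on Google Apps
- Chapter 16: Development on MS Azure
- Chapter 17: Development on Amazon EC2

Cloud computing

Why do we use cloud computing?

Case 1: Write a file
- Save it locally: if the computer goes down, the file is lost
- Files stored in the cloud are always available and never lost

Case 2:
- Use IE: download, install, use
- Use QQ: download, install, use
- Use C++: download, install, use
- ……
- Get the service from the cloud

What is cloud and cloud computing?

Cloud: on-demand resources or services over the Internet, with the scale and reliability of a data center.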

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Characteristics of cloud computing
- Virtual: software, databases, Web servers, operating systems, storage and networking are provided as virtual servers.
- On demand: processors, memory, network bandwidth and storage can be added and removed as needed.

Types of cloud service
- IaaS: Infrastructure as a Service
- PaaS: Platform as a Service
- SaaS: Software as a Service

SaaS
- Software delivery model
- No hardware or software to manage
- Service delivered through a browser
- Customers use the service on demand
- Instant scalability

SaaS: Examples
- Your current CRM package is not managing the load, or you simply don't want to host it in-house. Use a SaaS provider such as S
- Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS
- Platform delivery model
- Platforms are built upon infrastructure, which is expensive
- Estimating demand is not a science!
- Platform management is not fun!

PaaS: Examples
- You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
- You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS
- Computer infrastructure delivery model
- A platform virtualization environment
- Computing resources, such as storage and processing capacity
- Virtualization taken a step further

IaaS: Examples
- You want to run a batch job but you don't have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
- You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision of Computing
- Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project which seeded the Internet, said: “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country.”
- Sun Microsystems co-founder Bill Joy also indicated: “It would take time until these markets mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet.”

Definitions: Cloud, Grid, Cluster, Utility
- Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
- A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.
- Grid computing is the application of several computers to a single problem at the same time, usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
- Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing & Cloud Computing
- They share a lot in common: intention, architecture and technology
- They differ in: programming model, business model, compute model, applications, and virtualization
- The problems are mostly the same:
  - manage large facilities;
  - define methods by which consumers discover, request and use resources provided by the central facilities;
  - implement the often highly parallel computations that execute on those resources.
- Virtualization
  - Grid: grids do not rely on virtualization as much as clouds do; each individual organization maintains full control of its resources
  - Cloud: virtualization is an indispensable ingredient for almost every cloud

Any questions or comments?

Outline

- Overview of cloud computing
- Google cloud computing technologies: GFS, Bigtable, and MapReduce
- Yahoo cloud computing technologies and Hadoop
- Challenges of cloud data management

Google Cloud computing techniques

The Google File System (GFS)
- A scalable distributed file system for large distributed data-intensive applications
- Multiple GFS clusters are currently deployed. The largest ones have:
  - 1000+ storage nodes
  - 300+ terabytes of disk storage
  - heavy access by hundreds of clients on distinct machines

Introduction
- Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment

Intro: Observations 1
1. Component failures are the norm
   - Constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system
2. Huge files (by traditional standards)
   - Multi-GB files are common
   - I/O operations and block sizes must be revisited

Intro: Observations 2
3. Most files are mutated by appending new data
   - This is the focus of performance optimization and atomicity guarantees
4. Co-designing the applications and APIs benefits the overall system by increasing flexibility

The Design
- A cluster consists of a single master and multiple chunkservers and is accessed by multiple clients

The Master
- Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
- Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state
- Helps make sophisticated chunk placement and replication decisions, using global knowledge
- For reading and writing, the client contacts the master to get chunk locations, then deals directly with chunkservers
  - The master is not a bottleneck for reads/writes

Chunkservers
- Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle.
  - The handle is assigned by the master at chunk creation
- Chunk size is 64 MB
- Each chunk is replicated on 3 (default) servers

Clients
- Linked to apps using the file system API
- Communicate with the master and chunkservers for reading and writing
  - Master interactions only for metadata
  - Chunkserver interactions for data
- Cache only metadata information
  - Data is too large to cache

Chunk Locations
- The master does not keep a persistent record of the locations of chunks and replicas.
- It polls chunkservers for this information at startup, and when chunkservers join or leave.
- It stays up to date by controlling placement of new chunks and through HeartBeat messages (when monitoring chunkservers)

Operation Log
- Record of all critical metadata changes
- Stored on the master and replicated on other machines
- Defines the order of concurrent operations
- Also used to recover the file system state

System Interactions:

Leases and Mutation Order
- Leases maintain a mutation order across all chunk replicas
- The master grants a lease to a replica, called the primary
- The primary chooses the serial mutation order, and all replicas follow this order
- Minimizes management overhead for the master

Atomic Record Append
- The client specifies the data to write; GFS chooses and returns the offset it writes to, and appends the data to each replica at least once
- Heavily used by Google's distributed applications
  - No need for a distributed lock manager
  - GFS chooses the offset, not the client

Atomic Record Append: How?
- Follows a similar control flow to other mutations
- The primary tells secondary replicas to append at the same offset as the primary
- If a record append fails at any replica, it is retried by the client
  - So replicas of the same chunk may contain different data, including duplicates, whole or in part, of the same record
- GFS does not guarantee that all replicas are bitwise identical
  - It only guarantees that data is written at least once as an atomic unit
  - Data must be written at the same offset on all chunk replicas for success to be reported

Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date and stale replicas
- The version is increased when granting a lease
- If a replica is not available, its version is not increased
- The master detects stale replicas when chunkservers report their chunks and versions
- Stale replicas are removed during garbage collection

Garbage collection
- When a client deletes a file, the master logs it like other changes and renames the file to a hidden name.
- When scanning the file system namespace, the master removes files that have been hidden for longer than 3 days
  - Their metadata is also erased
- During HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which chunks have no metadata.
  - The chunkserver removes these chunks on its own

Fault Tolerance: High Availability
- Fast recovery
  - The master and chunkservers can restart in seconds
- Chunk replication
- Master replication
  - "Shadow" masters provide read-only access when the primary master is down
  - Mutations are not done until recorded on all master replicas

Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt data
  - Since replicas are not bitwise identical, chunkservers maintain their own checksums
- For reads, the chunkserver verifies the checksum before sending the chunk
- Checksums are updated during writes

Introduction to MapReduce

MapReduce: Insight

- “Consider the problem of counting the number of occurrences of each word in a large collection of documents”
- How would you do it in parallel?

MapReduce Programming Model
- Inspired by the map and reduce operations commonly used in functional programming languages like Lisp.
- Users implement an interface of two primary methods:
  1. Map: (key1, val1) → (key2, val2)
  2. Reduce: (key2, [val2]) → [val3]
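
The two-method interface above can be tried out with a small in-memory driver. The following is only a sketch for illustration, not Google's implementation: `run_mapreduce`, `wc_map` and `wc_reduce` are hypothetical names, and everything runs sequentially in one process. It applies the word-counting problem to the Map and Reduce signatures.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every (key1, val1), group values by key2, reduce each group."""
    groups = defaultdict(list)
    for key1, val1 in inputs:
        for key2, val2 in map_fn(key1, val1):
            groups[key2].append(val2)
    # Keys are visited in sorted order; each reduce call yields a list of val3.
    return {key2: list(reduce_fn(key2, vals)) for key2, vals in sorted(groups.items())}

def wc_map(doc_name, contents):
    # Map: (document name, document contents) -> (word, 1)
    for word in contents.split():
        yield word, 1

def wc_reduce(word, counts):
    # Reduce: (word, [1, 1, ...]) -> [total]
    yield sum(counts)

docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
counts = run_mapreduce(docs, wc_map, wc_reduce)
print(counts["the"])
```

Because the driver only depends on the two function signatures, a different application can be tested by swapping in another map/reduce pair.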

Map operation
- Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs, e.g. (doc-id, doc-content).
- Drawing an analogy to SQL, map can be visualized as the group-by clause of an aggregate query.

Reduce operation
- On completion of the map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer.
- Can be visualized as an aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.

Pseudo-code

    map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
            EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

MapReduce: Execution overview
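
As a rough sketch of the execution flow usually described for MapReduce, the code below splits the input into map tasks, partitions intermediate pairs into R buckets by a hash of the key, and runs one reduce task per bucket. It is a single-process simulation under stated assumptions: the names `partition`, `map_task`, `reduce_task` and `execute` are hypothetical, and `zlib.crc32` stands in for whatever partitioning hash a real system uses.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # Deterministic hash, so a given key always lands in the same bucket.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_task(split):
    """One map task: emit (word, 1) for every word in its input split."""
    for _doc_name, contents in split:
        for word in contents.split():
            yield word, 1

def reduce_task(bucket):
    """One reduce task: group its bucket by key, then sum the values per key."""
    grouped = defaultdict(list)
    for key, value in bucket:
        grouped[key].append(value)
    return {key: sum(values) for key, values in sorted(grouped.items())}

def execute(splits, num_reducers):
    buckets = [[] for _ in range(num_reducers)]
    # Map phase: splits are independent, so a cluster could process them in parallel.
    for split in splits:
        for key, value in map_task(split):
            # Shuffle: route each intermediate pair to the bucket owning its key.
            buckets[partition(key, num_reducers)].append((key, value))
    # Reduce phase: one task (and one output) per bucket.
    return [reduce_task(bucket) for bucket in buckets]

splits = [[("d1", "a b a")], [("d2", "b c")]]
outputs = execute(splits, num_reducers=2)
merged = {k: v for part in outputs for k, v in part.items()}
print(merged)
```

Since every occurrence of a key is routed to the same bucket, each word's total appears in exactly one reduce output, which is what lets the reduce tasks run independently.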

MapReduce: Example

MapReduce in Parallel: Example

MapReduce: Fault Tolerance
- Handled via re-execution of tasks
  - Task completion is committed through the master
- What happens if a mapper fails?
  - Re-execute completed and in-progress map tasks
- What happens if a reducer fails?
  - Re-execute in-progress reduce tasks
- What happens if the master fails?
  - Potential trouble!

MapReduce: Walkthrough of One More Application

MapReduce: PageRank
- PageRank models the behavior of a "random surfer".
- C(t) is the out-degree of t, and (1-d) is a damping factor (random jump)
- The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
- A page distributes its PageRank equally among all pages it links to.
- The damping factor models the surfer "getting bored" and typing an arbitrary URL.

PageRank: Key Insights
- The effect of each iteration is local: iteration i+1 depends only on iteration i
- At iteration i, PageRank for individual nodes can be computed independently

PageRank using MapReduce
- Use a sparse matrix representation (M)
- Map each row of M to a list of PageRank "credit" to assign to out-link neighbours.
- These prestige scores are reduced to a single PageRank value for a page by aggregating over them.
- Map: distribute PageRank "credit" to link targets
- Reduce: gather up PageRank "credit" from multiple sources to compute the new PageRank value
- Iterate until convergence
Source of image: Lin 2008
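
The map and reduce steps above can be sketched in a few lines. This is a minimal single-process illustration, not the lecture's code. It assumes the commonly used update PR(x) = (1-d) + d * Σ PR(t)/C(t) over the pages t linking to x, with d = 0.85, and it assumes every link target is itself a key in the graph. The function names and the "links"/"credit" tags are hypothetical.

```python
from collections import defaultdict

def pr_map(url, state):
    rank, out_links = state
    yield url, ("links", out_links)  # carry the graph structure to the next iteration
    for target in out_links:
        yield target, ("credit", rank / len(out_links))  # equal share per out-link

def pr_reduce(url, values, d=0.85):
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    # Assumed update rule: (1 - d) random jump plus damped incoming credit.
    return url, ((1 - d) + d * total, out_links)

def pagerank_iteration(graph, d=0.85):
    grouped = defaultdict(list)
    for url, state in graph.items():
        for key, value in pr_map(url, state):
            grouped[key].append(value)
    return dict(pr_reduce(url, values, d) for url, values in grouped.items())

def pagerank(links, d=0.85, tol=1e-6, max_iter=100):
    graph = {url: (1.0, outs) for url, outs in links.items()}
    for _ in range(max_iter):
        new_graph = pagerank_iteration(graph, d)
        # Convergence check done outside the map/reduce steps.
        delta = max(abs(new_graph[u][0] - graph[u][0]) for u in graph)
        graph = new_graph
        if delta < tol:
            break
    return {url: rank for url, (rank, _) in graph.items()}

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(ranks)
```

Each iteration reads only the previous iteration's ranks, which is why the per-page work in `pr_map` and `pr_reduce` can be distributed freely.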

Phase 1: Process HTML
- The map task takes (URL, content) pairs and maps them to (URL, (PRinit, list-of-urls))
  - PRinit is the "seed" PageRank for URL
  - list-of-urls contains all pages pointed to by URL
- The reduce task is just the identity function
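
A possible shape for this phase, using only the Python standard library, is sketched below. The seed value `PR_INIT = 1.0`, the class `LinkCollector`, and the function names are assumptions made for illustration. Resolving relative links with `urljoin` and de-duplicating them are design choices of this sketch, not something the slides specify.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PR_INIT = 1.0  # assumed seed PageRank; the slides do not give a value

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def phase1_map(url, content):
    # (URL, content) -> (URL, (PRinit, list-of-urls))
    collector = LinkCollector()
    collector.feed(content)
    out_links = sorted({urljoin(url, href) for href in collector.hrefs})
    yield url, (PR_INIT, out_links)

def phase1_reduce(url, values):
    # Identity reduce: pass every value through unchanged.
    for value in values:
        yield url, value

page = ('<a href="/b.html">b</a> <a href="http://example.org/c">c</a> '
        '<a href="/b.html">b again</a>')
result = list(phase1_map("http://example.com/a.html", page))
print(result)
```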

Phase 2: PageRank Distribution
- The reduce task gets (URL, url_list) and many (URL, val) values
  - Sum the vals and fix up with d to get the new PR
  - Emit (URL, (new_rank, url_list))
- Check for convergence using a non-parallel component

MapReduce: Some More Apps
- Distributed grep
- Count of URL access frequency
- Clustering (K-means)
- Graph algorithms
- Indexing systems
MapReduce programs in Google source tree

MapReduce: Extensions and similar apps

- PIG (Yahoo)
- Hadoop (Apache)
- DryadLINQ (Microsoft)

Large Scale Systems Architecture using MapReduce
- User App
- MapReduce
- Distributed File Systems (GFS)

BigTable: A Distributed Storage System for Structured Data

Introduction
- BigTable is a distributed storage system for managing structured data.
- Designed to scale to a very large size
  - Petabytes of data across thousands of servers
- Used for many Google projects
  - Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
- Flexible, high-performance solution for all of Google's products

Motivation
- Lots of (semi-)structured data at Google
  - URLs: contents, crawl metadata, links, anchors, pagerank, …
  - Per-user data: user preference settings, recent queries/search results, …
  - Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
- Scale is large
  - Billions of URLs, many versions/page (~20K/version)
  - Hundreds of millions of users, thousands of q/sec
  - 100 TB+ of satellite image data

Why not just use a commercial DB?
- Scale is too large for most commercial databases
- Even if it weren't, cost would be very high
  - Building internally means the system can be applied across many projects for low incremental cost
- Low-level storage optimizations help performance significantly
  - Much harder to do when running on top of a database layer

Goals
- Want asynchronous processes to be continuously updating different pieces of data
  - Want access to the most current data at any time
- Need to support:
  - Very high read/write rates (millions of ops per second)
  - Efficient scans over all or interesting subsets of data
  - Efficient joins of large one-to-one and one-to-many datasets
- Often want to examine data changes over time
  - E.g. contents of a web page over multiple crawls

BigTable
- Distributed multi-level map
- Fault-tolerant, persistent
- Scalable
  - Thousands of servers
  - Terabytes of in-memory data
  - Petabytes of disk-based data
  - Millions of reads/writes per second, efficient scans
- Self-managing
  - Servers can be added/removed dynamically
  - Servers adjust to load imbalance

Building Blocks
- Building blocks:
  - Google File System (GFS): raw storage
  - Scheduler: schedules jobs onto machines
  - Lock service: distributed lock manager
  - MapReduce: simplified large-scale data processing
- BigTable's use of the building blocks:
  - GFS: stores persistent data (SSTable file format for storage of data)
  - Scheduler: schedules jobs involved in BigTable serving
  - Lock service: master election, location bootstrapping
  - MapReduce: often used to read/write BigTable data

Basic Data Model
- A BigTable is a sparse, distributed, persistent multi-dimensional sorted map
  (row, column, timestamp) -> cell contents
- Good match for most Google applications

WebTable Example
- Want to keep a copy of a large collection of web pages and related information
- Use URLs as row keys
- Various aspects of a web page as column names
- Store the contents of web pages in the contents: column under the timestamps when they were fetched

Rows
- Name is an arbitrary string
  - Access to data in a row is atomic
  - Row creation is implicit upon storing data
- Rows are ordered lexicographically
  - Rows close together lexicographically are usually on one or a small number of machines

Rows (cont.)
- Reads of short row ranges are efficient and typically require communication with a small number of machines.
- Can exploit this property by selecting row keys so they get good locality for data access.
- Example: , , , VS edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys

Columns
- Columns have a two-level name structure: family:optional_qualifier
- Column family
  - Unit of access control
  - Has associated type information
- Qualifier gives unbounded columns
  - Additional levels of indexing, if desired

Timestamps
- Used to store different versions of data in a cell
  - New writes default to the current time, but timestamps for writes can also be set explicitly by clients
- Lookup options:
  - "Return most recent K values"
  - "Return all values in timestamp range (or all values)"
- Column families can be marked w/ attributes:
  - "Only retain most recent K values in a cell"
  - "Keep values until they are older than K seconds"

Implementation: Three Major Components
- Library linked into every client
- One master server, responsible for:
  - Assigning tablets to tablet servers
  - Detecting addition and expiration of tablet servers
  - Balancing tablet-server load
  - Garbage collection
- Many tablet servers
  - Each tablet server handles read and write requests to its tablets
  - Splits tablets that have grown too large

Implementation (cont.)
- Client data doesn't move through the master server. Clients communicate directly with tablet servers for reads and writes.
- Most clients never communicate with the master server, leaving it lightly loaded in practice.

Tablets
- Large tables are broken into tablets at row boundaries
  - A tablet holds a contiguous range of rows
  - Clients can often choose row keys to achieve locality
  - Aim for ~100MB to 200MB of data per tablet
- A serving machine is responsible for ~100 tablets
  - Fast recovery: 100 machines each pick up 1 tablet for a failed machine
  - Fine-grained load balancing: migrate tablets away from an overloaded machine; the master makes load-balancing decisions

Tablet Location
- Since tablets move around from server to server, given a row, how do clients find the right machine?
  - Need to find the tablet whose row range covers the target row

Tablet Assignment
- Each tablet is assigned to one tablet server at a time.
- The master server keeps track of the set of live tablet servers and the current assignments of tablets to servers. It also keeps track of unassigned tablets.
- When a tablet is unassigned, the master assigns the tablet to a tablet server with sufficient room.

API
- Metadata operations
  - Create/delete tables, column families, change metadata
- Writes (atomic)
  - Set(): write cells in a row
  - DeleteCells(): delete cells in a row
  - DeleteRow(): delete all cells in a row
- Reads
  - Scanner: read arbitrary cells in a bigtable
    - Each row read is atomic
    - Can restrict returned rows to a particular range
    - Can ask for just data from 1 row, all rows, etc.
    - Can ask for all columns, just certain column families, or specific columns

Refinements: Compression
- Many opportunities for compression
  - Similar values in the same row/column at different timestamps
  - Similar values in different columns
  - Similar values across adjacent rows
- Two-pass custom compression scheme
  - First pass: compress long common strings across a large window
  - Second pass: look for repetitions in a small window
- Speed emphasized, but good space reduction (10-to-1)

Refinements: Bloom Filters
- A read operation has to read from disk when the desired SSTable isn't in memory
- Reduce the number of accesses by specifying a Bloom filter
  - Allows us to ask whether an SSTable might contain data for a specified row/column pair
  - A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations
  - In practice, most lookups for non-existent rows or columns do not need to touch disk

Outline

- Overview of cloud computing
- Google cloud computing technologies: GFS, Bigtable, and MapReduce
- Yahoo cloud computing technologies and Hadoop
- Challenges of cloud data management

Yahoo! Cloud computing

Search Results of the Future
- babycenter, epicurious, LinkedIn, webmd, Gawker, New York Times

What's in the Horizontal Cloud?
- Simple Web Service APIs
- Horizontal Cloud Services
  - Edge Content Services, e.g., YCS, YCPI
  - Provisioning & Virtualization, e.g., EC2
  - Batch Storage & Processing, e.g., Hadoop & Pig
  - Operational Storage, e.g., S3, MObStor, Sherpa
  - Other Services: Messaging, Workflow, virtual DBs & Webserving
- Shared Infrastructure
  - ID & Account Management
  - Monitoring & QoS
  - Metering, Billing, Accounting
  - Security
- Common approaches to QA, production engineering, performance engineering, datacenter management, and optimization

Yahoo! Cloud Stack
- EDGE: Horizontal Cloud Services (YCS, YCPI, Brooklyn, …)
- WEB: Horizontal Cloud Services (VM/OS, yApache)
- APP: Horizontal Cloud Services (VM/OS, …)
- STORAGE: Horizontal Cloud Services (Sherpa, MObStor, …)
- BATCH: Horizontal Cloud Services (Hadoop, …)
- Across the layers: Provisioning (self-serve), Monitoring/Metering/Security
- Other labels in the figure: Data Highway, Serving Grid, PHP, App Engine

Web Data Management
- Large data analysis (Hadoop)
  - Scan-oriented workloads
  - Focus on sequential disk I/O
  - $ per CPU cycle
- Structured record storage (PNUTS/Sherpa)
  - CRUD
  - Point lookups and short scans
  - Index-organized table and random I/Os
  - $ per latency
- Blob storage (SAN/NAS)
  - Object retrieval and streaming
  - Scalable file storage
  - $ per GB

The World Has Changed
- Web serving applications need:
  - Scalability! Preferably elastic
  - Flexible schemas
  - Geographic distribution
  - High availability
  - Reliable storage
- Web serving applications can do without:
  - Complicated queries
  - Strong transactions

PNUTS/SHERPA: To Help You Scale Your Mountains of Data

Yahoo! Serving Storage Problem
- Small records: 100KB or less
- Structured records: lots of fields, evolving
- Extreme data scale: tens of TB
- Extreme request scale: tens of thousands of requests/sec
- Low latency globally: 20+ datacenters worldwide
- High availability: outages cost $millions
- Variable usage patterns: as applications and users change

The PNUTS/Sherpa Solution
- The next generation global-scale record store
  - Record-orientation: routing and data storage optimized for low-latency record access
  - Scale out: add machines to scale throughput (while keeping latency low)
  - Asynchrony: pub-sub replication to far-flung datacenters to mask propagation delay
  - Consistency model: reduce the complexity of asynchrony for the application programmer
  - Cloud deployment model: hosted, managed service to reduce app time-to-market and enable on-demand scale and elasticity

What is PNUTS/Sherpa?
- Parallel database
- Geographic replication
- Structured, flexible schema
- Hosted, managed infrastructure
- Sample records shown in the figure: E 75656 C; A 42342 E; B 42521 W; C 66354 W; D 12352 E; F 15677 E

    CREATE TABLE Parts (
        ID VARCHAR,
        StockNumber INT,
        Status VARCHAR
        …
    )

What Will It Become?
- Sample records shown in the figure: E 75656 C; A 42342 E; B 42521 W; C 66354 W; D 12352 E; F 15677 E

    CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHA
