University of Technology, "Advanced Artificial Intelligence" Presentation Report
Title: Machine learning: Trends, perspectives, and prospects (Unsupervised learning and feature reduction)
School: Computer Science and Engineering; Major: Computer Science and Technology (All-English Innovation Class)
Student: 黃煒杰; Student ID: 201230590051; Advisor: 陳瓊; Start date: November 1, 2015

Topic Overview
【Group paper topic】 Machine learning: Trends, perspectives, and prospects.
【Individual part】 Unsupervised learning and feature reduction.

Paper Study Report
【Paper structure】 After reading the paper closely three times, I understand its structure as follows:
1. The definition and current state of machine learning
2. Core machine learning algorithms
2.1 Supervised learning
2.1.1 Overview of supervised learning
2.1.2 Overview of deep learning
2.2 Unsupervised learning
2.2.1 Overview of unsupervised learning
2.2.2 Feature reduction methods
2.3 Reinforcement learning
3. Pressing problems in machine learning
3.1 Data privacy
3.2 Data distribution
4. Opportunities and challenges for machine learning

【Content summary】 Based on my understanding, the content of the paper is summarized as follows:

1. The definition and current state of machine learning: Machine learning concentrates on two questions: how can machines improve their performance through experience, and what statistical-computational principles underlie all learning systems? The field has advanced enormously over the past decades, largely because machines that learn from experience are far more effective than programs built purely by hand. A learning problem can be defined as improving some performance metric through the study of experience; the metric may be, for example, classification accuracy in supervised learning. The many machine learning algorithms can in general be viewed as a search through a space of candidate solutions, guided by experience, to optimize performance, and they can be categorized by how candidate solutions are represented and how the search is carried out. The principle behind these algorithms is optimization as studied in statistics: machine learning is an interdisciplinary field between computer science and statistics, and this intersection is still in its infancy. Recently, the rise of big data has posed new challenges such as algorithm scalability and data privacy; scalability in particular also raises a problem of data granularity.

2. Core machine learning algorithms:
2.1 Supervised learning: Supervised learning is widely applied, for example in spam filtering and face recognition. It is a typical example of optimizing a performance metric: the task is to learn a mapping f that assigns any input sample x* to a label y*. Binary classification is the most thoroughly developed case, and there are also rich results on multiclass classification, ranking, and structured prediction. Notable algorithms include decision trees, logistic regression, support vector machines, and neural networks. Some general-purpose learning schemes have gained wide recognition among researchers, above all ensemble learning, which combines the results of multiple classifiers and therefore discriminates the data in a more general way. Real-world problems impose diverse requirements on supervised algorithms, and many successful algorithms exist at different trade-offs between complexity and performance.
2.1.2 Deep learning: Deep learning, one of the hottest topics of recent years, is an advance over artificial neural networks. A deep network consists of multiple layers of threshold units, and the weights of the network are tuned by gradient-based optimization so as to minimize the error. Because of its strength it can handle millions of parameters and large-scale datasets, and thanks to this scalability it has been applied successfully in computer vision, natural language processing, and other fields. The area is still expanding, ever more researchers are entering it, and its prospects are hard to overstate.
2.2 Unsupervised learning: Unsupervised learning is the process of learning from unlabeled data. Its most representative task is clustering: organize the data into clusters so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Each specific clustering algorithm makes some distributional assumption about the data and uses that assumption to discover the data's intrinsic structure. Like clustering, feature reduction also rests on assumptions about the data distribution, and some feature reduction algorithms can be regarded as unsupervised learning. Feature reduction is especially important in big data processing; famous feature reduction methods include PCA and LDA (a small sketch of this pipeline follows at the end of this summary).
2.3 Reinforcement learning: Reinforcement learning means learning, in an intelligent system, a mapping from the environment to actions so as to maximize a reward (reinforcement) signal. It differs from supervised learning in connectionism mainly in the teacher signal: the reinforcement signal provided by the environment evaluates how good a produced action was (usually as a scalar) rather than telling the reinforcement learning system (RLS) how to produce the correct action. Since the external environment provides little information, the RLS must learn from its own experience; in this way it acquires knowledge in an action-evaluation loop and improves its action policy to adapt to the environment.
2.4 Combining the three kinds of learning: All three kinds of learning have seen some success, and much recent research concentrates on mixing them. One example is semi-supervised learning: during clustering, the class memberships of some samples are known, and these supervised relations are supplied as constraints to the unsupervised clustering. Other examples include model selection, active learning, and causal learning. Many practical issues affect the design of learning algorithms, including serial versus parallel execution and robustness.

3. Pressing problems in machine learning: Data privacy and data distribution are the two most urgent problems. Privacy involves policy, data ownership, and the degree of privacy: some data people are willing to share for research, other data they want kept strictly confidential, and different people feel differently about the same data. How to calibrate the degree of data privacy is an important ethical question. Because data is large-scale and distributed, gathering it in one place for processing is often impossible; a nationwide chain of companies, for instance, holds a huge database at each branch, so the only option is distributed learning followed by combining the learned results. Modern algorithms should therefore place ever more emphasis on parallelism so that they work well in real environments.
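To make the "learn locally, then combine the results" idea concrete, here is a minimal sketch in Python; the three-site setup, the least-squares learner, and plain coefficient averaging are all my own illustrative assumptions, not a scheme prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one linear model, data scattered over three branch sites.
w_true = rng.normal(size=5)
sites = []
for _ in range(3):
    X = rng.normal(size=(200, 5))                    # a branch's local database
    y = X @ w_true + 0.1 * rng.normal(size=200)
    sites.append((X, y))

# Each site learns locally (least squares); the raw data never leaves the site.
local_models = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in sites]

# Combine the learned results, here simply by averaging the coefficients.
w_combined = np.mean(local_models, axis=0)
print("distance from true model:", np.linalg.norm(w_combined - w_true))
```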
4. Opportunities and challenges for machine learning: Although modern machine learning has developed greatly, it is still very far from true intelligence. A never-ending, self-adapting learning machine is the highest aspiration of machine learning. As a next step, human-machine cooperative learning algorithms are a new topic: with humans in the loop, machines could analyze and process complex data of every kind.
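As promised in Section 2.2 above, the following sketch runs the two unsupervised steps the paper highlights, feature reduction followed by clustering, with scikit-learn; the digits dataset, the choice of PCA and k-means, and all parameter values are my own assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_digits(return_X_y=True)          # 64-dimensional samples, labels unused

# Feature reduction: project onto the top 10 principal components.
X_low = PCA(n_components=10).fit_transform(X)

# Clustering: group the data so that intra-cluster similarity is high
# and inter-cluster similarity is low.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_low)
print(labels[:20])
```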
Extended Learning

The paper is a survey: it contains no formulas, algorithms, experiments, or proofs, does not mention the frontier directions of unsupervised learning, and is also the shortest in page count. For a better theoretical study I therefore read two further papers as extended learning:

Peng H, Long F, Ding C (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226-1238.
Fern X Z, Brodley C E (2003). Random projection for high dimensional data clustering: a cluster ensemble approach. In Proceedings of the International Conference on Machine Learning (ICML).

The first paper is about feature selection; the second is about random projection (a feature reduction method) and about a frontier direction in unsupervised learning, cluster ensembles. (I originally prepared my slides and study report in English, and we only agreed to use Chinese when merging slides with my teammates, so I translated the slides and the content above. The extended-learning part below took me a whole evening to write, and I did not want to spend another evening translating it (。。), so it stays in English; I hope this is acceptable.)

Feature selection

Feature selection was proposed in 1970, and a great amount of work has been devoted to it since. The objective of feature reduction is three-fold: improve the accuracy of classification, provide faster and more cost-effective predictors, and provide a better understanding of the underlying process that generated the data. Feature selection differs from feature extraction in that its target is to select a subset of the features instead of mapping them into a lower dimension.

Given a set of features F, the feature selection problem is defined as finding a subset F' that maximizes the learner's ability to classify patterns. More formally, F' should maximize some scoring function J, where Γ is the space of all possible feature subsets of F:

$$ F' = \arg\max_{G \in \Gamma} J(G). $$

The framework of feature selection has two main parts, a generation step and an evaluation step. For the generation step, the main task is to select candidate feature subsets for evaluation. There are three ways in which the feature space can be examined: (1) complete, (2) heuristic, (3) random.
(1) Complete/exhaustive: examine all combinations of possible feature subsets. For example, the features {f1, f2, f3} are examined as {{f1}, {f2}, {f3}, {f1,f2}, {f1,f3}, {f2,f3}, {f1,f2,f3}}. The optimal subset is achievable if we search all possible solutions, but this is too expensive if the feature space is very large.
(2) Heuristic: the selection is directed by some guideline. Start with the empty feature set (or the full set) and select (or delete) one feature in each step until the target number of features is reached, for example the incremental generation of subsets {f1} → {f1, f3} → {f1, f3, f2}.
(3) Random: there is no predefined way to select feature candidates; features are picked at random. This requires more user-defined input parameters, such as the number of tries.

According to whether the learning algorithm participates in the selection step, feature selection methods can be divided into three categories: filter, wrapper, and embedded.
The filter approach is usually fast. It provides a generic selection of features, not tuned to a given learning algorithm. But since it relies on general statistical measures rather than on the classifier actually used, it is not optimized for that classifier, so filter methods are sometimes used as a pre-processing step for other methods.
In the wrapper approach, the learner is treated as a black box used to score subsets according to their predictive power. The accuracy is usually high, but the result varies across learners, so generality is lost. One needs to define how to search the space of all possible feature subsets and how to assess the prediction performance of a given subset. Finding the optimal subset is NP-hard! A wide range of heuristic search strategies can be used: branch-and-bound, simulated annealing, tabu search, genetic algorithms, forward selection (start with the empty feature set and add one feature at each step), and backward deletion (start with the full feature set and delete one feature at each step). Predictive power is usually measured on a validation set or by cross-validation. The drawbacks of the wrapper method are that a large amount of computation is required and that there is a danger of overfitting.
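To make the wrapper approach concrete, here is a minimal sketch of greedy forward selection that scores each candidate subset by the learner's cross-validated accuracy; the dataset, the logistic-regression learner, and the fixed target of five features are my own assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data; in a real run this would be the task's own dataset.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
learner = LogisticRegression(max_iter=1000)     # the black-box learner

selected = []                                   # start with the empty feature set
while len(selected) < 5:                        # assumed target number of features
    best_f, best_score = None, -np.inf
    for f in range(X.shape[1]):
        if f in selected:
            continue
        # Score the candidate subset by cross-validated accuracy of the learner.
        score = cross_val_score(learner, X[:, selected + [f]], y, cv=5).mean()
        if score > best_score:
            best_f, best_score = f, score
    selected.append(best_f)                     # add the best feature of this step
    print(f"step {len(selected)}: added feature {best_f}, CV accuracy {best_score:.3f}")
```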
The embedded approach is specific to a given learning machine. It combines the advantages of both previous methods: it is cheaper than a full wrapper search, takes advantage of the learner's own variable selection mechanism, and is usually implemented by a two-step or multi-step process.

For the evaluation step, the main task is to measure how good a candidate subset is. The five main types of evaluation functions are: distance (e.g., the Euclidean distance measure), information (entropy, information gain, etc.), dependency (correlation coefficient), consistency (min-features bias), and the classification error rate. The first four are used in filter methods and the last in wrappers.

An application of feature selection in supervised learning is given in the following, extracted from the paper 'Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy'. The optimal characterization condition of feature selection in supervised learning is minimal classification error, which in a filter setting is approached through maximal statistical dependency of the target class on the selected features (Max-Dependency). One of the most popular approaches to realize Max-Dependency is Max-Relevance: select the features that are individually most relevant to the target class c, i.e., features satisfying

$$ \max D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c), $$

where I(x; y) is the mutual information I mentioned before. The problem with this method is that combinations of individually good features do not necessarily lead to good classification performance, i.e., "the m best features are not the best m features", and relevance alone may introduce rich redundancy. So features with minimum redundancy should also be considered, and the author proposes an algorithm that solves this problem. The main work of the paper consists of three parts: (1) present a theoretical analysis showing that mRMR (max-relevance and min-redundancy) is equivalent to Max-Dependency for first-order feature selection; (2) investigate how to combine mRMR with other feature selection methods into a two-stage selection algorithm; (3) compare mRMR, Max-Relevance, Max-Dependency, and the two-stage feature selection algorithm through comprehensive experiments. Since the first part is unrelated to the course project, I skipped it, and only one experiment from the original paper will be mentioned.

The proposed algorithm is named mRMR (Max-Relevance and Min-Redundancy). Min-Redundancy means selecting features that are not redundant with the already selected features, i.e., satisfying

$$ \min R(S), \qquad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j). $$

An operator Φ(D, R) combining D and R is then defined to achieve this multi-objective optimization task, optimizing D and R simultaneously:

$$ \max \Phi(D, R), \qquad \Phi = D - R. $$

In practice, an incremental search method can be used to find the near-optimal features: given the set S_{m-1} of the m-1 features already selected, the m-th feature is chosen by

$$ \max_{x_j \in X - S_{m-1}} \Big[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \Big]. $$

Up to now this is not the whole process of the algorithm; it is only half of it. The algorithm in this paper is a two-stage process: (1) find a candidate feature subset using the mRMR incremental selection method; (2) use a more sophisticated method (with the classifier involved) to search for a compact feature subset within the candidate subset, so the two-stage algorithm is a case of the embedded method. The first stage is as follows: use mRMR incremental selection to select sequential feature sets S_1 ? S_2 ? ... ? S_{n-1} ? S_n; compare the classification error of all these subsets to find the range of k, called Ω, within which the respective error e_k is consistently small; within Ω, find the smallest error e* = min e_k; the optimal subset size is the k corresponding to e*. The second stage is as follows. Backward selection: each time, exclude one redundant feature if the resulting error is smaller (selecting the one that leads to the greatest error reduction), and terminate when no error reduction can be obtained. Forward selection: each time, select the one feature that leads to the greatest error reduction, and terminate when the error begins to increase. Now the algorithm of this paper is complete.
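Below is a minimal sketch of the first stage, mRMR incremental selection, using scikit-learn's mutual-information estimators as stand-ins for I(x_j; c) and I(x_j; x_i); the toy dataset and the subset size m are my own assumptions, and this is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, m):
    """Greedy mRMR: maximize relevance I(x_j; c) minus mean redundancy I(x_j; x_i)."""
    relevance = mutual_info_classif(X, y, random_state=0)   # estimates I(x_j; c)
    selected = [int(np.argmax(relevance))]                  # start from the most relevant
    while len(selected) < m:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Mean redundancy of candidate j with the already selected features.
            red = np.mean([mutual_info_regression(X[:, [j]], X[:, i], random_state=0)[0]
                           for i in selected])
            score = relevance[j] - red                      # the Phi = D - R criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

X, y = make_classification(n_samples=200, n_features=15, n_informative=4,
                           random_state=0)
print(mrmr_select(X, y, m=5))
```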
To evaluate how effective and efficient this algorithm is, there is also the problem of judging which algorithm is superior, so the author defines a measurement called the RM-characteristic. Given two feature sets S^{(1)} and S^{(2)}, each generated sequentially:

$$ S^{(1)}_1 \subset S^{(1)}_2 \subset \cdots \subset S^{(1)}_n, \qquad S^{(2)}_1 \subset S^{(2)}_2 \subset \cdots \subset S^{(2)}_n. $$
We say that S^{(1)} is recursively more characteristic (RM-characteristic) than S^{(2)} by p% if, for p% of the values of k, the error e^{(1)}_k is smaller than e^{(2)}_k.

[Figure: classification error rate versus the number of selected features (up to about 50) on the HDR and ARR datasets, for the NB and LDA classifiers, comparing mRMR against MaxRel.]

The figure above is one of the experimental results given in the paper. Each row is for a different dataset and each column for a different classification algorithm. In each graph, the x-axis denotes the number of selected features and the y-axis the error rate. The line marked with triangles is the proposed algorithm and the other is the state-of-the-art algorithm of that time. As shown in the results, classification accuracy can be significantly improved by mRMR feature selection.
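As a rough illustration of how such error-versus-feature-number curves arise, and of how stage 1 above picks the subset size with the smallest error, here is a small sketch; the mutual-information ranking, the NB classifier, and the dataset are my own assumptions rather than the paper's exact protocol.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Rank the features once by estimated relevance I(x_j; c).
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Cross-validated error of the first k ranked features, for a range of k.
errors = {}
for k in range(1, 16):
    acc = cross_val_score(GaussianNB(), X[:, order[:k]], y, cv=5).mean()
    errors[k] = 1.0 - acc

best_k = min(errors, key=errors.get)   # the subset size with the smallest error
print(f"best k = {best_k}, error = {errors[best_k]:.3f}")
```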
There is also an experiment done by myself to verify that feature selection methods can improve accuracy:

[Figure: classification accuracy versus the number of selected features (about 5 to 55) on the Spambase dataset.]

This experiment was carried out on the Spambase dataset with an SVM using a linear kernel. The x-axis denotes the number of selected features and the y-axis the accuracy. The red line is the proposed algorithm; the others are the baseline, traditional, and random selectors. The proposed algorithm performs best, so I am convinced that feature selection methods can improve the accuracy of a learning algorithm.

Random projection

Random projection is a feature extraction algorithm. The most famous feature extraction algorithms include PCA, LDA, LLE, etc. Random projection is sometimes mentioned as an LSH method, and it is highly unstable, so it is not as famous; but it is quite useful in some cases and much more efficient than the most famous algorithms such as PCA. The main steps of random projection can be introduced briefly: (1) randomly select a set of high-dimensional unit vectors (not necessarily orthogonal); (2) project the high-dimensional data into the low dimension by taking its products with these vectors. These steps sound simple and somewhat unreliable, but in fact there is a lemma that guarantees the precision, the Johnson-Lindenstrauss Lemma. Its main idea is that it is possible to project n points in a space of arbitrarily high dimension onto an O(log n)-dimensional space such that the pairwise distances between the points are approximately preserved. More formally:

Johnson-Lindenstrauss Lemma. For any 0 < ε < 1 and any positive integer n, let k be a positive integer such that

$$ k \ge 4\,(\varepsilon^2/2 - \varepsilon^3/3)^{-1} \ln n. $$

Then for any set V of n points in R^d, there is a map f : R^d → R^k such that for all u, v ∈ V,

$$ (1-\varepsilon)\,\lVert u-v \rVert^2 \le \lVert f(u)-f(v) \rVert^2 \le (1+\varepsilon)\,\lVert u-v \rVert^2. $$

Furthermore, this map can be found in expected polynomial time.

Here we use sample distance as the measure of the goodness of feature reduction performance, for the reason that one objective of feature reduction is that the pairwise distances of the points remain approximately the same as before. In the data mining area we know that a dataset has two ways of representation, the data matrix and the dissimilarity matrix:

$$ \text{data matrix: } \begin{bmatrix} x_{11} & \cdots & x_{1f} & \cdots & x_{1d} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nf} & \cdots & x_{nd} \end{bmatrix}, \qquad \text{dissimilarity matrix: } \begin{bmatrix} 0 & & & \\ d(2,1) & 0 & & \\ \vdots & \vdots & \ddots & \\ d(n,1) & d(n,2) & \cdots & 0 \end{bmatrix}. $$

If the pairwise distances of the data points are preserved precisely, then the dissimilarity matrix retains most of the information of the original dataset, and we say that it is a good feature reduction method. There are several ways to construct a random projection R^d → R^k; we adopt the one in the original Johnson-Lindenstrauss paper:

The projection (due to Johnson and Lindenstrauss): let A be a random k × d matrix that projects R^d onto a uniform random k-dimensional subspace, and multiply A by the fixed scalar √(d/k); then every v ∈ R^d is mapped to √(d/k) · Av.

To give a better understanding, I drew a graph of the process. [Figure: an illustration of projecting the data onto a random k-dimensional subspace.]

The advantage of random projection is that it does not use any predefined "interestingness" criterion the way PCA does, and high-dimensional distributions look more like Gaussians when projected to low dimensions. But it is a highly unstable algorithm. For example: [Figure: the left panel is the true distribution of a high-dimensional dataset (drawn with two of its features); the middle and right panels are two single runs of a clustering algorithm after random projection.] The result of each run may differ greatly. But it is exactly this unstable behavior that provides multiple views of the same dataset, which is useful in ensemble learning.
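Here is a minimal sketch of random projection with a distance-preservation check; it uses the common i.i.d. Gaussian variant as a cheap stand-in for the uniform-random-subspace construction quoted above, and all sizes and seeds are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 50              # n points, original dimension d, reduced k

X = rng.normal(size=(n, d))          # toy high-dimensional data

# i.i.d. Gaussian projection matrix, scaled so squared norms are preserved
# in expectation.
A = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ A                            # projected data in R^k

# Pairwise distances before and after projection stay close, as the
# Johnson-Lindenstrauss lemma predicts.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    print(f"pair ({i},{j}): distance ratio after/before = {ratio:.3f}")
```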
Cluster ensemble

Ensemble learning is a hot topic in these years, and the cluster ensemble is one of the newest topics in unsupervised learning. The framework of a classification ensemble is as follows: given a certain dataset, we first generate different views of the dataset, which can be implemented by bootstrap sampling, random feature subspaces, or other methods; then we use different learning algorithms, or the same algorithm with different parameters, or even just the same algorithm, to generate several different classifiers. When a new data point comes in, the multiple classifiers are used to classify it, and the final classification result is obtained by a voting scheme or some other method. A cluster ensemble is almost the same as a classification ensemble, but the main difference is that the clustering solution of each run may have different output labels and a different number of output clusters, which makes it impossible to obtain a final result by a voting scheme directly. So a consensus function needs to be defined to combine the results of the multiple runs.

There is an application of the random projection I mentioned before, together with a case study of cluster ensembles, given by the paper 'Random projection for high dimensional data clustering: A cluster ensemble approach' published at ICML in 2003. First of all, random projection is used to provide different representations of the original dataset (PCA is also run for comparison, to show the superior performance of random projection). The first step of the generation is to produce an n × n similarity matrix: for each run, EM generates a probabilistic model θ of a mixture of k Gaussians; the probability that point i belongs to cluster l is denoted by P(l | i, θ), and the probability that point i and point j belong to the same cluster is denoted by

$$ P_{ij} = \sum_{l=1}^{k} P(l \mid i, \theta)\, P(l \mid j, \theta), $$

and P_ij is then averaged over the multiple runs. To verify the usefulness of this metric, the author plots histograms of P_ij for pairs of samples that belong to the same cluster and for pairs that do not. [Figure: the two histograms are well separated, with little overlap.] Since the two cases overlap very little, P_ij is a good metric for the similarity matrix. Then the following algorithm can be used to obtain the final clustering. Inputs: P, an n × n similarity matrix, and k, a desired number of clusters. Output: a partition of the n points into k clusters.
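To tie the pieces together, here is a minimal sketch of the similarity-matrix generation just described: repeated random projections, an EM-fitted Gaussian mixture per run, and soft memberships combined into the averaged P. The data and all numbers are my own assumptions, and the final consensus step is only hinted at.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=200, n_features=50, centers=3, random_state=0)
n, d = X.shape
k, n_runs, proj_dim = 3, 10, 5       # assumed numbers of clusters, runs, dimensions

P = np.zeros((n, n))
for run in range(n_runs):
    A = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)   # random projection
    Xp = X @ A
    gmm = GaussianMixture(n_components=k, random_state=run).fit(Xp)  # EM mixture
    R = gmm.predict_proba(Xp)        # R[i, l] estimates P(l | i, theta)
    P += R @ R.T                     # sum_l P(l|i, theta) * P(l|j, theta)
P /= n_runs                          # average the similarity over the runs
```

A consensus function then operates on P; agglomerative merging of the most similar groups is one natural choice.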