Applied Multivariate Statistical Analysis 1

Preface to the 1st Edition

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate. This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications.

The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations.

The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from … . The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies on … is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

Weeks 1-2, Unit I: Descriptive Techniques

1 Comparison of Batches
1.1 Boxplots
1.2 Histograms
1.3 Scatterplots
1.4 Data Set: Boston Housing

1 Comparison of Batches

Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set $\{x_i\}_{i=1}^n$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^p$. That is, we suppose that each observation $x_i$ has $p$ dimensions, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and that it is an observed value of a variable vector $X \in \mathbb{R}^p$. Therefore, $X$ is composed of $p$ random variables, $X = (X_1, X_2, \ldots, X_p)$, where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable.
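In practice the set $\{x_i\}_{i=1}^n$ is usually stored as an $n \times p$ data matrix: row $i$ is the observation $x_i$, and column $j$ holds the $n$ observed values of $X_j$. The following is only a minimal sketch in plain Python; the toy values and the helper name `column` are hypothetical and not taken from the text.

```python
from statistics import mean

# Hypothetical toy data: n = 4 observations x_i of a p = 3 dimensional
# variable vector X. Row i is x_i = (x_i1, x_i2, x_i3).
data = [
    [1.0, 10.0, 0.5],
    [2.0, 12.0, 0.75],
    [3.0, 11.0, 0.25],
    [4.0, 13.0, 1.0],
]

n = len(data)     # number of observations
p = len(data[0])  # dimension of X


def column(data, j):
    """Return the n observed values of the one-dimensional variable X_j.

    j is a 0-based index, so column(data, 0) corresponds to X_1.
    """
    return [row[j] for row in data]


x_2 = data[1]            # the second observation x_2, a point in R^p
X_1 = column(data, 0)    # all n observations of the first variable X_1
column_means = [mean(column(data, j)) for j in range(p)]

print(n, p)          # 4 3
print(x_2)           # [2.0, 12.0, 0.75]
print(column_means)  # [2.5, 11.5, 0.625]
```

Working column by column like this is what the one-dimensional tools of this chapter (boxplots, histograms) do, whereas scatterplots look at two or three columns at once.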

How do we begin to analyze this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:
Are there components of $X$ that are more spread out than others?
Are there some elements of $X$ that indicate subgroups of the data?
Are there outliers in the components of $X$?
How "normal" is the distribution of the data?

1.1 Boxplots

[Figures: boxplots of the bank data; panels labeled "Genuine", "X6" and "X1"]

Summary
The median and mean bars are measures of location.
The relative location of the median (and the mean) in the box is a measure of skewness.
The lengths of the box and the whiskers are measures of spread.
The length of the whiskers indicates the tail length of the distribution.
The outlying points are indicated with a "★" or "●" depending on whether they lie beyond $F_{U,L} \pm 1.5\,d_F$ or beyond $F_{U,L} \pm 3\,d_F$, respectively; here $F_U$ and $F_L$ are the upper and lower fourths and $d_F = F_U - F_L$ is the F-spread.
The boxplots do not indicate multimodality or clusters.
If we compare the relative size and location of the boxes, we are comparing distributions.

Reading material
21. data capacity
22. data handling
23. data reduction
24. data transformation
25. density function
26. description
27. descriptive
28. deviation from average (deviation from the mean)
29. Df. Fit (difference in fit)
30. df. (degrees of freedom)
31. distribution shape
32. double logarithmic
33. eigenvector
34. error of estimate
35. estimation (note the different stress from "estimate")
36. Euclidean distance
37. expected value
38. experimental sampling
39. explanatory variable
40. explore, summarize

1.2 Histograms

[Figure: histogram of the diagonal, h = 0.4]

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a bin grid starting at $x_0$:

$$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$$

where $[\cdot,\cdot)$ denotes a left-closed, right-open interval. If $\{x_i\}_{i=1}^n$ is an i.i.d. sample with density $f$, the histogram is defined as follows:

$$\hat f_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \qquad (1.7)$$

In sum (1.7) the first indicator function $I\{x_i \in B_j(x_0, h)\}$ counts the number of observations falling into bin $B_j(x_0, h)$. The second indicator function $I\{x \in B_j(x_0, h)\}$ is responsible for "localizing" the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins. An $h$ that is too large leads to very big blocks and thus to a very unstructured histogram. On the other hand, an $h$ that is too small gives a very variable estimate with many unimportant peaks.

[Figure 1.6: histograms of the diagonal for h = 0.1, 0.2, 0.3 and 0.4]

The effect of $h$ is given in detail in Figure 1.6. It contains the histogram (upper left) for the diagonal of the counterfeit bank notes for $x_0 = 137.8$ (the minimum of these observations) and $h = 0.1$. Increasing $h$ to $h = 0.2$ and using the same origin, $x_0 = 137.8$, results in the histogram shown in the lower left of the figure. This density histogram is somewhat smoother due to the larger $h$. The binwidth is next set to $h = 0.3$ (upper right). From this histogram, one has the impression that the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9. The detection of modes requires a fine tuning of the binwidth. Using methods from smoothing methodology, one can find an "optimal" binwidth $h$ for $n$ observations:

$$h_{opt} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}.$$

In Figure 1.7, we show histograms with $x_0 = 137.65$ (upper left), $x_0 = 137.75$ (lower left), $x_0 = 137.85$ (upper right), and $x_0 = 137.95$ (lower right). All the graphs have been scaled equally on the y-axis to allow comparison. One sees that, despite the fixed binwidth $h$, the interpretation is not facilitated. Shifting the origin $x_0$ to 4 different locations created 4 different histograms. This property of histograms strongly contradicts the goal of presenting data features.

Summary
Modes of the density are detected with a histogram.
Modes correspond to strong peaks in the histogram.
Histograms with the same $h$ need not be identical. They also depend on the origin $x_0$ of the grid.
The influence of the origin $x_0$ is drastic. Changing $x_0$ creates different-looking histograms.
The consequence of an $h$ that is too large is an unstructured histogram that is too flat.
A binwidth $h$ that is too small results in an unstable histogram.
There is an "optimal" $h = (24\sqrt{\pi}/n)^{1/3}$.
It is recommended to use averaged histograms. They are kernel densities.

1.4 Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other. They help us understand relationships among the variables of a data set. A downward-sloping scatter indicates that as we increase the variable on the horizontal axis, the variable on the vertical axis decreases. An analogous statement can be made for upward-sloping scatters.
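The direction of a scatter can also be summarized by a single number. The slides do not do this; as a swapped-in illustration, the sketch below uses the sign of the sample (Pearson) correlation coefficient, which is negative for a downward-sloping cloud and positive for an upward-sloping one. It only captures linear tendencies, so it complements rather than replaces looking at the plot. The function names and data are hypothetical.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Pearson correlation r of paired observations (x_i, y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


def slope_direction(xs, ys):
    """Classify the linear trend of ys (vertical axis) against xs (horizontal axis)."""
    r = sample_correlation(xs, ys)
    if r < 0:
        return "downward-sloping"
    if r > 0:
        return "upward-sloping"
    return "no linear trend"


horizontal = [1, 2, 3, 4, 5]
print(slope_direction(horizontal, [10, 8, 7, 5, 2]))  # downward-sloping
print(slope_direction(horizontal, [2, 4, 5, 4, 6]))   # upward-sloping
```

A negative $r$ corresponds to the downward-sloping scatter described above. Because $r$ summarizes the whole cloud at once, it cannot reveal the sub-clouds or outliers that the scatterplot itself shows.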

Figure 1.12 plots the 5th column (upper inner frame) of the bank data against the 6th column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison, a good separation between genuine and counterfeit bank notes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Figure 1.12 corresponds to the true bank notes. As noted before, this separation is not distinct, since the two groups overlap somewhat.

[Figure: draftman scatterplot matrix]

Summary
Scatterplots in two and three dimensions help in identifying separated points, outliers or sub-clusters.
Scatterplots help us in judging positive or negative dependencies.
Draftman scatterplot matrices help detect structures conditioned on values of other variables.
As the brush of a scatterplot matrix moves through a point cloud, we can study conditional dependence.

1.8 Data Set: Boston Housing

[Slides: Boston Housing data set; table of variables]

First Step: New Words (Category 1: 160 high-frequency words)
1. absolute deviation
2. absolute residuals
3. among groups (between groups)
4. analysis of correlation
5. analysis of covariance
6. analysis of regression
7. Bayesian estimation
8. bivariate
9. bivariate correlate
10. boxplot
11. canonical correlation
12. categorical variable
13. central tendency
14. chance statistics (random statistics)
15. chance variable (random variable)
16. classified variable (categorical variable)
17. coefficient of skewness
18. confidence limit
19. cumulative probability
20. curvature

