Multivariate Statistical Analysis

SOME THOUGHTS ON "OMICS" STUDIES

If we obtain analytical data on two groups of samples which we suspect may be different, can we determine the following information?

• Are the groups different?
• Detect those compounds which have increased or decreased in concentration in each group.
• Detect those compounds which are missing from each group and those which are unique to each group.
• These are the compounds which contribute to the variance between the groups.

This is an imposing PROBLEM if we try to employ traditional methods of spectral comparison. Thus we need to employ a data-mining technique to aid us.

A Manual Approach

Think about doing a rigorous comparison of just these 4 spectra (~2,000 masses or metabolites in each).

[Figure: four TOF MS spectra, sample labels including Day2_Rat_8040 and Day2_Rat_8158, relative intensity (%) against m/z from 125 to 925]

A SOLUTION TO THE PROBLEM

• It is possible to mine the data using multivariate statistics. Using this approach we analyze the groups using GC or LC/MS and tabulate all the observed masses and their chromatographic retention times with advanced computational methods. These mass/retention time pairs become the variables used for statistical analysis.
• Multivariate statistics then allows us to reduce thousands of variables (mass/retention time pairs) down to a simple two- or three-dimensional map which shows that the groups are different and provides us with a list of the variables which contribute to the difference.
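As a rough illustration of how mass/retention time pairs become variables, the sketch below pivots per-sample peak lists into an observations-by-variables table. Everything in it is hypothetical: the `build_feature_table` helper, the sample names, the intensities, and the rounding rule used to match peaks across samples. Real peak detection and alignment software is far more involved.

```python
from collections import defaultdict

def build_feature_table(peak_lists, mz_decimals=2, rt_decimals=1):
    """Pivot per-sample peak lists into a samples x variables table.

    peak_lists maps sample name -> list of (m/z, retention time, intensity).
    Peaks are matched across samples by rounding m/z and RT, which is a crude
    assumed alignment rule. Peaks absent from a sample are filled with 0.0.
    """
    table = defaultdict(dict)
    variables = set()
    for sample, peaks in peak_lists.items():
        for mz, rt, intensity in peaks:
            key = f"{round(mz, mz_decimals):.{mz_decimals}f}_{round(rt, rt_decimals):.{rt_decimals}f}"
            table[sample][key] = table[sample].get(key, 0.0) + intensity
            variables.add(key)
    columns = sorted(variables)
    rows = {s: [table[s].get(c, 0.0) for c in columns] for s in peak_lists}
    return columns, rows

# Hypothetical peak lists for one control and one dosed sample.
peaks = {
    "control_1": [(185.0493, 3.21, 1200.0), (213.0430, 4.10, 300.0)],
    "dosed_1": [(185.0493, 3.22, 2500.0), (301.1000, 5.00, 800.0)],
}
columns, rows = build_feature_table(peaks)
for sample, values in rows.items():
    print(sample, dict(zip(columns, values)))
```

Each column of the resulting table is one m/z_RT variable and each row is one observation (spectrum), which is the layout the multivariate methods below expect.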

WHAT HAVE WE JUST DONE?

Latitude 35° 38' 31.5" North
Longitude 139° 45' 7.3" East
Altitude 12 m

MULTIVARIATE STATISTICAL ANALYSIS

• Spectrum (observation) becomes a point in the PCA scores plot
• Variables (m/z_RT) are shown in the PCA loadings plot

Using the plots together allows trends in the sample spectra to be interpreted in terms of m/z.

WHY CHOOSE TO USE MULTIVARIATE STATISTICS?

Why not:
• Hierarchical clustering
• Heat map
• ANOVA
• t tests?

With MarkerLynx it is possible to export your data to any statistical program you like.

WHY USE MULTIVARIATE ANALYSIS?

• Short and wide data sets
  - Few observations (N)
  - Many variables (K)
  - Noisy data
  - Missing data/excluded regions
  - Multiple objectives
• Implications
  - High degree of correlation (many variables are related)
  - Difficult to analyse with conventional methods
• Require methods for simplification and visualisation

[Figure: data table with N rows (observations) and K columns (variables)]

Principal Component Analysis (PCA)

A multivariate statistical approach that facilitates the identification of differences or similarities between groups.

Data table to variable space: a single object (one row of the data table) is a point in variable space, so the whole table yields a swarm of points in variable space.

[Figure: a single object, and then the whole point swarm, plotted on axes var.1, var.2 and var.3]

DATA PREPARATION: PRE-PROCESSING (CENTRING & SCALING)

• Centring: move the centre of the point swarm (the mean) to the variable origin.
• Scaling: put each variable on an equal footing, e.g. make the standard deviations equal (not the only way).

[Figure: point swarm on axes var.1, var.2 and var.3 before and after centring and scaling]
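The two pre-processing steps can be sketched in a few lines. This is a minimal example that assumes unit-variance scaling, which is only one of the possible scaling choices mentioned above; the `autoscale` name and the small data table are made up for illustration.

```python
from statistics import mean, stdev

def autoscale(X):
    """Mean-centre each column, then divide it by its standard deviation."""
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]

# Hypothetical table: 3 observations, 2 variables on very different scales.
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
Z = autoscale(X)
print(Z)
```

After this step both columns have mean 0 and standard deviation 1, so the large-valued second variable no longer dominates. A constant column would give a zero standard deviation and would need to be removed first.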

PCA THEORY – STEP BY STEP

The first principal component (PC1) is set to describe the largest variation in the data, which is the same as the direction in which the points spread most in the variable space. PC1 is written t1p'1: the score vector t1 multiplied by the transposed loading vector p1.

The score value (ti1) for the point i is the distance from the projection of the point on the 1st component to the origin.

PC1 hence is the first latent variable in a new coordinate system that describes the variation in the data.

[Figure: point i in var.1, var.2, var.3 space projected onto PC1, with its score ti1 measured along PC1 from the origin]
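One way to find the direction of largest spread is power iteration on the centred data, sketched below in plain Python. The `first_component` function and the data are illustrative assumptions; the slides do not say which algorithm the software uses, and a numerical library would normally do this via an eigen or singular value decomposition.

```python
import math

def first_component(X, n_iter=200):
    """Return (scores t1, loadings p1) of PC1 for column-centred data X."""
    n_vars = len(X[0])
    p = [1.0 / math.sqrt(n_vars)] * n_vars  # arbitrary unit start vector
    for _ in range(n_iter):
        # Project every observation onto the current direction.
        t = [sum(x * w for x, w in zip(row, p)) for row in X]
        # Update the direction as X' t and rescale it to unit length.
        p = [sum(t[i] * X[i][k] for i in range(len(X))) for k in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    t = [sum(x * w for x, w in zip(row, p)) for row in X]
    return t, p

# Hypothetical centred data: 5 observations, 2 correlated variables.
X = [[-2.0, -1.1], [-1.0, -0.4], [0.0, 0.1], [1.0, 0.4], [2.0, 1.0]]
t1, p1 = first_component(X)
print("loadings p1:", p1)
print("scores t1:", t1)
```

Each score ti1 is the signed distance from the origin to the projection of point i on PC1, matching the definition above. The sum of squared scores is at least as large as the sum of squares of any single original variable, which is what "largest variation" means here.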

The second principal component (PC2) is set to describe the largest remaining variation in the data, perpendicular (orthogonal) to the 1st component.

A corresponding loading plot describes the relationships between the variables and allows interpretation of the scores plot by showing which variables are responsible for similarities and differences between samples.

Two PCs make a plane (window) in the K-dimensional variable space. The points are projected down onto the plane, which is lifted out and viewed as a two-dimensional plot. The perpendicular distance from the object to its projection on the plane is the residual of the two PCs.

[Figure: data points and their projections onto the PC1/PC2 plane in x1, x2, x3 space]

This is the scores plot: similarities or differences between samples can now be seen.
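The plane and the residual can be illustrated by extracting two components one after the other: fit PC1, subtract it from the data (deflation), fit PC2 on what is left, and measure what remains. This is a sketch under assumptions; the function names and the three-variable data set are invented, and power iteration stands in for whatever algorithm the software actually uses.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_component(X, n_iter=500):
    """Power iteration: unit loading vector p and scores t = X p."""
    k = len(X[0])
    p = [1.0 / math.sqrt(k)] * k
    for _ in range(n_iter):
        t = [dot(row, p) for row in X]
        p = [sum(t[i] * X[i][j] for i in range(len(X))) for j in range(k)]
        norm = math.sqrt(dot(p, p))
        p = [v / norm for v in p]
    return [dot(row, p) for row in X], p

def two_component_model(X):
    """Fit PC1, deflate, fit PC2; return scores, loadings, residual distances."""
    t1, p1 = leading_component(X)
    E1 = [[x - t * w for x, w in zip(row, p1)] for row, t in zip(X, t1)]
    t2, p2 = leading_component(E1)
    E2 = [[e - t * w for e, w in zip(row, p2)] for row, t in zip(E1, t2)]
    # Length of each leftover row = perpendicular distance to the PC plane.
    residual = [math.sqrt(dot(row, row)) for row in E2]
    return (t1, t2), (p1, p2), residual

# Hypothetical centred data: 6 observations lying close to a plane in 3-D.
X = [[3.0, 1.0, 0.1], [-3.0, -1.0, -0.1], [1.0, -2.0, 0.2],
     [-1.0, 2.0, -0.2], [2.0, 2.0, -0.1], [-2.0, -2.0, 0.1]]
(t1, t2), (p1, p2), residual = two_component_model(X)
print("p1 . p2 =", dot(p1, p2))
print("residual distances:", residual)
```

Plotting t1 against t2 gives the scores plot; the residual list holds each object's distance from the plane.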

SCORES PLOT EXAMPLE

Shockcor et al., 2001, Magnetic Resonance in Chemistry, 39:559-565.

THE LOADINGS PLOTS

The loading (p) is described as the cosine of the angle between the original variable and the PC.

Loading (p): describes the variation in the variable direction, i.e. similarity/dissimilarity between variables, and also explains the variation in scores.

The loading (p) describes the importance of the original variable for the respective PC. This is the same as the similarity in direction between the original variable and the PC.

[Figure: sample i plotted on intensity axes I(rt1, m/z1) and I(rt2, m/z2); projection of the variable axis (rt_x, m/z_x) onto PC1 and PC2 gives the loadings p_{x,1} and p_{x,2}]

$p = \cos\theta$, with $\sum_i p_i^2 = 1$

With $p_{x,1} = \cos(\theta_{x,1})$ and $p_{x,2} = \cos(\theta_{x,2})$, where $\theta_{x,1}$ is the angle between the axis $(rt_x, m/z_x)$ and PC1, and $\theta_{x,2}$ is the angle between the axis $(rt_x, m/z_x)$ and PC2.

Loadings: if the largest variation coincides with var.1, the first principal component will be in the direction of var.1. Then:
• PC1 scores = values of var.1
• PC1 loadings, p1 = (1, 0)
• $\theta_1 = 0°$ (angle between PC1 and var.1) and $\theta_2 = 90°$ (angle between PC1 and var.2)

Credit: Henrik Antti / Umea University
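The cosine interpretation is easy to check numerically: once a PC direction is scaled to unit length, each of its elements is the cosine of the angle to the corresponding variable axis. The `loading_angles` helper below is a hypothetical name used only for this illustration.

```python
import math

def loading_angles(direction):
    """Scale a PC direction to unit length; return loadings and angles in degrees."""
    norm = math.sqrt(sum(v * v for v in direction))
    loadings = [v / norm for v in direction]  # p_k = cos(theta_k)
    # Clamp to [-1, 1] to guard acos against rounding error.
    angles = [math.degrees(math.acos(max(-1.0, min(1.0, p)))) for p in loadings]
    return loadings, angles

p_axis, theta_axis = loading_angles([1.0, 0.0])  # PC1 along var.1
p_diag, theta_diag = loading_angles([1.0, 1.0])  # PC1 midway between var.1 and var.2
print(p_axis, theta_axis)
print(p_diag, theta_diag)
```

The first case reproduces the example in the text: loadings (1, 0) with angles of 0° and 90°. In the second case both variables contribute equally and each loading is cos 45°.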

Example Loadings Plot

INTERPRETATION OF PCA

• Scores: observations (spectra); trends, patterns, groups
• Loadings: variables (m/z); correlation, influence

STRONG OUTLIER DETECTION – HOTELLING'S T2

• Hotelling's T2 is a multivariate generalisation of Student's t.
• Hotelling's ellipse is an ellipse of constant T2 and marks the confidence region.

[Figure: scores plot t[1]/t[2] for THICKNES.M1 (PC), work set, showing strong outliers. Ellipse: Hotelling T2 (0.05). Simca-P 7.0 by Umetri AB, 1998-08-17]

[Figure: scores plot t[1]/t[2] for FOODS.M1 (PC), work set, with European countries as observations and no strong outliers. Ellipse: Hotelling T2 (0.05). Simca-P 7.0 by Umetri AB, 1998-08-17]

Credit: Henrik Antti / Umea University

MODERATE OUTLIERS

• Moderate outliers are detected on the residuals (DModX) plot.
• The distribution of residuals is approximately normal.
• An F test specifies the cut-off at e.g. 99% confidence.

Credit: Henrik Antti / Umea University

MODERATE OUTLIERS DETECTION – TOOL DMODX

• The critical distance is derived from the distribution of residuals.

[Figure: DModX, Comp 3 (Cum), for FOODS.M1 (PC), work set: no moderate outliers. DCrit (0.05), Dcrit [3] = 1.1598, absolute distances, non-weighted residuals. Simca-P 7.0 by Umetri AB, 1998-08-17]

[Figure: DModX, Comp 2 (Cum), for THICKNES.M1 (PC), work set: some moderate outliers. DCrit (0.05), Dcrit [2] = 1.5130, normalized distances, non-weighted residuals. Simca-P 7.0 by Umetri AB, 1998-08-17]

Credit: Henrik Antti / Umea University
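Both outlier diagnostics can be sketched on made-up numbers. The T2 limit below uses a commonly quoted form, A(N² - 1) / (N(N - A)) times an F quantile with (A, N - A) degrees of freedom, and takes that quantile as an input (3.55 is the tabulated 95% value for 2 and 18 degrees of freedom). The residual distance is a simplified stand-in for DModX: it omits the degrees-of-freedom corrections and the F-based critical distance that the software applies. All names and data are assumptions for illustration.

```python
import math
from statistics import mean, variance

def hotelling_t2(score_columns):
    """T2 per observation, given one list of scores per component."""
    n = len(score_columns[0])
    centres = [mean(col) for col in score_columns]
    variances = [variance(col) for col in score_columns]
    return [sum((col[i] - c) ** 2 / v
                for col, c, v in zip(score_columns, centres, variances))
            for i in range(n)]

def t2_limit(n_obs, n_comp, f_crit):
    """Critical T2 from a tabulated F quantile with (n_comp, n_obs - n_comp) d.f."""
    return n_comp * (n_obs ** 2 - 1) / (n_obs * (n_obs - n_comp)) * f_crit

def residual_distance(residuals):
    """Per-observation residual size relative to the pooled residual size."""
    per_obs = [math.sqrt(sum(e * e for e in row) / len(row)) for row in residuals]
    pooled = math.sqrt(sum(d * d for d in per_obs) / len(per_obs))
    return [d / pooled for d in per_obs]

# Hypothetical scores for 20 observations on 2 components;
# the last two observations sit far out along component 1.
t1 = [1.0, -1.0] * 9 + [8.0, -8.0]
t2 = [1.0, 1.0, -1.0, -1.0] * 5
t2_values = hotelling_t2([t1, t2])
limit = t2_limit(n_obs=20, n_comp=2, f_crit=3.55)
strong = [i for i, v in enumerate(t2_values) if v > limit]
print("T2 limit:", limit, "strong outliers:", strong)

# Hypothetical model residuals for 5 observations and 3 variables;
# the last observation is poorly described by the model.
E = [[0.1, -0.1, 0.1]] * 4 + [[0.9, -0.9, 0.9]]
dmodx_like = residual_distance(E)
print("relative residual distances:", dmodx_like)
```

Strong outliers show up in the scores (large T2, outside the ellipse), whereas moderate outliers can have ordinary scores but a large distance to the model, which is why the two checks are used together.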

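PLS-DA

• Metabonomic space X: describes variation in the NMR data
• Discriminant space Y: defines the known classes
• 'Inner relation': maximising the correlation between t and u

[Figure: X space (x1, x2, x3) with scores t and Y space (y1, y2) with scores u; the t(pca) and t(pls-da) directions are compared]

MODEL VALIDATION (2) - TRAIN/TEST

• Split the data
  - Training set: build the model
  - Test set: validate the model
• Typically require >1/3 of the data in the test set
• All model parameters are optimised on the training set, e.g. number of components, variables selected etc.
• A goodness-of-fit statistic on the test data indicates the predictive quality of the model

[Figure: samples 1 to 8 divided into a training set (build model) and a test set (predict test set)]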
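A minimal sketch of the idea, assuming a single PLS component and a 0/1 class label: the weight vector is taken proportional to X'y, the scores t are the projections onto it, and a slope relates t to the class label. The model is built on the training samples only and then used to predict the held-out test samples. The data, the split, the 0.5 decision threshold and the function names are all invented for this illustration; real PLS-DA software fits several components and handles multiple classes.

```python
import math

def fit_pls_da(X, y):
    """One-component PLS regression on a 0/1 class label."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]         # X scores
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)  # slope of y on t
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}

def predict_class(model, X, threshold=0.5):
    """Project new samples with the training means and weights, then threshold."""
    scores = [sum((v - m) * w for v, m, w in zip(row, model["x_mean"], model["w"]))
              for row in X]
    return [1 if model["y_mean"] + model["q"] * t > threshold else 0 for t in scores]

# Hypothetical data: 8 samples, 3 variables; class 1 has a raised first variable.
X = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [0.9, 5.1, 1.9], [1.1, 5.2, 2.0],
     [3.0, 5.0, 2.1], [3.2, 4.9, 1.9], [2.9, 5.1, 2.0], [3.1, 4.8, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
train = [0, 1, 2, 4, 5]  # used to build the model
test = [3, 6, 7]         # more than 1/3 of the data, never seen during fitting
model = fit_pls_da([X[i] for i in train], [y[i] for i in train])
predicted = predict_class(model, [X[i] for i in test])
print("predicted:", predicted, "actual:", [y[i] for i in test])
```

Note that the test samples are centred with the training means, so no information from the test set leaks into the model.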

MODEL VALIDATION (3) CROSS-VALIDATION

• General principle:
  - Remove some data
  - Build model on remaining data
  - Predict removed data
  - Repeat until all samples removed once
• Compute predictions & residuals (e_ik) for each sample when left out
• Calculate PRESS and Q2 from all residuals
• Can do this for X or Y

[Figure: '3-fold' cross-validation of samples 1 to 8; in each round some samples form the test set (predict test set) and the rest form the training set (build model)]

• R2 & Q2 plot from SIMCA-P software
• R2 rises with each component
• Q2
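The cross-validation loop can be sketched with any model. Here a straight-line fit stands in for the PCA or PLS model purely to keep the example short, which is an assumption of this sketch and not what SIMCA-P does. PRESS is the sum of squared prediction residuals over all left-out samples, and Q2 = 1 - PRESS / SS, where SS is the total sum of squares of the centred response. The function names and data are invented.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def cross_validate(x, y, n_folds=3):
    """k-fold cross-validation; returns (PRESS, Q2)."""
    n = len(x)
    press = 0.0
    for fold in range(n_folds):
        # Every sample lands in exactly one test fold, so each is left out once.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in test_idx)
    y_mean = sum(y) / n
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return press, 1.0 - press / ss_total

# Hypothetical data: 8 samples following y = 2x + 1 with small noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]
press, q2 = cross_validate(x, y, n_folds=3)
print("PRESS:", press, "Q2:", q2)
```

Because every prediction is made for a sample the model has not seen, Q2 measures predictive ability, whereas R2 only measures how well the model fits the data it was built on.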
