Multivariate Statistical Analysis

If we obtain analytical data on two groups of samples which we suspect may be different, can we determine the following information?
• Are the groups different?
• Detect those compounds which have increased or decreased in concentration in each group.
• Detect those compounds which are missing from each group and those which are unique to each group.
• These are the compounds which contribute to the variance between the groups.

This is an imposing PROBLEM if we try to employ traditional methods of spectral comparison. Thus we need to employ a data-mining technique to aid us.

SOME THOUGHTS ON “OMICS” STUDIES
[Figure: two TOF MS spectra of Day 2 rat samples]
A Manual Approach
Think about doing a rigorous comparison of just these 4 spectra (~2,000 masses or metabolites in each).
[Figure: two further TOF MS spectra of Day 2 rat samples]
A SOLUTION TO THE PROBLEM
• It is possible to mine the data using multivariate statistics. In this approach we analyze the groups by GC or LC/MS. We then use advanced computational methods to tabulate all the observed masses and their chromatographic retention times. These mass/retention time pairs become the variables used for statistical analysis.
• Multivariate statistics then allows us to reduce thousands of variables (mass/retention time pairs) to a simple two- or three-dimensional map. This map shows whether the groups are different and gives us a list of the variables which contribute to the difference.

WHAT HAVE WE JUST DONE?
Latitude 35° 38′ 31.5″ North
Longitude 139° 45′ 7.3″ East
Altitude 12 m
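The tabulation step under A SOLUTION TO THE PROBLEM, in which every mass/retention-time pair becomes a variable, can be sketched in Python. This is a minimal illustration under stated assumptions, not how MarkerLynx does it:
- The function name `tabulate` is hypothetical.
- The input layout (a list of `(mz, rt, intensity)` peaks per sample) is assumed.
- Peaks are aligned across samples by fixed rounding. Real peak-picking software uses mass and retention-time tolerance windows instead.

```python
from collections import defaultdict


def tabulate(samples, mz_decimals=2, rt_decimals=1):
    """Build a samples x variables intensity table keyed by "m/z_RT" names.

    samples: dict mapping sample name -> list of (mz, rt, intensity) peaks.
    Peaks are aligned across samples by rounding m/z and RT, which is a
    simplifying assumption (real software uses tolerance windows).
    """
    table = defaultdict(dict)
    for name, peaks in samples.items():
        for mz, rt, intensity in peaks:
            key = f"{mz:.{mz_decimals}f}_{rt:.{rt_decimals}f}"
            # Peaks that fall into the same m/z_RT bin are summed.
            table[name][key] = table[name].get(key, 0.0) + intensity
    # Order the variables numerically by m/z, then by RT.
    variables = sorted(
        {key for row in table.values() for key in row},
        key=lambda v: tuple(float(part) for part in v.split("_")),
    )
    names = sorted(samples)
    # A variable absent from a sample (e.g. a missing compound) is recorded as 0.
    matrix = [[table[n].get(v, 0.0) for v in variables] for n in names]
    return names, variables, matrix


samples = {
    "A": [(185.0493, 2.13, 100.0), (213.043, 5.02, 50.0)],
    "B": [(185.0501, 2.14, 80.0)],
}
names, variables, matrix = tabulate(samples)
print(variables)  # ['185.05_2.1', '213.04_5.0']
print(matrix)     # [[100.0, 50.0], [80.0, 0.0]]
```

The zero entries are what later lets a model flag compounds that are missing from one group or unique to another. How compounds are aligned across samples (the rounding here) largely determines how many variables the table ends up with.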
MULTIVARIATE STATISTICAL ANALYSIS
• Each spectrum (observation) becomes a point in the PCA scores plot.
• The variables (m/z_RT) are shown in the PCA loadings plot.
Using the two plots together allows trends in the sample spectra to be interpreted in terms of m/z.
[Figure: PCA scores and loadings plots, with variables labelled m/z_RT]

WHY CHOOSE TO USE MULTIVARIATE STATISTICS?
Why not hierarchical clustering, heat maps, ANOVA or t-tests? With MarkerLynx it is possible to export your data to any statistical program you like.

WHY USE MULTIVARIATE ANALYSIS?
• Short and wide data sets
  - Few observations (N)
  - Many variables (K)
  - Noisy data
  - Missing data/excluded regions
  - Multiple objectives
• Implications
  - High degree of correlation (many variables are related)
  - Difficult to analyse with conventional methods
• Require methods for simplification and visualisation
[Figure: a short, wide data table with N observations (rows) and K variables (columns)]

PRINCIPAL COMPONENT ANALYSIS (PCA)
A multivariate statistical approach that facilitates the identification of differences or similarities between groups.
[Figure: data table mapped to variable space (axes var.1, var.2, var.3); a single object in variable space]
The whole table yields a swarm of points in variable space.

DATA PREPARATION
PRE-PROCESSING (CENTERING)
Centering: move the centre of the point swarm (the mean of each variable) to the variable origin.
[Figure: point swarm in var.1, var.2, var.3 before and after centering]

CENTRING & SCALING
Scaling: put each variable on an equal footing, e.g. make the standard deviations equal (not the only way).
[Figure: point swarm in var.1, var.2, var.3 after scaling]

PCA THEORY – STEP BY STEP
The first principal component, PC1 ($\mathbf{t}_1\mathbf{p}_1'$), is set to describe the largest variation in the data. This is the same as the direction in which the points spread most in the variable space. The score value $t_{i1}$ for point $i$ is the (signed) distance from the origin to the projection of the point on the 1st component. PC1 is hence the first latent variable in a new coordinate system that describes the variation in the data.
[Figure: point i projected onto PC1 in var.1, var.2, var.3 space, giving the score t_i1]

PCA THEORY – STEP BY STEP
The second principal component (PC2) is set to describe the largest remaining variation in the data, perpendicular (orthogonal) to the 1st component.
[Figure: point i projected onto PC1 and PC2, giving the scores t_i1 and t_i2]
A corresponding loadings plot describes the relationships between the variables. It allows the scores plot to be interpreted by showing which variables are responsible for similarities and differences between samples.
Two PCs make a plane (window) in the K-dimensional variable space. The points are projected down onto the plane, which is lifted out and viewed as a two-dimensional plot. The perpendicular distance from an object to its projection on the plane is the residual of the two-PC model.

PCA theory – step by step
[Figure: data points (·) in x1, x2, x3 space and their projections onto the PC1/PC2 plane]
This is the scores plot: similarities or differences between samples can now be seen.

SCORES PLOT EXAMPLE
Shockcor et al., 2001, Magnetic Resonance in Chemistry, 39:559-565.
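LOADINGS PLOTS
The loading ($p$) is described as the cosine of the angle between the original variable and the PC.
• The loading ($p$) describes the variation in the variable direction, i.e. the similarity/dissimilarity between variables. It also explains the variation in the scores.
• The loading ($p$) describes the importance of each original variable for the respective PC. This is the same as the similarity in direction between the original variable and the PC.
[Figure: sample i plotted against the variable axes I(rt_1, m/z_1) and I(rt_2, m/z_2), with PC1, PC2 and the projection of a variable axis (rt_x, m/z_x) onto them]

$$p_{x,1} = \cos\theta_{x,1}, \qquad p_{x,2} = \cos\theta_{x,2}$$

where $\theta_{x,1}$ is the angle between the axis $(rt_x, m/z_x)$ and PC1, and $\theta_{x,2}$ is the angle between the axis $(rt_x, m/z_x)$ and PC2.

If the largest variation coincides with var.1, the first principal component will point in the direction of var.1. PC1 then makes an angle $\theta_{1,1} = 0^\circ$ with var.1 and $\theta_{2,1} = 90^\circ$ with var.2, so:
• PC1 scores = the (centred) values of var.1
• PC1 loadings: $\mathbf{p}_1 = (p_{1,1}, p_{2,1}) = (\cos 0^\circ, \cos 90^\circ) = (1, 0)$
Credit: Henrik Antti / Umea University

Example Loadings Plot
[Figure: example loadings plot]

INTERPRETATION OF PCA
• Scores: observations (spectra); reveal trends, patterns and groups.
• Loadings: variables (m/z); reveal correlation and influence.
[Figure: PC1/PC2 scores plot with strong outliers marked]
Credit: Henrik Antti / Umea University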
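The pre-processing and PCA steps above (centring, scaling, scores, loadings and residuals) can be sketched with NumPy, together with the two outlier statistics covered in the outlier slides (Hotelling's T² and DModX). This is a minimal sketch, not SIMCA-P's implementation:
- PCA is computed by singular value decomposition rather than NIPALS.
- Scaling is to unit variance.
- The function names are hypothetical.
- `dmodx` is a simplified, unweighted version of the DModX distance. SIMCA applies additional correction factors.

The critical limits (the T² ellipse and Dcrit) are normally taken from an F distribution, for example via `scipy.stats.f.ppf`. They are not computed here.

```python
import numpy as np


def preprocess(X, scale=True):
    """Centre each variable (column); optionally scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    Xp = X - X.mean(axis=0)          # move the centre of the swarm to the origin
    if scale:
        sd = Xp.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0            # leave constant variables unscaled
        Xp = Xp / sd                 # equal footing: every variable gets SD 1
    return Xp


def pca(Xp, n_components):
    """PCA via SVD of a pre-processed N x K matrix.

    Returns scores T (N x A), loadings P (K x A, orthonormal columns)
    and residuals E (N x K) left after A components.
    """
    _, _, Vt = np.linalg.svd(Xp, full_matrices=False)
    P = Vt[:n_components].T          # each column is a unit vector along one PC
    T = Xp @ P                       # t_ia: signed distance of point i along PC a
    E = Xp - T @ P.T                 # offset of each point from the PC plane
    return T, P, E


def hotelling_t2(T):
    """Hotelling's T2 of each observation within the model (scores) plane."""
    return np.sum(T**2 / T.var(axis=0, ddof=1), axis=1)


def dmodx(E, n_components):
    """Simplified DModX: residual SD of each observation / pooled residual SD."""
    N, K = E.shape
    s_i = np.sqrt(np.sum(E**2, axis=1) / (K - n_components))
    s_0 = np.sqrt(np.sum(E**2) / ((N - n_components - 1) * (K - n_components)))
    return s_i / s_0


# Slide example: all variation lies along var.1, so PC1 is the var.1 axis.
X = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
T, P, E = pca(preprocess(X), n_components=1)
print(np.round(np.abs(P[:, 0]), 6))  # loadings = cos(0 deg), cos(90 deg): [1. 0.]
```

Because each loading column is a unit vector, the entry for variable k is also the cosine of the angle between that variable's axis and the PC. This matches the loadings definition above. Unit-variance scaling is only one way of putting variables on an equal footing; Pareto scaling, for example, divides by the square root of the standard deviation instead.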
[Figure: Simca-P 7.0 scores plots, t[1]/t[2], of the THICKNES.M1 and FOODS.M1 example models (FOODS.M1 has European countries as observations), each with a Hotelling's T² (0.05) ellipse]

STRONG OUTLIER DETECTION – HOTELLING'S T²
• Hotelling's T² is a multivariate generalisation of Student's t.
• The ellipse of constant T² defines the confidence region. Observations outside Hotelling's ellipse are strong outliers.
[Figure panels: "Strong outliers" and "No strong outliers" (FOODS.M1), each showing Hotelling's ellipse]
Credit: Henrik Antti / Umea University

MODERATE OUTLIERS
• Moderate outliers are detected on the residuals (DModX) plot.
• The distribution of the residuals is approximately normal.
• An F-test specifies the cut-off, at e.g. 99% confidence.
[Figure panels: "No moderate outliers" and "Some moderate outliers"]
The critical distance is derived from the distribution of the residuals.
Credit: Henrik Antti / Umea University

MODERATE OUTLIER DETECTION – TOOL: DMODX
[Figure: Simca-P 7.0 DModX plots with critical distance DCrit (0.05). FOODS.M1, 3 components (cumulative): Dcrit[3] = 1.1598, absolute distances, non-weighted residuals. THICKNES.M1, 2 components (cumulative): Dcrit[2] = 1.5130, normalized distances, non-weighted residuals.]
Credit: Henrik Antti / Umea University

PLS-DA
[Figure: X space (x1, x2, x3) and Y space (y1, y2); the PCA score t(pca) compared with the PLS-DA score t(pls-da), which is linked to the Y score u]
• Metabonomic space (X): describes the variation in the NMR data.
• Discriminant space (Y): defines the known classes.
• "Inner relation": maximising the correlation (strictly, the covariance) between t and u.

MODEL VALIDATION (2) – TRAIN/TEST
• Split the data:
  - Training set: build the model
  - Test set: validate the model
• Typically require more than 1/3 of the data in the test set.
• All model parameters are optimised on the training set, e.g. number of components, variables selected, etc.
• A goodness-of-fit statistic on the test data indicates the predictive quality of the model.
[Figure: samples 1 to 8 split into a training set (build model) and a test set (predict test set)]

MODEL VALIDATION (3) – CROSS-VALIDATION
• General principle:
  - Remove some data
  - Build the model on the remaining data
  - Predict the removed data
  - Repeat until all samples have been removed once
• Compute predictions and residuals (e_ik) for each sample when it is left out.
• Calculate PRESS and Q² from all the residuals.
• This can be done for X or Y.
[Figure: "3-fold" cross-validation. Samples 1 to 8 are split across rounds; in each round the model is built on the training set and used to predict the test set.]
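The PLS-DA model and the cross-validation procedure described above can be sketched in NumPy for the two-class case. The following choices are hypothetical illustrations, not taken from SIMCA-P:
- Class membership is coded as a single 0/1 y column.
- The model is a NIPALS PLS1 fit on centred, unscaled data.
- Folds are assigned by interleaving sample indices.
- Q² is computed as 1 − PRESS/SS, where SS is the sum of squares of y about its mean.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for a single response y (here a 0/1 class code)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean        # centre using the training data only
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)             # unit weight vector maximising cov(Xw, y)
        t = Xr @ w                         # X scores
        p = Xr.T @ t / (t @ t)             # X loadings
        c_a = yr @ t / (t @ t)             # inner relation: y regressed on t
        Xr = Xr - np.outer(t, p)           # deflate X and y before the next component
        yr = yr - c_a * t
        W.append(w)
        P.append(p)
        c.append(c_a)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    B = W @ np.linalg.solve(P.T @ W, c)    # regression coefficients for centred X
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def q2_cross_validated(X, y, n_components, n_folds=3):
    """Leave each fold out once, predict it from the rest, and accumulate PRESS."""
    n = len(y)
    press = 0.0
    for k in range(n_folds):
        test = np.arange(n) % n_folds == k  # every n_folds-th sample is left out
        model = pls1_fit(X[~test], y[~test], n_components)
        press += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 15)              # two classes of 15 samples
X = rng.normal(size=(30, 10))
X[:, 0] += 6.0 * y                         # only variable 0 separates the classes
model = pls1_fit(X, y, n_components=1)
r2 = 1.0 - np.sum((y - pls1_predict(model, X)) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R2 = {r2:.2f}, Q2 = {q2_cross_validated(X, y, 1):.2f}")
```

With the 0/1 coding, a predicted value above 0.5 would assign a sample to class 1. The fold count, class coding and threshold are illustrative choices. Any scaling step should also be recomputed inside each fold so that, as the validation slides require, all model parameters come from the training data only.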
• R² & Q² plot from SIMCA-P software
• R² rises with each component
• Q²