統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)-R語言與統(tǒng)計(jì)分析_第1頁
統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)-R語言與統(tǒng)計(jì)分析_第2頁
統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)-R語言與統(tǒng)計(jì)分析_第3頁
統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)-R語言與統(tǒng)計(jì)分析_第4頁
統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)-R語言與統(tǒng)計(jì)分析_第5頁
已閱讀5頁,還剩95頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

統(tǒng)計(jì)軟件實(shí)踐課程作業(yè)

R語言與統(tǒng)計(jì)分析-湯銀才

姓名:XXX

班級:14經(jīng)管實(shí)驗(yàn)班

第三章概率與分布

§3.1隨機(jī)抽樣

從52張撲克牌中抽取4張對應(yīng)的R命令為:

>sample(l:52,4)

[1]4820724

拋一枚均勻的硬幣10次在R中可表示為:

>sample(c("H"z"T"),10,replace=T)

"H""H''"T”H-J-IIH-|-Hn-|-nH-|-II

擲一棵骰子10次可表示為:

>sample(l:6,10,replace=T)

[1]6421354444

一名外科醫(yī)生做手術(shù)成功的概率為0.90,那么他做10次手術(shù)在R中可以表示為:

>sample(c("成功","失敗"),10,replace=T,prob=c(0.9,0.1))

[1]"失敗""失敗""成功""成功""成功""成功""成功""成功""成功""成功"

若以1表示成功,0表示失敗,則上述命令可變?yōu)椋?/p>

>sample(c(l,0),10,replace=T,prob=c(0.9,0.1))

[1]1111111111

§3.2排列組合與概率的計(jì)算

抽取的4張是有次序的,因此使用排列來求解.所求的事件(記為A)概率為:

>l/prod(52:49)

[1]1.539077e-07

抽取的4張是沒有次序的,因此使用組合數(shù)來求解.所求的事件(記為B)概率為:

>l/choose(52,4)

[1]3,693785e-06

§3.4R中內(nèi)嵌的分布

1)查找分布的分位數(shù),用于計(jì)算假設(shè)檢驗(yàn)中分布的臨界值或置信區(qū)間的置信限.例如,顯

著性水平為5%的正態(tài)分布的雙側(cè)臨界值是:

>qnorm(0.025)

[1]-1.959964

>qnorm(0.975)

[1]1.959964

2)計(jì)算假設(shè)檢驗(yàn)的p值.比如自由度dfl的X23.84時(shí)的X2檢驗(yàn)的p值為:

>1-pchisq(3.84,1)

[1]0.05004352

而容量為14的雙邊t檢驗(yàn)的p值為:

>2*pt(-2.43,df=13)

[1]0,0303309

§3.5應(yīng)用:中心極限定理

limite.central()的定義:

>limite.central<-function(r=runif,distpar=m=.5,

+s=l/sqrt(12),

+n=c(l,3,10,30),N=1000){

+for(iinn){

+if(length(distpar)==2){

+x<-matrix(r(i*N,distpar[l],distpar[2]),nc=i)

+}

+else{

+x<-matrix(r(i*N/distpar),nc=i)

+}

+x<-(apply(xz1,sum)-i*m)/(sqrt(i)*s)

+hist(x,col='lightblue',probability=Xmain=paste("n="J)/

+ylim=c(0,max(.4,density(x)$y)))

+lines(density(x),co^'red',lwd=3)

+curve(dnorm(x),col='blue',lwd=3,lty=3,add=T)

+if(N>100){

+rug(sample(x,100))

+}

+else{

+rug(x)

+}

+}

+}

二項(xiàng)分布:b(10,0.1)

>op<-par(mfrow=c(2,2))

>limite.central(rbinom,distpar=c(10,0.1),m=l,s=0.9)

>par(op)

n=1n=3

泊松分布:pios(l)

>op<-par(mfrow=c(2,2))

>limite.central(rpoiszdistpar=lzm=lzs=l,n=c(3,10,30,50))

>par(op)

n=3n=10

均勻分布:unif(0,l)

>op<-par(mfrow=c(2,2))

>limite.central()

>par(op)

n=1n=3

指數(shù)分布:exp(l)

>op<-par(mfrow=c(2,2))

>limite.central(rexp,distpar=lzvo=l,s=l)

>par(op)

n=1n=3

XX

正態(tài)混合分布:

>op<-par(mfrow=c(2z2))

>mixn<-function(n,a=-l,b=l)

+{rnorm(n,sample(c(a,b),nzreplace=T))}

>limite.central(r=mixn,distpar=c(-3,3)/

+m=0,s=sqrt(10),n=c(l,2,3,10))

Par(op)

n=1n=2

X

習(xí)題3.3

3.3從正態(tài)分布N(100;100)中隨機(jī)產(chǎn)生1000個(gè)隨機(jī)數(shù),

1)作出這1000個(gè)正態(tài)隨機(jī)數(shù)的直方圖;

2)從這1000個(gè)隨機(jī)數(shù)中隨機(jī)有放回地抽取500個(gè),作出其直方圖;

3)比較它們的樣本均值與樣本方差.

>x=rnorm(1000,100,100)

>hist(x,prob=T,main="normalmu=0,sigma=l")

normalmu=0,sigma=1

>y=sample(x,500,replace=T)

_

>hist(y/prob=Gmain="normalmu=Ozsigma=l")

normalmu=0,sigma=1

>mean(x)

[1]99.39372

>mean(y)

[1]110.4469

>var(x)

[1]10362.39

>var(y)

[1]9457.672

其中x(1000個(gè)隨機(jī)數(shù))較y(500個(gè)隨機(jī)數(shù))樣本均值小,而樣本方差較大。

第四章探索性數(shù)據(jù)分析

§4.1常用分布的概率函數(shù)圖

二項(xiàng)分布

>n<-20

>p<-0.2

>k<-seq(O,n)

>plot(k/dbinom(k/n,p)/type='h'/

+main='Binomialdistribution,n=20,p=0.2',xlab='k')

Binomialdistribution,n=20,p=0.2

in

(o

d

c

-

x

)

E

O

U

_

q

p

s.

°iIIII

05101520

k

泊松分布:

>lambda<-4.0

>k<-seq(0,20)

"

>plot(k,dpois(k,lambda),type='h/

+main='Poissondistribution,lambda=5.5'/xlab='k')

Poissondistribution,lambda=5.5

s

o

E(

qp

E0

-0-

w-

os

pd

s

o

s,

o

05101520

k

幾何分布:

>p<-0.5

>k<-seq(0,10)

,

>plot(k/dgeom(k,p),type='h/

,,

+main='Geometricdistribution,p=0.5'/xlab=k)

Geometricdistribution,p=0.5

m

o

t

z

o

(

dco

Yo

E

O

&

P

CSOI

L

o

o

0246810

k

超級和分布

>N<-30

>M<-10

>n<-10

>k<-seqO10)

>plot(k/dhyper(kzN,M,n),type='h',

+main=Hypergeometricdistribution,

+N=30,M=10,n=10,,xlab='k,)

Hypergeometricdistribution,

N=30,M=10,n=10

負(fù)二項(xiàng)分布:

>n<-10

>p<-0.5

>k<-seq(0,40)

>plot(k,dnbinom(k,n,p),type='h',

+main='NegativeBinomialdistribution,

+n=10,p=0.5',xlab='k')

NegativeBinomialdistribution,

n=10,p=0.5

。8

d(

u-

x-

E)

UO

q_

pu

0

。

。

203040

k

正態(tài)分布:

>curve(dnorm(x70,l),xlim=c(-5,5),ylim=c(0,.8),

+co^'red*,lwd=2,lty=3)

>curve(dnorm(x,0/2),add=T,col='blue',lwd=2,lty=2)

>curve(dnorm(x,0,1/2),add”lwd=2,lty=l)

>title(main="Gaussiandistributions")

>legend(par('usr')[2],par('usr')[4],xjust=l,

,,

+c('sigma=l'z'sigma=2\sigma=l/2)/

+lwd=c(2,2,2),

+lty=c(3,2zl),

+col^Cred','blue',par("fg")))

Gaussiandistributions

9

(0

1

-0

X,

E)。

O」

PU

z

o

o

o

-4-2024

x

###############t分布###########

>curve(dt(xj),xlim=c(-3/3)/ylim=c(0,.4)z

+col='red',lwd=2,lty=l)

>curve(dt(x,2),add=Tcol='green'zlwd=2zlty=2)

>curve(dt(x,10),add=Tcol='orange'zlwd=2zlty=3)

>curvefdnormfx),add=Tlwd=3zlty=4)

>title(main="StudentTdistributions")

>legend(par('us「)[2],par('us「)[4],xjust=l,

+c('df=l'Jdf=2','df=10丁Gaussiandistribution'),

+lwd=c(2,2,2z2),

+lty=c(lz2,3,4)z

+col^Cred','blue','green,,par("fg")))

StudentTdistributions

x

##################X2分布############

>curve(dchisq(x,l),xlim=c(0,10)/ylim=c(0,.6)/col='red',lwd=2)

>curve(dchisq(x,2),add=Xco^'green',lwd=2)

>curve(dchisq(x,3),add=Tcol='blue',lwd=2)

>curve(dchisq(x,5),add=Xco^'orange*,lwd=2)

>abline(h=0,lty=3)

>abline(v=0Jty=3)

>title(main='ChisquareDistributions')

「)[⑷,

>legend(par('us2],par('usP)xjust=lz

+c('df=lLdf=2丁df=3丁df=5)

+lwd=3,lty=l,

+co^cCred','green*,'blue1,'orange')

ChisquareDistributions

X

F分布

>curve(df(x,l,l),xlim=c(0,2),ylim=c(0,.8),lty=l)

>curve(df(x,3,l),add=Tlwd=2Jty=2)

>curve(df(x,6,l),add=T,lwd=2,lty=3)

>curve(df(x,3/3),add=]col='red',lwd=3,lty=4)

>curve(df(x,3,6),add二工col='blue',lwd=3Jty=5)

>title(main="Fisher'sF")

>legend(par('usP)[2],par('usr')[4],xjust=l,

+丁df=(3,l)丁df=(6,l%

,

+'df=(3,3)',df=(3,6),),

+lwd=c(l,2,233),

+lty=c(l,2,3,4,5),

+col=c(par("fg"Lpar("fg"),par("fgnk'red;'blue'))

FishefsF

co

o

co

o

X0

o

o

0.00.51.01.52.0

x

#############對數(shù)正態(tài)分布##############

>curvefdlnormfx),xlim=c(-.2,5)/ylim=c(0,1.0),lwd=2)

>curve(dlnorm(x,0,3/2),add=Tzcol='blue‘,lwd=2,lty=2)

>curve(dlnorm(x,0zl/2)zadd=7;co^'orange',lwd=27lty=3)

>title(main="Lognormaldistributions")

>legend(par('usr')[2],par('usr')[4],xjust=l,

,

+c('sigma=l,'sigma=2','sigma=l/2')/

+lwd=c(2,2,2),

+lty=c(l,2,3),

+col=c(par("fg"),'blue','orange'))

Lognormaldistributions

o_

8

0

9

X

E

-0

O

U

_

Pw

o

N

o

0

0

柯西分布:

>curve(dcauchy(x),xlim=c(-5/5),ylim=c(0,.5),lwd=3)

>curve(dnorm(x)/add=Xcol='red'zlty=2)

>legend(par('usr')[2],par('usr')[4]zxjust=l,

+c(*Cauchydistribution*,'Gaussiandistribution'),

+lwd=c(3,l),

+lty=c(l,2),

+col=c(par("fgn),'red'))

9

0

N

o

(

xco

)o

A

l

pz

n

e

oo

p

o

o

o

-4-2024

x

威布爾分布

>curve(dexp(x),xlim=c(0,3),ylim=c(0,2))

>curve(dweibull(x,l),lty=3,lwd=3,add=T)

>curve(dweibull(x,2),colored',add=T)

>curve(dweibull(x,.8),co^'blue',add=T)

>title(main="WeibullProbabilityDistributionFunction")

>legend(par('us「)[2],par('us門⑷,xjust=l,

+c('Exponential','Weibull,shape=l',

+'Weibull,shape=2','Weibull,shape=.8'),

+lwd=c(lz3,l,l),

+lty=c(l,3,l,l),

n

+col=c(par("fg)/par("fg"),'red','blue'))

WeibullProbabilityDistributionFunction

o

z

9

b

(

)x0

xd-

ps

9

0

o

o

0.00.51.01.52.02.53.0

x

珈碼分布

>curve(dgamma(x,l,l),xlim=c(0,5),lwd=2zlty=l)

>curve(dgamma(x,2,l),add=Xcol='red',lwd=2zlty=2)

>curve)dgamma(x,3,l),add=Xcol='green'zlwd=2,lty=3)

>curve(dgamma(x,4,l),add=7;col='blue',lwd=2,lty=4)

>curve(dgamma(xz5zl)zadd=Xco^'orange',lwd=2zlty=5)

>title(main="Gammadistributions")

>legend(par('usr)[2],par(*usr')[4],xjust=lz

+c('k=l(Exponentialdistribution),,

+k2丁k=3丁k=4丁k=5'),

+lwd=c(2,2,2,2,2),

+lty=c(l/2,3,4,5),

+col^parCfg*),'red','green','blue','orange'))

Gammadistributions

0

1

8

0

(

二9

l。

x

)

E

E

E

e9

P6。

N

d

o

CD

012345

x

貝塔分布

>curve(dbeta(x,l,l),xlim=c(0,l),ylim=c(0,4))

>curve(dbeta(x,3,l),add=Tcol='green')

>curve(dbeta(x,3,2),add=Tlty=2,lwd=2)

>curve)dbeta(x,4,2)zadd=Xlty=2zlwd=2,col='blue')

>curve(dbeta(x,2,3)/add二1lty=3,lwd=3,col='red')

>curve(dbeta(x,4,3),add=Xlty=3,lwd=3,col='orange')

>title(main="Betadistributions")

>legend(par('usF)[l],par('usr')[4],xjust=0,

+c('(l,l);'(3,1);'(3,2),,

+'(4,2)'z'(2,3)','(4,3),),

+lwd=c(l,lz2,2,3,3),

+gc(l,l,2,2,3,3),

+col^parCfg*),'green;par('fg'),

,,

+blue'zred'/'orange'))

Betadistributions

(

L

-

L

X

)

E

n

D

o.o0.20.40.60.81.0

§4.2直方圖與密度函數(shù)的估計(jì)

例4.2.1從二項(xiàng)分布binom(100,0.9)中抽取容量為N=100000的樣本.試作出它的直方圖及

核密度估計(jì)曲線。

>N<-100000

>n<-100

>p<-.9

>x<-rbinom(N,n,p)

>hist(x,

+xlim=c(min(x),max(x)),probability^

+nclass=max(x)-min(x)+l,col='lightblue',

+main='Binomialdistribution,n=100,p=.5')

>>lines(density(xzbw=l),col='red'zlwd=3)

Binomialdistribution,n=100,p=.5

例4.2.2從負(fù)二項(xiàng)分布nbinom(10,0.25)中抽取容量為N=100000的樣本.試作出它的直方

圖及核密度估計(jì)曲線。

>x<-rnbinom(N,10,.25)

>hist(x,

+xlim=c(min(x)/max(x)),probabilityt]

+nclass=max(x)-min(x)+l,col='lightblue',

+main='Negativebinomialdistribution,n=10zp=.25')

>lines(density(x,bw=l),col='red'zlwd=3)

Negativebinomialdistribution,n=10,p=.25

§4.3單組數(shù)據(jù)的描述性統(tǒng)計(jì)分析

直方圖

>library(DAAG)

>data(possum)

>fpossum<-possum[possum$sex=="f"z]

>par(mfrow=c(l,2))

>library(DAAG)

>data(possum)

>fpossum<-possum[possum$sex==,'f',]

>par(mfrow=c(1,2))

>attach(fpossum)

>hist(totlngth,breaks=72.5+(0:5)*5,

ylim=c(0,22),xlab=ntotallength",

main=uA:Breaksat72.5,77.5…”)

>hist(totlngth,breaks=75+(0:5)*5,

ylim=c(0,22),xlab=ntotallength,1,

main=",B:Breaksat75,80…”)

莖葉圖

>stem(fpossum$totlngth)

Thedecimalpointisatthe|

7410

76

78

80|05

82I0500

84|05005

86105505

88|0005500005555

9015550055

92|000

94105

96|5

框須圖

>library(DAAG)

>data(possum)

>fpossum<-possum[possum$sex=="f',]

>boxplot(fpossum$totlngth)

正態(tài)性檢驗(yàn)

>qqnorm(fpossum$totlngth,

main="NormalityCheckviaQQPlot")

>qqline(fpossum$totlngth,colored')

>dens<-density(totlngth)

>xlim<-range(dens$x);ylim<-range(dens$y)

>par(mfrow=c(1,2))

>hist(totlngth,breaks=72.5+(0:5)*5,

xlim=xlim,ylim=ylim,

probability=T,xlab=ntotallength",

main=,'A:Breaksat72.5,77.5…”)

>lines(dens,col=par('fg,),Ity=2)

>m<-mean(totlngth)

>s<-sd(totlngth)

>curve(dnorm(x,m,s),col='red\add=T)

>hist(totlngth,breaks=75+(0:5)*5,

xlim=xlim,ylim=ylim,

probability=T,xlab="totallength",

main=nB:Breaksat75,80.,)

>lines(dens,col=par('fg,),lty=2)

>m<-mean(totlngth)

>s<-sd(totlngth)

>curve(dnorm(x,m,s),col='red\add=T)

>x<-sort(totlngth)

>n<-length(x)

>y<-(l:n)/n

>m<-mean(totlngth)

>s<-sd(totlngth)

>plot(x,y,type='s',main="empiricalcdfof")

>curve(pnorm(x,m,s),col=,red\lwd=2,add=T)

總體描述

>summary(fpossum$totlngth)

Min.1stQu.MedianMean3rdQu.Max.

75.0085.2588.5087.9190.5096.50

>mean(fpossum$totlngth)

[1]87.90698

五數(shù)及樣本分位數(shù)概括

>fivenum(fpossum$totlngth)

[1]75.0085.2588.5090.5096.50

>quantile(fpossum$totlngth)

0%25%50%75%100%

75.0085.2588.5090.5096.50

>quantile(fpossum$totlngth,prob=c(0.25,0.5,0.75))

25%50%75%

85.2588.5090.50

>median(fpossum$totlngth)

[1]88.5

>max(fpossum$totlngth)

[1]96.5

>min(fpossum$totlngth)

[1]75

離差的概括

>max(fpossum$totlngth)-min(fpossum$totlngth)

[1]21.5

>IQR(fpossum$totlngth)

[1]5.25

>sd(fpossum$totlngth)

[1]4,182241

>sd(fpossum$totlngth)A2

[1]17.49114

>var(fpossum$totlngth)

var(fpossum$totlngth)

>mad(fpossum$totlngth)

[1]3.7065

樣本偏度系數(shù)和峰度系數(shù)

>library(fBasics)

>skewness(fpossum$totlngth)

[1]-0.54838

>kurtosis(fpossum$totlngth)

[1]0.6170082

§4.4多組數(shù)據(jù)的描述性統(tǒng)計(jì)分析

散點(diǎn)圖

例4.4.1在R的程序包DAAG中有數(shù)據(jù)集cars,使用下邊的命令得到數(shù)據(jù)集。

>library(DAAG)

>data(cars)

>cars

speeddist

I42

2410

374

482493

4924120

502585

我們希望估計(jì)速度(speed)與終止距離(dist)之間的關(guān)系,先考查它們之間的散點(diǎn)圖,命令:

>plot(cars$dist~cars$speed,

xlab="Speed(mph)"?

ylab="Stoppingdistance(ft)”)

>lines(lowess(cars$speed,cars$dist),lwd=2)

進(jìn)一步,通過函數(shù)rug()可以在橫軸和縱軸上標(biāo)明數(shù)據(jù)的具體的位置:

>rug(side=2,jitter(cars$dist,20))

>rug(side=l,jitter(cars$speed,5))

我們也可以在數(shù)軸兩邊加上單變量的箱形圖:

>op<-par()

>layout(matrix(c(2,1,0,3),2,2,byrow=T),c(l,6),c(4,l))

>par(mar=c(l,l,5,2))

>plot(cars$dist~cars$speed,

xlab=n,ylab二",las=1)

>rug(side=1,jitter(cars$speed,5))

>rug(side=2,jitter(cars$dist,20))

>title(main="carsdata")

>par(mar=c(l,2,5,l))

>boxplot(cars$dist,axes=F)

>title(ylab='Stoppingdistance(ft)\line=0)

>par(mar=c(5,1,1,2))

>boxplot(cars$speed,horizontal=T,axes=F)

>litle(xlab='Speed(mph)\line=l)

>par(op)

等高線圖

>library(chplot)

>data(hdr)

>x<-hdrSage

>y<-log(hdr$income)

>plot(x,y)

>library(MASS)

>z<-kde2d(x,y)

>contour(z,col="red",drawlabels=FALSE,

main=''Densityestimation:contourplot")

三維透視圖

>persp(z,main="Densityestimation:perspectiveplot'1)

數(shù)據(jù)的變換

例4.4.2首先調(diào)出數(shù)據(jù)集Animal

>library(MASS)

>data(Animals)

>Animals

bodybrain

Mountainbeaver1.3508.1

Cow465.000423.0

Greywolf36.330119.5

Goat27.660115.0

Guineapig1.0405.5

Dipliodocus11700.00050.0

Pig192.000180.0

我們在R中畫兩個(gè)圖,一個(gè)使用原始的數(shù)據(jù),另一個(gè)對原來的數(shù)值取對數(shù):

>par(mfrow=c(1,2))

>plot(brain-body,data=Animals)

>plot(log(brain)-log(body),data=Animals)

多組數(shù)據(jù)

例4.4.3

>n<-10

>d<-data.frame(y1=abs(rnorm(n)),

y2=abs(rnorm(n)),

y3=abs(rnorm(n)),

y4=abs(rnorm(n)),

y5=abs(rnorm(n))

)

散點(diǎn)圖

>plot(d)

矩陣圖

>matplot(d,type=T,ylab=main="Matplot")

框須圖

>boxplot(d)

多組數(shù)據(jù)的描述性統(tǒng)計(jì)

>state.x77

PopulationIncomeIlliteracyLifeExp...

Alabama361536242.169.05...

Alaska36563151.569.31...

Arizona221245301.870.55...

Wisconsin458944680.772.48...

Wyoming37645660.670.29...

>summary(state.x77)

PopulationIncomeIlliteracyLifeExp

Min.:365Min.:3098Min.:0.500Min.:67.96

1stQu.:10801stQu.:39931stQu.:0.6251stQu.:70.12

Median:2839Median:4519Median:0.950Median:70.67

Mean:4246Mean:4436Mean:1.170Mean:70.88

3rdQu.:49693rdQu.:48143rdQu.:1.5753rdQu.:71.89

Max.:21198Max.:6315Max.:2.800Max.:73.60

MurderHSGradFrostArea

Min.:1.400Min.:37.80Min.:0.00Min.:1049

1stQu.:4.3501stQu.:48.051stQu.:66.251stQu.:36985

Median:6.850Median:53.25Median:114.50Median:54277

Mean:7.378Mean:53.11Mean:104.46Mean:70736

3rdQu.:10.6753rdQu.:59.153rdQu.:139.753rdQu.:81163

Max.:15.100Max.:67.30Max.:188.00Max.:566432

>aggregate(state.x77,list(Region=state.region),mean)

RegionPopulationIncomeIlliteracyLifeExp...

1Northeast5495.1114570.2221.00000071.26444...

2South4208.1254011.9381.73750069.70625...

3NorthCentral4803.0004611.0830.70000071.76667...

4West2915.3084702.6151.02307771.23462...

RegionColdPopulationIncomeIlliteracyLifeExp...

1NortheastFALSE8802.80004780.4001.180000071.12800...

2SouthFALSE4208.12504011.9381.737500069.70625...

3NorthCentralFALSE7233.83334633.3330.783333370.95667...

4WestFALSE4582.57144550.1431.257142971.70000...

5NortheastTRUE1360.50004307.5000.775000071.43500...

6NorthCentralTRUE2372.16674588.8330.616666772.57667...

7WestTRUE970.16674880.5000.750000070.69167...

標(biāo)準(zhǔn)差與協(xié)方差陣的計(jì)算

>options(digits=3)

>sd(state.x77)

PopulationIncomeIlliteracyLifeExpMurderHSGradFrostArea

4464.49614.470.611.343.698.0851.9885327.30

函數(shù)var()應(yīng)用在多組數(shù)據(jù)中計(jì)算的是協(xié)方差陣:

>var(state.x77)

PopulationIncomeIlliteracyLifeExp...

Population19931684571230292.868-4.08e+02...

Income571230377573-163.7022.81e+02...

Illiteracy293-1640.372-4.82e-01...

LifeExp-408281-0.4821.80e+00...

Murder5664-5221.582-3.87e+00...

HSGrad-35523077-3.2356.31e+00...

Frost-770827228-21.2901.83e+01...

Area8587917190490144018.337-1.23e+04...

>aggregate(state.x77,list(Region=state.region),sd)

RegionPopulationIncomeIlliteracyLifeExp...

1Northeast60805590.2780.744...

2South27806050.5521.022...

3NorthCentral37032830.1411.037...

4West55796640.6081.352...

相關(guān)系數(shù)的計(jì)算

例如,我們計(jì)算下面二個(gè)向量x與y之間的三個(gè)相關(guān)系數(shù):

>xv-c(44.4,45.9,46.0,46.5,46.7,47,48.7,49.2,60.1)

>y<-c(2.6,10.1,11.5,30.0,32.6,50.0,55.2,85.8,86.8)

>cor(x,y)

[1]0.768587

>cor(x,y,method=,,spearman")

[1]1

>cor(x,y,method=nkendair'))

[1]1

分組圖形

>data(cuckoos)

>cuckoos

lengthbreadthspeciesid

121.716.1meadow.pipit21

222.617.0meadow.pipit22

11820.815.9wren236

11921.216.0wren237

12021.016.0wren238

>coplot(length?breadth|species)

>data(cuckoos)

>attach(cuckoos)

>length.mp<-length[species==nmeadow.pipitn]

>length.tp<-length[species==',tree.pipitu]

>length.hs<-length[species=="hedge.sparrow"]

>length.r<-length[species==urobin"]

>length.pw<-length[species==,,pied.wagtail"]

>length.w<-length[species==*'wren*']

>par(mfrow=c(3,2))

>hist(length.mp,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,1),main=HU,col=6)

>hist(length.tp,breaks=6,probability=T,

>length.mp<-length[species==,,meadow.pipitn]

>length.tp<-length[species=="tree.pipitu]

>length.hs<-length[species==,,hedge.sparrow"]

>length.r<-length[species==urobin"]

>length.pw<-length[species==Hpied.wagtail"]

>length.w<-length[species=="wren"]

>par(mfrow=c(3,2))

>hist(length.mp,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,l),main=u,,,col=6)

>hist(length.tp,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,1),main=,,u,col=6)

>hist(length.hs,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,1),main=,,u,col=6)

>hist(length.r,breaks=6,probability=T,

xlim=c(19,25),ylim=c(0,1),main="M,col=6)

>hist(length.pw,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,l),main=,",,col=6)

>hist(length.w,breaks=6,probability=T,

+xlim=c(l9,25),ylim=c(0,1),main=,",,col=6)

>par(infrow=c(1,1))

>hists<-function(x,y,...){

+y<-factor(y)

+n<-length(levels(y))

+op<-par(mfcol=c(nzl),mar=c(2,4,l,l))

+b<-hist(xzplot=F)$breaks

+for(Iinlevels(y)){

+hist(x[y==l],breaks=b,probability^ylim=c(0,1.0),

+main=""/ylab=Lco^lightblue',xlab="",...)

+points(density(x[y==l]),type=T,lwd=3,col-red*)

+}

+par(op)

+}

>hists(cuckoos$length,cuckoos$species)

>histogram(-length|species,data=cuckoos)

/v

>boxplot(lengthspecies/data=cuckoos,xlab="lengthofegg”,horizontal=TRUE)

>slripchart(cuckoos$length-cuckoos$species,method="jitter")

>densityplot(-length|species,data=cuckoos)

§4.5分類數(shù)據(jù)的描述性統(tǒng)計(jì)分析

列聯(lián)表的制作

例4.5.1為考查眼睛的顏色(Eye)與頭發(fā)的顏色(Hair)之間的關(guān)系,收集了下面的一組數(shù)據(jù)

>Eye.Hair<-matrix(c(68,20,15,5,119,84,54,29,

26,17,14,14,7,94,10,16),nrow=4,byrow=T)

>colnames(Eye.Hair)<-c("Brown",nBluen,"HazeF',^Green*')

>rownames(Eye.Hair)<-c(,,Black'V'Brownu;'RedH,nBlondn)

>Eye.Hair

例4.5.2數(shù)據(jù)包ISwR中的數(shù)據(jù)集juul中含有三個(gè)分類變量:sex,tanner,menarche.則我們可

以得到下面的一些列表:

>table(sex)

>table(sex,menarche)

>table(menarche,tanner)

在實(shí)際使用時(shí)常需要按列聯(lián)表中某個(gè)屬性(因子)求和,稱之為邊際列表.除了使用前

面已經(jīng)提到的函數(shù)apply()外,更為方便的是使用函數(shù)margin.tabie().例如,對于數(shù)據(jù)

Eye.Hair,我們有:

>margin.table(Eye.Hair,1)

BlackBrownRedBlond

10828671127

>margin.table(Eye.Hair,2)

BrownBlueHazelGreen

2202159364

###冊##頻率列聯(lián)表########相##

>prop.table(Eye.Hair,1)

BrownBlueHazelGreen

Black0.630.20.140.05

Brown0.420.30.190.10

Red0.370.20.200.20

Blond0.060.70.080.13

>prop.table(Eye.Hair,1)*100

BrownBlueHazelGreen

Black6319145

Brown42291910

Red37242020

Blond674813

>Eye.Hair/sum(Eye.Hair)

BrownBlueHazelGreen

Black0.110.030.030.008

Brown0.200.140.090.049

Red0.040.030.020.024

Blond0.010.160.020.027

>data(HairEyeColor)

>a<-as.table(apply(HairEyeColor,c(1,2),sum))

>barplot(a,legend.text=attr(a,"dimnames'^SHair)

>dotchart(Eye.Hair)

習(xí)題

4.7假定某校100名女生的血清總蛋白含量(g/L)服從均值為75,標(biāo)準(zhǔn)差為3,并假定數(shù)據(jù)由下

面的命令產(chǎn)生

>options(digits=4)

>rnorm(100,75,9)

根據(jù)產(chǎn)生的數(shù)據(jù)

1)計(jì)算樣本均值、方差、標(biāo)準(zhǔn)差、極差、四分位極差、變異系數(shù)、偏度、峰度和五數(shù)概括;

2)畫出直方圖、核密度估計(jì)曲線、經(jīng)驗(yàn)分布圖和QQ圖;

3)畫出莖葉圖、框須圖.

(1)

>options(digits=4)

>a=rnorm(100,75,9)

>mean(a)

[1]75.24

>var(a)

[1]65.57

>sd(a)

[1]8.098

>range(a)

[1]53.2890.61

>summary(a)

Min.1stQu.MedianMean3rdQu.Max.

53.369.875.275.282.090.6

>IQR(a)

[1]12.27

>library(moments)

>skewness(a)

>fivenum(a)

[1]53.2869.7375.2582.1390.61

(2)

直方圖及核密度估計(jì)曲線

normalmu=0,sigma=1

a

QQ圖

NormalQ-QPlot

0

S8

U

g

o

Z0

Q.E

B

S

90

-2-1012

TheoreticalQuantiles

(3)

莖葉圖

>stem(a)

Thedecimalpointis1digit(s)totherightofthe|

5|3

5|79

6|0023444

6|56666778899999

7|0000111122222222344444

7|555556666777777888889999

8|00112223333444444

8|555667899

9|0011

框須圖

第五章參數(shù)估計(jì)

§5.1矩法估計(jì)和極大似然估計(jì)

例5.1.2對某個(gè)籃球運(yùn)動(dòng)員記錄其在一次比賽中投籃命中與否,觀測數(shù)據(jù)如下:

11O1OO1O111O11O1OO1O1O1OO11O11O1

編寫相應(yīng)的R函數(shù)估計(jì)這個(gè)籃球運(yùn)動(dòng)員投籃的成敗比。

>X<-c(l,L0,L0,0,VU,1,1,0,1,1,0,100,1,0,1,0,100,1,1,0,1,1,0,1)

>x

[1]11010010111011010010101001101101

>theta<-mean(X)

>theta

[1]0.5625

>t=theta/(l-theta)

>t

[1]1.285714

例5.1.4下面的觀測值為來自指數(shù)分布的一個(gè)樣本:

0.591327540.128549350.469002280.298359800.243414620.065666370.40085536

2.996871230.052789120.09898594

我們來估計(jì)其參數(shù)人o

>X=c(0.59132754,0.12854935/0.46900228,0.29835980,0.24341462,0.06566637,0.40085536,2.99

687123,0.05278912,0.09898594)

>lambda=l/mean(X)

>lambda

[1]1.87062

>lambda=l/sd(x)

Errorinis.data.frame(x):找不到對象又

>lambda=l/sd(X)

>lambda

[1]1.131032

結(jié)論:

1)人的一階矩估計(jì)為1.87062,二階矩估計(jì)為1.1

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論