R learning document: R cookbook
R cookbook

Cookbook for R: ://rcookbook/

Welcome to the Cookbook for R (formerly named R Cookbook). The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data.

Most of the code in these pages can be copied and pasted into the R command window if you want to see it in action.

Basics
Numbers
Strings
Formulas
Data input and output
Manipulating data
Statistical analysis
Graphs
Scripts and functions
Tools for experiments

Other useful references

Quick-R - an excellent quick reference
R Reference card (PDF)
R for SAS and SPSS users

This site is created by Winston Chang. It is not related to Paul Teetor's excellent R Cookbook. It was recently migrated from a MoinMoin wiki to a static site generated by Markdoc, and there may have been some errors in the translation. If you find any errors, please report them.

Basics

R environment
Installing and using packages

R language basics
Indexing into a data structure
Getting a subset of a data structure
Making a vector filled with values
Information about variables
Working with NULL, NA, and NaN

Installing and using packages

Problem

You want to install and use a package.

Solution

If you are using a GUI for R, there is likely a menu-driven way of installing packages. This is how to do it from the command line:

    install.packages("reshape2")

In each new R session where you use the package, you will have to load it:

    library(reshape2)

If you use the package in a script, put this line in the script.

To update all your installed packages to the latest versions available:

    update.packages()

If you are using R on Linux, some of the R packages may be installed at the system level by the root user, and can't be updated this way, since you won't have permission to overwrite them.

Indexing into a data structure

Problem

You want to get part of a data structure.

Solution

Elements from a vector, matrix, or data frame can be extracted using numeric indexing, or by using a boolean vector of the appropriate length.

In many of the examples below, there are multiple ways of doing the same thing.

Indexing with numbers and names

With a vector:

    # A sample vector
    v <- c(1,4,4,3,2,2,3)

    v[c(2,3,4)]
    v[2:4]
    # 4 4 3

    v[c(2,4,3)]
    # 4 3 4

With a data frame:

    # Create a sample data frame
    data <- read.table(header=T, text='
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     ')

Indexing with a boolean vector

With the vector v from above:

    v > 2
    # FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

    v[v>2]
    v[ c(F,T,T,T,F,F,T)]
    # 4 4 3 3

With the data frame from above:

    # A boolean vector
    data$subject < 3
    # TRUE  TRUE FALSE FALSE

    data[data$subject < 3, ]
    data[c(TRUE,TRUE,FALSE,FALSE), ]
    #   subject sex size
    # 1       1   M    7
    # 2       2   F    6

    # It is also possible to get the numeric indices of the TRUEs
    which(data$subject < 3)
    # 1 2

Negative indexing

Unlike in some other programming languages, when you use negative numbers for indexing in R, it doesn't mean to index backward from the end. Instead, it means to drop the element at that index, counting the usual way, from the beginning.

    # Here's the vector again.
    v
    # 1 4 4 3 2 2 3

    # Drop the first element
    v[-1]
    # 4 4 3 2 2 3

    # Drop first three
    v[-1:-3]
    # 3 2 2 3

    # Drop just the last element
    v[-length(v)]
    # 1 4 4 3 2 2

Getting a subset of a data structure

Problem

You want to get a subset of the elements of a vector, matrix, or data frame.

Solution

To get a subset based on some conditional criterion, the subset() function or indexing using square brackets can be used. In the examples here, both ways are shown.

    # A sample vector

    v <- c(1,4,4,3,2,2,3)

    subset(v, v<3)
    v[v<3]
    # 1 2 2

    # Another vector
    t <- c("small", "small", "large", "medium")

    # Remove "small" entries
    subset(t, t!="small")
    t[t!="small"]
    # "large"  "medium"

One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().

    v[v<3] <- 9
    # 9 4 4 3 9 9 3

    subset(v, v<3) <- 9
    # Error in subset(v, v < 3) <- 9 : could not find function "subset<-"

With data frames:

    # A sample data frame
    data <- read.table(header=T, text='
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     ')

    subset(data, subject < 3)
    data[data$subject < 3, ]
    #   subject sex size
    # 1       1   M    7
    # 2       2   F    6

    # Subset of particular rows and columns
    subset(data, subject < 3, select = -subject)
    subset(data, subject < 3, select = c(sex,size))
    subset(data, subject < 3, select = sex:size)
    data[data$subject < 3, c("sex","size")]
    #   sex size
    # 1   M    7
    # 2   F    6

    # Logical AND of two conditions
    subset(data, subject < 3  &  sex=="M")
    data[data$subject < 3  &  data$sex=="M", ]
    #   subject sex size
    # 1       1   M    7

    # Logical OR of two conditions
    subset(data, subject < 3  |  sex=="M")
    data[data$subject < 3  |  data$sex=="M", ]
    #   subject sex size
    # 1       1   M    7
    # 2       2   F    6
    # 4       4   M   11

    # Condition based on transformed data
    subset(data, log2(size) > 3)
    data[log2(data$size) > 3, ]
    #   subject sex size
    # 3       3   F    9
    # 4       4   M   11

    # Subset if elements are in another vector
    subset(data, subject %in% c(1,3))
    data[data$subject %in% c(1,3), ]
    #   subject sex size
    # 1       1   M    7
    # 3       3   F    9

Making a vector filled with values

Problem

You want to create a vector with values already filled in.

Solution

    rep(1, 50)
    #  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # [39] 1 1 1 1 1 1 1 1 1 1 1 1

    rep(F, 20)
    #  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    # [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

    rep(1:5, 4)
    # 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

    rep(1:5, each=4)
    # 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

    # Use it on a factor
    rep(factor(LETTERS[1:3]), 5)
    # A B C A B C A B C A B C A B C
    # Levels: A B C

Information about variables

Problem

You want to find information about variables.

Solution

Here are some sample variables to work with in the examples below:

    x <- 6
    n <- 1:4
    let <- LETTERS[1:4]
    df <- data.frame(n, let)

Information about existence

    # List currently defined variables
    ls()
    # "df"  "let" "n"   "x"

    # Check if a variable named "x" exists
    exists("x")
    # TRUE

    # Check if "y" exists
    exists("y")
    # FALSE

    # Delete variable x
    rm(x)
    x
    # Error: object 'x' not found

Information about size/structure

    # Get information about structure
    str(n)
    # int [1:4] 1 2 3 4

    str(df)
    # 'data.frame': 4 obs. of 2 variables:
    # $ n  : int 1 2 3 4
    # $ let: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
    # (R 4.0.0 and later show let as chr unless stringsAsFactors=TRUE is passed)

    # Get the length of a vector
    length(n)
    # 4

    # Length probably doesn't give us what we want here:
    length(df)
    # 2

    # Number of rows
    nrow(df)
    # 4

    # Number of columns
    ncol(df)
    # 2

    # Get rows and columns
    dim(df)
    # 4 2

Working with NULL, NA, and NaN

Problem

You want to properly handle NULL, NA, or NaN values.

Solution

Sometimes your data will include NULL, NA, or NaN. These work somewhat differently from "normal" values, and may require explicit testing.

Here are some examples of comparisons with these values:

    x <- NULL
    x > 5
    # logical(0)

    y <- NA
    y > 5
    # NA

    z <- NaN
    z > 5
    # NA

Here's how to test whether a variable has one of these values:

    is.null(x)
    # TRUE

    is.na(y)
    # TRUE

    is.nan(z)
    # TRUE

Note that NULL is different from the other two. NULL means that there is no value, while NA and NaN mean that there is some value, although one that is perhaps not usable. Here's an illustration of the difference:

    # Is y null?
    is.null(y)
    # FALSE

    # Is x NA?
    is.na(x)
    # logical(0)
    # Warning message:
    # In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

In the first case, it checks if y is NULL, and the answer is no. In the second case, it tries to check if x is NA, but there is no value to be checked.

Ignoring "bad" values in vector summary functions

If you run functions like mean() or sum() on a vector containing NA or NaN, they will return NA and NaN, which is generally unhelpful, though this will alert you to the presence of the bad value. Many of these functions take the flag na.rm, which tells them to ignore these values.

    vy <- c(1, 2, 3, NA, 5)
    # 1 2 3 NA 5
    mean(vy)
    # NA
    mean(vy, na.rm=TRUE)
    # 2.75

    vz <- c(1, 2, 3, NaN, 5)
    # 1 2 3 NaN 5
    sum(vz)
    # NaN
    sum(vz, na.rm=TRUE)
    # 11

    # NULL isn't a problem, because it doesn't exist
    vx <- c(1, 2, 3, NULL, 5)
    # 1 2 3 5
    sum(vx)
    # 11

Removing bad values from a vector

These values can be removed from a vector by filtering using is.na() or is.nan().

    vy
    # 1 2 3 NA 5
    vy[ !is.na(vy) ]
    # 1 2 3 5

    vz
    # 1 2 3 NaN 5
    vz[ !is.nan(vz) ]
    # 1 2 3 5

Notes

There are also the infinite numerical values Inf and -Inf, and the associated functions is.finite() and is.infinite().

Numbers

Generating random numbers
Generating repeatable sequences of random numbers
Saving the state of the random number generator
Rounding numbers
Comparing floating point numbers

Generating random numbers

Problem

You want to generate random numbers.

Solution

For uniformly distributed (flat) random numbers, use runif(). By default, its range is from 0 to 1.

    runif(1)
    # 0.5581546

    # Get a vector of 4 numbers
    runif(4)
    # 0.383330465 0.005814167 0.879704937 0.873534007

    # Get a vector of 3 numbers from 0 to 100
    runif(3, min=0, max=100)
    # 78.09879 85.37001 15.13357

    # Get 3 integers from 0 to 100
    # Use max=101 because it will never actually equal 101
    floor(runif(3, min=0, max=101))
    # 40 59 64

    # This will do the same thing (0:100, so that 0 is included)
    sample(0:100, 3, replace=T)

    # To generate integers WITHOUT replacement:
    sample(0:100, 3, replace=F)

To generate numbers from a normal distribution, use rnorm(). By default the mean is 0 and the standard deviation is 1.

    rnorm(4)
    # 1.04043144 -1.02006411 1.97268110 0.02424849

    # Use a different mean and standard deviation
    rnorm(4, mean=50, sd=10)
    # 30.29251 48.75306 51.08491 50.04595

    # To check that the distribution looks right, make a histogram of the numbers
    x <- rnorm(400, mean=50, sd=10)
    hist(x)

Generating repeatable sequences of random numbers

Problem

You want to generate a sequence of random numbers, and then generate that same sequence again later.

Solution

Use set.seed(), and pass in a number as the seed.

    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

Saving the state of the random number generator

Problem

You want to save and restore the state of the random number generator.

Solution

Save .Random.seed to another variable.

    # For this example, set the random seed
    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

    # Save the seed
    oldseed <- .Random.seed

    runif(3)
    # 0.7973768 0.2278427 0.5189830

    # Do some other stuff with RNG here, such as:
    # runif(30)
    # ...

    # Restore the seed
    .Random.seed <- oldseed

    # Get the same random numbers as before, after saving the seed
    runif(3)
    # 0.7973768 0.2278427 0.5189830

If no random number generator has been used in your R session, the variable .Random.seed will not exist. If you cannot be certain that an RNG has been used before attempting to save the seed, you should check for it before saving and restoring:

    oldseed <- NULL
    if (exists(".Random.seed"))
        oldseed <- .Random.seed

    # Do some other stuff with RNG here, such as:
    # runif(30)
    # ...

    if (!is.null(oldseed))
        .Random.seed <- oldseed

Saving and restoring the state of the RNG in functions

If you attempt to restore the state of the random number generator within a function by using .Random.seed <- x, it will not work, because this operation changes a local variable named .Random.seed, instead of the variable in the global environment.

Here are two examples. What these functions are supposed to do is generate some random numbers, while leaving the state of the RNG unchanged.

    # This is the bad version
    bad_rand_restore <- function() {
        if (exists(".Random.seed"))
            oldseed <- .Random.seed

        print(runif(3))

        if (exists(".Random.seed"))
            .Random.seed <- oldseed
    }

    # This is the good version
    rand_restore <- function() {
        if (exists(".Random.seed"))
            oldseed <- get(".Random.seed", .GlobalEnv)

        print(runif(3))

        if (exists(".Random.seed"))
            assign(".Random.seed", oldseed, .GlobalEnv)
    }

    # The bad version changes the RNG state, so random numbers keep changing
    set.seed(423)
    bad_rand_restore()
    # 0.1089715 0.5973455 0.9726307
    bad_rand_restore()
    # 0.7973768 0.2278427 0.5189830
    bad_rand_restore()
    # 0.6929255 0.8104453 0.1019465

    # The good version doesn't alter the RNG state, so random numbers stay the same
    set.seed(423)
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
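The save-and-restore idea for functions can also be wrapped in a small reusable helper. The sketch below is only one possible way to do it: the name with_preserved_rng is hypothetical (it is not part of the cookbook or of base R), and it assumes that restoring .Random.seed in the global environment is all that is needed. It uses on.exit() so the state is put back even if the wrapped code stops with an error, and it also covers the case where no RNG state existed before the call.

```r
# Hypothetical helper (not from the cookbook): evaluate `expr`, then put the
# global RNG state back the way it was before the call.
with_preserved_rng <- function(expr) {
  had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  if (had_seed) {
    oldseed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      # Write to the global variable, not to a local copy
      assign(".Random.seed", oldseed, envir = .GlobalEnv)
    } else if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      # No RNG state existed before the call, so remove the one created here
      rm(list = ".Random.seed", envir = .GlobalEnv)
    }
  }, add = TRUE)
  expr  # arguments are evaluated lazily, so the expression runs here
}

set.seed(423)
a <- with_preserved_rng(runif(3))
b <- with_preserved_rng(runif(3))
identical(a, b)
```

Because the state is restored after each call, both calls should draw the same three numbers, which is what the final identical() comparison checks.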

Rounding numbers

Problem

You want to round numbers.

Solution

R provides several rounding functions: round(), ceiling(), floor(), and trunc().

    x <- seq(-2.5, 2.5, by=.5)
    # -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5

    # Round to nearest, with .5 values rounded to even number.
    round(x)
    # -2 -2 -2 -1  0  0  0  1  2  2  2

    # Round up
    ceiling(x)
    # -2 -2 -1 -1  0  0  1  1  2  2  3

    # Round down
    floor(x)
    # -3 -2 -2 -1 -1  0  0  1  1  2  2

    # Round toward zero
    trunc(x)
    # -2 -2 -1 -1  0  0  0  1  1  2  2

It is also possible to round to other values besides one:

    x <- c(.001, .07, 1.2, 44.02, 738, 9927)
    # 0.001 0.070 1.200 44.020 738.000 9927.000

    # Round to one decimal place
    round(x, digits=1)
    # 0.0 0.1 1.2 44.0 738.0 9927.0

    # Round to tens place
    round(x, digits=-1)
    # 0 0 0 40 740 9930

    # Round to nearest 5
    round(x/5)*5
    # 0 0 0 45 740 9925

    # Round to nearest .02
    round(x/.02)*.02
    # 0.00 0.08 1.20 44.02 738.00 9927.00

Comparing floating point numbers

Problem

Comparing floating point numbers does not always work as you expect. For example:

    0.3 == 3*.1
    # FALSE

    (0.1 + 0.1 + 0.1) - 0.3
    # 5.551115e-17

    x <- seq(0, 1, by=.1)
    # 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

    10*x - round(10*x)
    #  [1] 0.000000e+00 0.000000e+00 0.000000e+00 4.440892e-16 0.000000e+00
    #  [6] 0.000000e+00 8.881784e-16 8.881784e-16 0.000000e+00 0.000000e+00
    # [11] 0.000000e+00

Solution

There is no universal solution, because this issue is inherent to the storage format for non-integer (floating point) numbers, in R and computers in general.

Creating a formula from a string

Problem

You want to create a formula from a string.

Solution

It can be useful to create a formula from a string. This often occurs in functions where the formula arguments are passed in as strings.

In the most basic case, use as.formula():

    # This returns a string:
    "y ~ x1 + x2"
    # "y ~ x1 + x2"

    # This returns a formula:
    as.formula("y ~ x1 + x2")
    # y ~ x1 + x2

Here is an example of how it might be used:

    # These are the variable names:
    measurevar <- "y"
    groupvars  <- c("x1","x2","x3")

    # This creates the appropriate string:
    paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ ")
    # "y ~ x1 + x2 + x3"

    # This returns the formula:
    as.formula(paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ "))
    # y ~ x1 + x2 + x3

Strings

Searching and replacing - grep, sub, gsub
Creating strings from variables - sprintf, paste

Creating strings from variables

Problem

You want to create a string from variables.

Solution

The two common ways of creating strings from variables are the paste function and the sprintf function. paste is more useful for vectors, and sprintf is more useful for precise control of the output.

Using paste()

    a <- "apple"
    b <- "banana"

    # Put a and b together, with a space in between:
    paste(a, b)
    # "apple banana"

    # With no space:
    paste(a, b, sep="")
    # "applebanana"

    # With a comma and space:
    paste(a, b, sep=", ")
    # "apple, banana"

    # With a vector
    d <- c("fig", "grapefruit", "honeydew")

    # If the input is a vector, use collapse to put the elements together:
    paste(d, collapse=", ")
    # "fig, grapefruit, honeydew"

    # If the input is a scalar and a vector, it puts the scalar with each
    # element of the vector, and returns a vector:
    paste(a, d)
    # "apple fig"  "apple grapefruit"  "apple honeydew"

    # Use sep and collapse:
    paste(a, d, sep="-", collapse=", ")
    # "apple-fig, apple-grapefruit, apple-honeydew"

Using sprintf()

Another way is to use the sprintf function. This is derived from the function of the same name in the C programming language.

To substitute in a string or string variable, use %s:

    a <- "string"
    sprintf("This is where a %s goes.", a)
    # "This is where a string goes."

For integers, use %d or a variant:

    x <- 8
    sprintf("Regular:%d", x)
    # "Regular:8"

    # Can print to take some number of characters, leading with spaces.
    sprintf("Leading spaces:%4d", x)
    # "Leading spaces:   8"

    # Can also lead with zeros instead.
    sprintf("Leading zeros:%04d", x)
    # "Leading zeros:0008"

For floating-point numbers, use %f for standard notation, and %e or %E for exponential notation. You can also use %g or %G for a "smart" formatter that automatically switches between the two formats, depending on where the significant digits are. The following examples are taken from the R help page for sprintf:

    sprintf("%f", pi)         # "3.141593"
    sprintf("%.3f", pi)       # "3.142"
    sprintf("%1.0f", pi)      # "3"
    sprintf("%5.1f", pi)      # "  3.1"
    sprintf("%05.1f", pi)     # "003.1"
    sprintf("%+f", pi)        # "+3.141593"
    sprintf("% f", pi)        # " 3.141593"
    sprintf("%-10f", pi)      # "3.141593  "   (left justified)
    sprintf("%e", pi)         # "3.141593e+00"
    sprintf("%E", pi)         # "3.141593E+00"
    sprintf("%g", pi)         # "3.14159"
    sprintf("%g",   1e6 * pi) # "3.14159e+06"  (exponential)
    sprintf("%.9g", 1e6 * pi) # "3141592.65"   (fixed)
    sprintf("%G", 1e-6 * pi)  # "3.14159E-06"

In the %m.nf format specification: the m represents the field width, which is the minimum number of characters in the output string, and can be padded with leading spaces, or zeros if there is a zero in front of m. The n represents precision, which is the number of digits after the decimal.

Other miscellaneous things:

    x <- "string"
    sprintf("Substitute in multiple strings: %s %s", x, "string2")
    # "Substitute in multiple strings: string string2"

    # To print a percent sign, use "%%"
    sprintf("A single percent sign here %%")
    # "A single percent sign here %"

Data input and output

Loading data from a file
Loading and storing data with the keyboard and clipboard
Running a script
Writing data to a file
Writing text and output from analyses to a file

Loading data from a file

Problem

You want to load data from a file.

Solution

Delimited text files

The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).

    data <- read.csv("datafile.csv")

    # Load a CSV file that doesn't have headers
    data <- read.csv("datafile-noheader.csv", header=FALSE)

The function read.table() is a more general function which allows you to set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more. See ?read.table for more information on the details.

    data <- read.table("datafile-noheader.csv",
                       header=FALSE,
                       sep=","         # use "\t" for tab-delimited files
    )

Loading a file with a file chooser

On some platforms, using file.choose() will open a file chooser dialog window. On others, it will simply prompt the user to type in a filename.

    data <- read.csv(file.choose())

Treating strings as factors or characters

In versions of R before 4.0.0, strings in the data are converted to factors by default; since R 4.0.0 the default is stringsAsFactors=FALSE, so strings stay as character vectors. With the old default, loading data with read.csv treats all the text columns as factors, even though it might make more sense to treat some of them as strings. To keep them as strings, use stringsAsFactors=FALSE:

    data <- read.csv("datafile.csv", stringsAsFactors=FALSE)

    # You might have to convert some columns to factors
    data$Sex <- factor(data$Sex)

Another alternative is to load them as factors (in R 4.0.0 or later this requires passing stringsAsFactors=TRUE) and convert some columns to characters:

    data <- read.csv("datafile.csv")

    data$First <- as.character(data$First)
    data$Last  <- as.character(data$Last)

    # Another method: conv
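The effect of stringsAsFactors described in this section can be checked without a file on disk. The sketch below is an illustration under stated assumptions: the inline data in csv_text is made up and merely stands in for datafile.csv, and the column names First, Last, and Sex are borrowed from the snippets in this section. It passes the text to read.csv() through its text argument and sets stringsAsFactors explicitly, so the result should not depend on which R version's default is in effect.

```r
# Made-up inline data standing in for datafile.csv (an assumption for
# illustration, not the cookbook's data file)
csv_text <- "First,Last,Sex,Score
Ada,Lovelace,F,9
Alan,Turing,M,7"

# Read the same text twice, once keeping strings and once making factors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$Sex)   # class of the column when strings are kept
class(as_fct$Sex)   # class of the column when strings become factors

# Convert only selected columns, in either direction
as_chr$Sex   <- factor(as_chr$Sex)
as_fct$First <- as.character(as_fct$First)
```

Setting the argument explicitly in scripts is one way to keep the behaviour the same across R versions.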
