




R Cookbook

Cookbook for R: ://rcookbook/

Welcome to the Cookbook for R (formerly named R Cookbook). The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data.

Most of the code in these pages can be copied and pasted into the R command window if you want to see it in action.

- Basics
- Numbers
- Strings
- Formulas
- Data input and output
- Manipulating data
- Statistical analysis
- Graphs
- Scripts and functions
- Tools for experiments

Other useful references

- Quick-R - an excellent quick reference
- R Reference card (PDF)
- R for SAS and SPSS users

This site is created by Winston Chang. It is not related to Paul Teetor's excellent R Cookbook. It was recently migrated from a MoinMoin wiki to a static site generated by Markdoc, and there may have been some errors in the translation. If you find any errors, please report them.

Basics

R environment
- Installing and using packages

R language basics
- Indexing into a data structure
- Getting a subset of a data structure
- Making a vector filled with values
- Information about variables
- Working with NULL, NA, and NaN

Installing and using packages

Problem

You want to install and use a package.

Solution

If you are using a GUI for R, there is likely a menu-driven way of installing packages. This is how to do it from the command line:

    install.packages("reshape2")

In each new R session where you use the package, you will have to load it:

    library(reshape2)

If you use the package in a script, put this line in the script.

To update all your installed packages to the latest versions available:

    update.packages()

If you are using R on Linux, some of the R packages may be installed at the system level by the root user, and can't be updated this way, since you won't have permission to overwrite them.

Indexing into a data structure
Problem

You want to get part of a data structure.

Solution

Elements from a vector, matrix, or data frame can be extracted using numeric indexing, or by using a boolean vector of the appropriate length.

In many of the examples below, there are multiple ways of doing the same thing.

Indexing with numbers and names

With a vector:

    # A sample vector
    v <- c(1,4,4,3,2,2,3)

    v[c(2,3,4)]
    v[2:4]
    # 4 4 3

    v[c(2,4,3)]
    # 4 3 4

With a data frame:

    # Create a sample data frame
    data <- read.table(header=T, text='
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     ')

    # Get the element at row 1, column 3
    data[1,3]
    data[1,"size"]
    # 7

Indexing with a boolean vector

With the vector v from above:

    v > 2
    # FALSE TRUE TRUE TRUE FALSE FALSE TRUE

    v[v>2]
    v[ c(F,T,T,T,F,F,T) ]
    # 4 4 3 3

With the data frame from above:

    # A boolean vector
    data$subject < 3
    # TRUE TRUE FALSE FALSE

    data[data$subject < 3, ]
    data[c(TRUE,TRUE,FALSE,FALSE), ]
    #  subject sex size
    #        1   M    7
    #        2   F    6

    # It is also possible to get the numeric indices of the TRUEs
    which(data$subject < 3)
    # 1 2

Negative indexing

Unlike in some other programming languages, when you use negative numbers for indexing in R, it doesn't mean to index backward from the end. Instead, it means to drop the element at that index, counting the usual way, from the beginning.

    # Here's the vector again.
    v
    # 1 4 4 3 2 2 3

    # Drop the first element
    v[-1]
    # 4 4 3 2 2 3

    # Drop first three
    v[-1:-3]
    # 3 2 2 3

    # Drop just the last element
    v[-length(v)]
    # 1 4 4 3 2 2

Getting a subset of a data structure

Problem

You want to get a subset of the elements of a vector, matrix, or data frame.

Solution

To get a subset based on some conditional criterion, the subset() function or indexing using square brackets can be used. In the examples here, both ways are shown.

    # A sample vector
    v <- c(1,4,4,3,2,2,3)

    subset(v, v<3)
    v[v<3]
    # 1 2 2

    # Another vector
    t <- c("small", "small", "large", "medium")

    # Remove "small" entries
    subset(t, t!="small")
    t[t!="small"]
    # "large" "medium"

One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().

    v[v<3] <- 9
    # 9 4 4 3 9 9 3

    subset(v, v<3) <- 9
    # Error in subset(v, v < 3) <- 9 : could not find function "subset<-"

With data frames:

    # A sample data frame
    data <- read.table(header=T, text='
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     ')

    subset(data, subject < 3)
    data[data$subject < 3, ]
    #  subject sex size
    #        1   M    7
    #        2   F    6

    # Subset of particular rows and columns
    subset(data, subject < 3, select = -subject)
    subset(data, subject < 3, select = c(sex,size))
    subset(data, subject < 3, select = sex:size)
    data[data$subject < 3, c("sex","size")]
    #  sex size
    #    M    7
    #    F    6

    # Logical AND of two conditions
    subset(data, subject < 3 & sex=="M")
    data[data$subject < 3 & data$sex=="M", ]
    #  subject sex size
    #        1   M    7

    # Logical OR of two conditions
    subset(data, subject < 3 | sex=="M")
    data[data$subject < 3 | data$sex=="M", ]
    #  subject sex size
    #        1   M    7
    #        2   F    6
    #        4   M   11

    # Condition based on transformed data
    subset(data, log2(size) > 3)
    data[log2(data$size) > 3, ]
    #  subject sex size
    #        3   F    9
    #        4   M   11

    # Subset if elements are in another vector
    subset(data, subject %in% c(1,3))
    data[data$subject %in% c(1,3), ]
    #  subject sex size
    #        1   M    7
    #        3   F    9

Making a vector filled with values

Problem

You want to create a vector with values already filled in.

Solution

    rep(1, 50)
    #  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # [39] 1 1 1 1 1 1 1 1 1 1 1 1

    rep(F, 20)
    #  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    # [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

    rep(1:5, 4)
    # 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

    rep(1:5, each=4)
    # 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

    # Use it on a factor
    rep(factor(LETTERS[1:3]), 5)
    # A B C A B C A B C A B C A B C
    # Levels: A B C

Information about variables

Problem

You want to find information about variables.

Solution

Here are some sample variables to work with in the examples below:

    x <- 6
    n <- 1:4
    let <- LETTERS[1:4]
    df <- data.frame(n, let)

Information about existence

    # List currently defined variables
    ls()
    # "df" "let" "n" "x"

    # Check if a variable named "x" exists
    exists("x")
    # TRUE

    # Check if "y" exists
    exists("y")
    # FALSE

    # Delete variable x
    rm(x)
    x
    # Error: object 'x' not found

Information about size/structure

    # Get information about structure
    str(n)
    # int [1:4] 1 2 3 4

    str(df)
    # 'data.frame': 4 obs. of 2 variables:
    #  $ n  : int 1 2 3 4
    #  $ let: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
    # (In R 4.0.0 and later, let is stored as a character column
    #  unless stringsAsFactors=TRUE is passed to data.frame().)

    # Get the length of a vector
    length(n)
    # 4

    # Length probably doesn't give us what we want here:
    length(df)
    # 2

    # Number of rows
    nrow(df)
    # 4

    # Number of columns
    ncol(df)
    # 2

    # Get rows and columns
    dim(df)
    # 4 2

Working with NULL, NA, and NaN

Problem

You want to properly handle NULL, NA, or NaN values.
Solution

Sometimes your data will include NULL, NA, or NaN. These work somewhat differently from normal values, and may require explicit testing.

Here are some examples of comparisons with these values:

    x <- NULL
    x > 5
    # logical(0)

    y <- NA
    y > 5
    # NA

    z <- NaN
    z > 5
    # NA

Here's how to test whether a variable has one of these values:

    is.null(x)
    # TRUE

    is.na(y)
    # TRUE

    is.nan(z)
    # TRUE

Note that NULL is different from the other two. NULL means that there is no value, while NA and NaN mean that there is some value, although one that is perhaps not usable. Here's an illustration of the difference:

    # Is y null?
    is.null(y)
    # FALSE

    # Is x NA?
    is.na(x)
    # logical(0)
    # Warning message:
    # In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

In the first case, it checks if y is NULL, and the answer is no. In the second case, it tries to check if x is NA, but there is no value to be checked.

Ignoring bad values in vector summary functions

If you run functions like mean() or sum() on a vector containing NA or NaN, they will return NA and NaN, which is generally unhelpful, though this will alert you to the presence of the bad value. Many of these functions take the flag na.rm, which tells them to ignore these values.

    vy <- c(1, 2, 3, NA, 5)
    # 1 2 3 NA 5
    mean(vy)
    # NA
    mean(vy, na.rm=TRUE)
    # 2.75

    vz <- c(1, 2, 3, NaN, 5)
    # 1 2 3 NaN 5
    sum(vz)
    # NaN
    sum(vz, na.rm=TRUE)
    # 11

    # NULL isn't a problem, because it doesn't exist
    vx <- c(1, 2, 3, NULL, 5)
    # 1 2 3 5
    sum(vx)
    # 11

Removing bad values from a vector

These values can be removed from a vector by filtering using is.na() or is.nan().

    vy
    # 1 2 3 NA 5
    vy[ !is.na(vy) ]
    # 1 2 3 5

    vz
    # 1 2 3 NaN 5
    vz[ !is.nan(vz) ]
    # 1 2 3 5

Notes

There are also the infinite numerical values Inf and -Inf, and the associated functions is.finite() and is.infinite().

Numbers

- Generating random numbers
- Generating repeatable sequences of random numbers
- Saving the state of the random number generator
- Rounding numbers
- Comparing floating point numbers

Generating random numbers

Problem

You want to generate random numbers.

Solution

For uniformly distributed (flat) random numbers, use runif(). By default, its range is from 0 to 1.

    runif(1)
    # 0.5581546

    # Get a vector of 4 numbers
    runif(4)
    # 0.383330465 0.005814167 0.879704937 0.873534007

    # Get a vector of 3 numbers from 0 to 100
    runif(3, min=0, max=100)
    # 78.09879 85.37001 15.13357

    # Get 3 integers from 0 to 100
    # Use max=101 because it will never actually equal 101
    floor(runif(3, min=0, max=101))
    # 40 59 64

    # This will do the same thing
    sample(0:100, 3, replace=T)

    # To generate integers WITHOUT replacement:
    sample(1:100, 3, replace=F)

To generate numbers from a normal distribution, use rnorm(). By default the mean is 0 and the standard deviation is 1.

    rnorm(4)
    # 1.04043144 -1.02006411 1.97268110 0.02424849

    # Use a different mean and standard deviation
    rnorm(4, mean=50, sd=10)
    # 30.29251 48.75306 51.08491 50.04595

    # To check that the distribution looks right, make a histogram of the numbers
    x <- rnorm(400, mean=50, sd=10)
    hist(x)

Generating repeatable sequences of random numbers

Problem

You want to generate a sequence of random numbers, and then generate that same sequence again later.

Solution

Use set.seed(), and pass in a number as the seed.

    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

Saving the state of the random number generator

Problem

You want to save and restore the state of the random number generator.

Solution

Save .Random.seed to another variable.

    # For this example, set the random seed
    set.seed(423)
    runif(3)
    # 0.1089715 0.5973455 0.9726307

    # Save the seed
    oldseed <- .Random.seed

    runif(3)
    # 0.7973768 0.2278427 0.5189830

    # Do some other stuff with RNG here, such as:
    # runif(30)
    # ...

    # Restore the seed
    .Random.seed <- oldseed

    # Get the same random numbers as before, after saving the seed
    runif(3)
    # 0.7973768 0.2278427 0.5189830

If no random number generator has been used in your R session, the variable .Random.seed will not exist. If you cannot be certain that an RNG has been used before attempting to save the seed, you should check for it before saving and restoring:

    oldseed <- NULL
    if (exists(".Random.seed"))
        oldseed <- .Random.seed

    # Do some other stuff with RNG here, such as:
    # runif(30)
    # ...

    if (!is.null(oldseed))
        .Random.seed <- oldseed

Saving and restoring the state of the RNG in functions

If you attempt to restore the state of the random number generator within a function by using .Random.seed <- x, it will not work, because this operation changes a local variable named .Random.seed, instead of the variable in the global environment.

Here are two examples. What these functions are supposed to do is generate some random numbers, while leaving the state of the RNG unchanged.

    # This is the bad version
    bad_rand_restore <- function() {
        oldseed <- NULL
        if (exists(".Random.seed"))
            oldseed <- .Random.seed

        print(runif(3))

        # This only creates a local variable; the global RNG state is untouched
        if (!is.null(oldseed))
            .Random.seed <- oldseed
    }

    # This is the good version
    rand_restore <- function() {
        oldseed <- NULL
        if (exists(".Random.seed", .GlobalEnv))
            oldseed <- get(".Random.seed", .GlobalEnv)

        print(runif(3))

        # Write the saved state back to the global environment
        if (!is.null(oldseed))
            assign(".Random.seed", oldseed, .GlobalEnv)
    }

    # The bad version changes the RNG state, so random numbers keep changing
    set.seed(423)
    bad_rand_restore()
    # 0.1089715 0.5973455 0.9726307
    bad_rand_restore()
    # 0.7973768 0.2278427 0.5189830
    bad_rand_restore()
    # 0.6929255 0.8104453 0.1019465

    # The good version doesn't alter the RNG state, so random numbers stay the same
    set.seed(423)
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
    rand_restore()
    # 0.1089715 0.5973455 0.9726307
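
The save-and-restore pattern can also be wrapped in a reusable helper so that the restore step runs even when the wrapped code fails. The following is only a minimal sketch under stated assumptions: the name with_preserved_seed is hypothetical (it is not part of base R or of the Cookbook), and it relies on on.exit() to put the global .Random.seed back.

```r
# Hypothetical helper (not from the Cookbook): evaluate `expr`, then put
# the global RNG state back to whatever it was before the call.
with_preserved_seed <- function(expr) {
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    if (had_seed) {
        oldseed <- get(".Random.seed", envir = .GlobalEnv)
    }
    # on.exit() runs when the function exits, even if `expr` raises an error
    on.exit({
        if (had_seed) {
            assign(".Random.seed", oldseed, envir = .GlobalEnv)
        } else if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
            # No RNG state existed before the call, so remove the new one
            rm(".Random.seed", envir = .GlobalEnv)
        }
    })
    expr  # lazy evaluation: the argument is actually evaluated here
}

set.seed(423)
first  <- with_preserved_seed(runif(3))
second <- with_preserved_seed(runif(3))
identical(first, second)
# TRUE
```

Because the state is restored after each call, both calls draw the same three numbers, and code that runs afterwards sees the RNG exactly as set.seed(423) left it.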
Rounding numbers

Problem

You want to round numbers.

Solution

R provides several rounding functions: round() rounds to the nearest value, ceiling() rounds up, floor() rounds down, and trunc() rounds toward zero.

    x <- seq(-2.5, 2.5, by=.5)
    # -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5

    # Round to nearest, with .5 values rounded to even number.
    round(x)
    # -2 -2 -2 -1 0 0 0 1 2 2 2

    # Round up
    ceiling(x)
    # -2 -2 -1 -1 0 0 1 1 2 2 3

    # Round down
    floor(x)
    # -3 -2 -2 -1 -1 0 0 1 1 2 2

    # Round toward zero
    trunc(x)
    # -2 -2 -1 -1 0 0 0 1 1 2 2

It is also possible to round to other values besides one:

    x <- c(.001, .07, 1.2, 44.02, 738, 9927)
    # 0.001 0.070 1.200 44.020 738.000 9927.000

    # Round to one decimal place
    round(x, digits=1)
    # 0.0 0.1 1.2 44.0 738.0 9927.0

    # Round to tens place
    round(x, digits=-1)
    # 0 0 0 40 740 9930

    # Round to nearest 5
    round(x/5)*5
    # 0 0 0 45 740 9925

    # Round to nearest .02
    round(x/.02)*.02
    # 0.00 0.08 1.20 44.02 738.00 9927.00

Comparing floating point numbers

Problem

Comparing floating point numbers does not always work as you expect. For example:

    0.3 == 3*.1
    # FALSE

    (0.1 + 0.1 + 0.1) - 0.3
    # 5.551115e-17

    x <- seq(0, 1, by=.1)
    # 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

    10*x - round(10*x)
    #  [1] 0.000000e+00 0.000000e+00 0.000000e+00 4.440892e-16 0.000000e+00
    #  [6] 0.000000e+00 8.881784e-16 8.881784e-16 0.000000e+00 0.000000e+00
    # [11] 0.000000e+00

Solution

There is no universal solution, because this issue is inherent to the storage format for non-integer (floating point) numbers, in R and computers in general. A common workaround is to compare within a tolerance, for example isTRUE(all.equal(0.3, 3*.1)), which returns TRUE.

Creating a formula from a string

Problem

You want to create a formula from a string.

Solution

It can be useful to create a formula from a string. This often occurs in functions where the formula arguments are passed in as strings.

In the most basic case, use as.formula():

    # This returns a string:
    "y ~ x1 + x2"
    # "y ~ x1 + x2"

    # This returns a formula:
    as.formula("y ~ x1 + x2")
    # y ~ x1 + x2

Here is an example of how it might be used:

    # These are the variable names:
    measurevar <- "y"
    groupvars <- c("x1","x2","x3")

    # This creates the appropriate string:
    paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ ")
    # "y ~ x1 + x2 + x3"

    # This returns the formula:
    as.formula(paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ "))
    # y ~ x1 + x2 + x3

Strings

- Searching and replacing - grep, sub, gsub
- Creating strings from variables - sprintf, paste

Creating strings from variables

Problem

You want to create a string from variables.

Solution

The two common ways of creating strings from variables are the paste function and the sprintf function. paste is more useful for vectors, and sprintf is more useful for precise control of the output.

Using paste()

    a <- "apple"
    b <- "banana"

    # Put a and b together, with a space in between:
    paste(a, b)
    # "apple banana"

    # With no space:
    paste(a, b, sep="")
    # "applebanana"

    # With a comma and space:
    paste(a, b, sep=", ")
    # "apple, banana"

    # With a vector
    d <- c("fig", "grapefruit", "honeydew")

    # If the input is a vector, use collapse to put the elements together:
    paste(d, collapse=", ")
    # "fig, grapefruit, honeydew"

    # If the input is a scalar and a vector, it puts the scalar with each
    # element of the vector, and returns a vector:
    paste(a, d)
    # "apple fig" "apple grapefruit" "apple honeydew"

    # Use sep and collapse:
    paste(a, d, sep="-", collapse=", ")
    # "apple-fig, apple-grapefruit, apple-honeydew"

Using sprintf()

Another way is to use the sprintf function. This is derived from the function of the same name in the C programming language.

To substitute in a string or string variable, use %s:

    a <- "string"
    sprintf("This is where a %s goes.", a)
    # "This is where a string goes."

For integers, use %d or a variant:

    x <- 8
    sprintf("Regular:%d", x)
    # "Regular:8"

    # Can print to take some number of characters, leading with spaces.
    sprintf("Leading spaces:%4d", x)
    # "Leading spaces:   8"

    # Can also lead with zeros instead.
    sprintf("Leading zeros:%04d", x)
    # "Leading zeros:0008"

For floating-point numbers, use %f for standard notation, and %e or %E for exponential notation. You can also use %g or %G for a smart formatter that automatically switches between the two formats, depending on where the significant digits are. The following examples are taken from the R help page for sprintf:

    sprintf("%f", pi)          # "3.141593"
    sprintf("%.3f", pi)        # "3.142"
    sprintf("%1.0f", pi)       # "3"
    sprintf("%5.1f", pi)       # "  3.1"
    sprintf("%05.1f", pi)      # "003.1"
    sprintf("%+f", pi)         # "+3.141593"
    sprintf("% f", pi)         # " 3.141593"
    sprintf("%-10f", pi)       # "3.141593  " (left justified)
    sprintf("%e", pi)          # "3.141593e+00"
    sprintf("%E", pi)          # "3.141593E+00"
    sprintf("%g", pi)          # "3.14159"
    sprintf("%g", 1e6 * pi)    # "3.14159e+06" (exponential)
    sprintf("%.9g", 1e6 * pi)  # "3141592.65" (fixed)
    sprintf("%G", 1e-6 * pi)   # "3.14159E-06"

In the %m.nf format specification: the m represents the field width, which is the minimum number of characters in the output string, and can be padded with leading spaces, or zeros if there is a zero in front of m. The n represents precision, which is the number of digits after the decimal point.

Other miscellaneous things:

    x <- "string"
    sprintf("Substitute in multiple strings: %s %s", x, "string2")
    # "Substitute in multiple strings: string string2"

    # To print a percent sign, use "%%"
    sprintf("A single percent sign here %%")
    # "A single percent sign here %"

Data input and output

- Loading data from a file
- Loading and storing data with the keyboard and clipboard
- Running a script
- Writing data to a file
- Writing text and output from analyses to a file

Loading data from a file

Problem

You want to load data from a file.

Solution

Delimited text files

The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).

    data <- read.csv("datafile.csv")

    # Load a CSV file that doesn't have headers
    data <- read.csv("datafile-noheader.csv", header=FALSE)

The function read.table() is a more general function which allows you to set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more. See ?read.table for more information on the details.

    data <- read.table("datafile-noheader.csv",
                       header=FALSE,
                       sep=","     # use "\t" for tab-delimited files
    )

Loading a file with a file chooser

On some platforms, using file.choose() will open a file chooser dialog window. On others, it will simply prompt the user to type in a filename.

    data <- read.csv(file.choose())

Treating strings as factors or characters

In R versions before 4.0.0, strings in the data are converted to factors by default, so all the text columns loaded with read.csv are treated as factors, even though it might make more sense to treat some of them as strings. To keep them as strings, use stringsAsFactors=FALSE (this is the default since R 4.0.0):

    data <- read.csv("datafile.csv", stringsAsFactors=FALSE)

    # You might have to convert some columns to factors
    data$Sex <- factor(data$Sex)

Another alternative is to load them as factors and convert some columns to characters:

    # stringsAsFactors=TRUE is needed explicitly in R 4.0.0 and later
    data <- read.csv("datafile.csv", stringsAsFactors=TRUE)
    data$First <- as.character(data$First)
    data$Last <- as.character(data$Last)

    # Another method: conv
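
The effect of stringsAsFactors can be checked without a file on disk. This is a minimal sketch under stated assumptions: the inline CSV text and its values are made up for illustration, with column names First, Last, and Sex chosen to match the names used above, and the text argument of read.csv stands in for datafile.csv.

```r
# Illustrative inline data standing in for datafile.csv (values are made up)
csv_text <- "First,Last,Sex
Ann,Lee,F
Bob,Ray,M"

# Load the same data both ways
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Inspect how the First column was stored in each case
class(as_strings$First)
class(as_factors$First)

# Convert individual columns afterwards, in either direction
as_strings$Sex   <- factor(as_strings$Sex)
as_factors$First <- as.character(as_factors$First)
```

Passing stringsAsFactors explicitly in both calls keeps the result the same across R versions, whatever the default is.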