# Introduction to R

## Getting started with R and R-Studio

1. Install R (a statistical programming language) from http://cran.us.r-project.org. Follow the instructions noting that there are different versions depending on your platform (Mac or Windows).
2. Install RStudio (a free integrated development environment) from https://www.rstudio.com/products/rstudio/download/. Choose the free “Desktop” version.
3. Create a new folder on your Desktop.  For now, suppose you call it Biocore1. This is where you will place your R files, data files, and also files with computer output, including plots.
4. Start R-Studio by clicking on it (Do not start R directly). You will notice that the screen is roughly divided into 4 parts.  You will be entering all of your commands on the two parts on the left. You can think of the top left portion as an editor.  The bottom left part, “Console”, contains the commands that are actually run by R. (If you were to open the R program by itself—and not R-studio—you would essentially be seeing the lower left—or console—portion of the R-studio screen. However, R-studio provides many advantages and you should use that for your work.)
5. Experiment a little using just the console. E.g., type in 3 + 6 and return and observe that the answer given is 9.
6. Create a new file.   Using the menu, choose “File > New File > R Script”.  Save it with a name that conveys some meaning to you—such as rtest.R (the “.R” at the end is necessary and indicates that your file is a R file extension).  Suppose now that you wish to analyze some data from a study examining the relationship between heart rate (in beats per minute) and environmental temperature (in degrees C) for the common grass frog. You have observations on temperature and heart rate for 8 frogs.
7. Start by entering the following code in the upper left quadrant.  Enter the temperature data as a vector with the 8 values (2, 4, 6, 8, 10, 12, 14, 16) as the components of the vector.  Then use the “mean” function to find the mean of the 8 temperature values. (Make sure that you follow each line with a carriage return.)
```    temperature <- c(2, 4, 6, 8, 10, 12, 14, 16)
mean(temperature)
```

Note:  Most serious R programmers like to use <- as the assignment operator as we did above with temperature.  However typing = will work in exactly the same way.

8. To run this code, highlight these two lines with the cursor and then click on the “Run” button found just above the material you have entered.   When you do this, you will see the commands being repeated in the lower left (console) along with the result.  (Here, the result is the mean environmental temperature of 9 degrees Celsius).
9. To create a plot of heart rate versus temperature, you will need to enter in the 8 heart rate values (for each environmental temperature measured) as another vector. To create this second vector and then a plot of heart rate vs. temperature, return to the upper left quadrant and type in the following. (Again, remember the carriage return).
```    heart_rate <- c(4, 11, 9, 13, 21, 24, 29, 27)
plot(x=temperature,y=heart_rate)```
10. Run this code (highlight these two lines with the cursor and then click on the “Run” button).  You will see the commands in the console and the plot will appear in the lower right part of the screen (You may need to “zoom” or increase the size of that quadrant in order to see the plot). Does the plot make sense based on the data entered? What is the general relationship between environmental temperature and heart rate in frogs?
11. As a scientist, you should always think carefully about the range of axis values, and labeling these values clearly. To do this, create a plot that includes labels (with units) and also includes a zero value for both temperature and heart rate as the origin by entering the following commands and running the code.
```    plot(x=temperature,y=heart_rate,
main= "heart rate versus temperature ",
xlab= "temperature (degrees C) ",
ylab= "heart rate (beats per minute) ",
xlim=c(0,20), ylim=c(0,35))```

Does the new plot clearly indicate the variable names and units on each axis? If the title were not needed for this plot, (e.g., titles are not needed for figures in papers, but titles are needed for poster figures), how might you change the above code to remove the title?}

12. Save the file again using File > Save (It is good practice to regularly save a file on which you are working during your R-session).   —–  You could actually just type in your commands into the console and run them line-by-line.  However, in this case they will not be saved.  Except for an occasional quick test—or accessing the help—it is strongly suggested that you enter all your commands in the upper left.
13. Suppose you wish to take output—including plots—from R and place it in a document.  One way to do so is the following: Open a new document—perhaps in WORD or a PowerPoint slideshow. Copy and paste the material from the console directly into the document (we strongly suggest that you save R output using “courier” or “new courier” font to maintain alignment). For copying plots, go to the menu immediately above the plot. Click on Export.  Then click on Copy to Clipboard.  From here, you can paste into your document. Note that the file is essentially a picture, so edits cannot be made on it.  To edit your graphs, you will need to modify RStudio commands in the upper left quadrant, highlight them, and hit Run.
14. Suppose you come back at some later date and wish to access the file “rtest.R” again.  Using the menu, choose “File > Open File > … and then navigate to rtest.R

## The R Help Function

RStudio includes a help function.  The “Help” is, perhaps, most easily accessed by entering commands directly in the console.  If you know the name of a particular command (say “mean”), type `?mean` in the upper left quadrant, highlight, and then hit Run.  (Or, type ?mean in the lower left console, then the Enter carriage twice.) The help information will appear on the right bottom.  If you do not know the name of a specific command but know the “general statistical area” in which you are interested (say “variance”), type `??variance` in the lower left console quadrant and then return.  (Note the two question marks here.) On the right bottom you will see a list of possible commands that match the “general statistical area”.  In this case  “`stats::var`” is the command you want.  (You may need to click on several of the possibilities in order to find the command you need.)  The “`stats`” corresponds to the library in which the “`var`” command can be found.  (The “stats” library is automatically loaded onto R.)  You can then click on “`stats::var`” to obtain information on this particular command.

As an example of using the help command for a specific command (plot), consider typing in the lower left console quadrant:
`?plot`.

On the lower right, you will see the provided information, “Generic X-Y Plotting“.  Here we present a portion of this information.

`plot(x, y, ...)`

***This is the basic command.  The “…” allows for subcommands that can control legends, axis labels, etc.

x     the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.

y     the y coordinates of points in the plot, optional if x is an appropriate structure.

…     arguments to be passed to methods, such as graphical parameters (see par). Many methods will accept the following arguments:

```main     an overall title for the plot: see title.
sub      a sub title for the plot: see title.
xlab     a title for the x axis: see title.
ylab     a title for the y axis: see title.```

Clicking on “graphical parameters” will provide information that can let you adjust axes, choose plotting symbols, etc.  Clicking on “title” will provide information related to labeling”.

For two examples of these ideas, see the sets of commands for certain plots used in the  Biocore Statistics Primer.  The given link takes you to the figure. Clicking the link on the figure takes you to the R commands. (a)  Finch upper beak length vs height: scatter plot;  (b) Beta galactosidase activity example: raw data.

## Importing Data Into R

As a scientist, you will likely need to use R to analyze large data files with many columns (variables) and rows (observations).  Suppose that the data have previously been entered into Excel.  If you save the Excel file using the “.csv” format, this data file can be read directly into R.   As an example, consider the data on thistle density (data set A) used initially in Chapter 3.  Suppose the data have been stored in the data file, thistleA.csv.  This file  contains data with 11 rows and 2 columns.  The number of thistles in a quadrat for 11 burned and 11 unburned quadrats are in columns 1 and 2 respectively. The commands used for importing the data into R are the following:

```thistleA_data <- read.csv("thistleA.csv",header=T)
> burned <- thistleA_data[,1]
> unburned <- thistleA_data[,2]```

You can then use other R commands to make plots and perform analyses with these data. (As a specific example see the commands for a side-by-side stem leaf plot with these data.)

When importing a data file into R, you will need to “let R know” where the file is located. Suppose the file is in the Biocore1 folder. You will need to go to the menu and click on “Session > Set Working Directory > Choose Directory”. You will then need to specify the folder Biocore1. At this point you can read the file by using a command such as `read.csv`.

# R commands for Chapter 1

##### Simple random sample for burn and non-burn treatments:  R commands.     Section 1.2:     subsection on Simple Random Sample

***First you need to number the 8 plots.  (For example, number the plots in the top row from 1 to 4 and in the bottom row from 5 to 8.)  You will then use a randomization procedure to reorder the numbers  1 through 8 with the idea that the first 4 numbers in the reorder will be the plot numbers  for the burn treatment and the next 4 for the non-burn treatment.

plots <- 1:8
sample(plots, 8)

***In a possible run with this command, the result was:   7  5  3  8  1  4  6  2 .  Thus, plots 7, 5, 3, and 8 are assigned to the burn treatment and the other four plots to the non-burn. This leads to the figure shown.  It is very important to use the first randomization that you obtain. If you continue to randomize until you obtain a randomization that “you like”, you run a real risk that your sample is not an appropriate one due to bias.

# R commands for Chapter 2

##### Lupine height: 1st stem leaf display.     Fig 2.2.1

lupine_height <-c (73, 83, 79, 56, 87, 65, 47, 96, 90, 78, 63, 75, 52, 103, 64, 89, 76, 68)
stem(lupine_height, scale=2)

*** The default stem command has scale=1.  This produces the plot shown in Fig 2.2.2.  Here we use scale=2 to provide what we believe is a more useful plot.

##### Lupine height: 2nd stem leaf display.    Fig 2.2.2

stem(lupine_height)

##### Lupine height: histogram     Fig 2.2.3

hist(lupine_height, breaks=c(40, 50, 60, 70, 80, 90, 100, 110)+0.01)

*** The given commands provides the default histogram.  For the purposes of this primer, we made some modifications in the size of the text and the margins.  The commands we actually used are the following.

par(mai=c(1, 1, .8, .4) + .02)
hist(lupine_height, main=””, xlab=”lupine height (cm)”, ylab=”number of plants”,
breaks=c(40, 50, 60, 70, 80, 90, 100, 110)+0.1, cex.axis=1.8, cex.lab=2.0)

*** Here the “cex” parameters control the relative sizes of the numbers on the axis and the axis labels.  The “par” command adjusts the margins so that the enlarged axis labels are completely visible.

##### Lupine height: box plot     Fig 2.2.4

boxplot(lupine.height)

*** The figure in the primer uses the following command to add a label and increase the size of the numbers on the axis.

boxplot(lupine_height, ylab=”lupine height (cm)”,
cex.axis=1.8, cex.lab=2.0)

##### Finch beak length: histogram (raw data).    Fig 2.2.5

finch_beak_height <- finch_data[,2]
hist(finch_beak_height, col=”red”, border=”black”, breaks=c(seq( 7.275, 9.515, 0.32)))

***The choice of the values within the  ‘breaks’  statement was made to produce a “nicely appearing” histogram.    —  In order to adjust the relative sizes of the numbers on the axis, the axis labels, and the plot title, with the proper margins, the following commands were used for the primer.

par(mai=c(1, 1, .8, .4) + .02)
finch_beak_height <- finch_data[,2]
hist(finch_beak_height, main=””, xlab=”finch beak height (mm)”, col=”red”,
border=”black”, breaks=c(seq( 7.275, 9.515, 0.32)), cex.axis=1.8, cex.lab=2.0, cex.main=2.2)

##### Finch beak length: histogram (logged data)    Fig 2.2.6

log_finch_beak_height <- log(finch_beak_height)
hist(log_finch_beak_height, col=”red”, border=”black”, breaks=c(seq( 1.981, 2.256, 0.025)))

***Similar modifications as made in the previous histogram were made for this plot in the primer.

##### Finch upper beak length vs beak height: scatter plot.    Fig 2.2.7

finch_upper_beak_length <- finch_data[,3]
plot(finch_beak_height, finch_upper_beak_length)

***Please note that the commands given here assume that the first two commands from Fig 2.2.5 have already been entered.   Also, to produce the plot that appears in the primer, the following commands were used to replace the ‘plot” command, above.

par(mai=c(1, 1, .8, .4) + .02)
plot(jitter(finch_beak_height), jitter(finch_upper_beak_length), col=”blue”,
cex.axis=1.8, cex.lab=2.0, cex.main=2.2,cex=1.5,lwd=2,
xlab=”finch beak height (mm)”, ylab=”finch upper beak length (mm)”)

##### Leaf type counts: “ordinary” bar plot.    Fig 2.3.1

num <- c(14, 9, 7, 15)
barplot(num, names.arg=c(“toothed”, “entire”, “divided”, “lobed”), col= “blue”, xlab=”leaf type”, ylab=”number of leaves”)

##### Leaf type counts: stacked bar plot.    Fig 2.3.2

number_leaves <- matrix(nrow=4, ncol=1)
number_leaves[1,]=14
number_leaves[2,]=9
number_leaves[3,]=7
number_leaves[4,]=15
barplot(height=number_leaves,
legend.text=c(“toothed”, “entire”, “divided”,
“lobed”),  xlim=c(.2,4.0), ylab=(“number of leaves”),
args.legend=list(x=”bottom”, cex=2.0),
col=c(“red”, “green”, “blue”, “yellow”),
cex.axis=1.8, cex.lab=2.0, cex.names=3.0)

# R commands for Chapter 3

qt(0.925,6)

##### Thistle Counts:  side-by-side stem leaf plots.   Fig 3.2.3

***This plot can be produced directly by R as follows.

library(aplpack)
burned <- c(29, 17 ,33 ,25 ,6 ,12 ,20 ,36 ,39, 10, 4)
unburned <- c(24, 30, 13, 17, 3, 11, 37, 6, 8, 22, 16)
stem.leaf.backback(burned,unburned)

***Note that Fig 3.2.2 cannot be produced directly by R.  It can easily be constructed “by hand”.  Alternatively, one could do the following:

burned <- c(29, 17 ,33 ,25 ,6 ,12 ,20 ,36 ,39, 10, 4)
unburned <- c(24, 30, 13, 17, 3, 11, 37, 6, 8, 22, 16)
stem(burned)
stem(unburned)

*** Then you would need to use computer-specific commands to place them side-by-side.

##### *** for Data Set A

burneda <- c(29, 17 ,33 ,25 ,6 ,12 ,20 ,36 ,39, 10, 4)
unburneda <- c(24, 30, 13, 17, 3, 11, 37, 6, 8, 22, 16)
mburn <- mean(burneda)
munb <- mean(unburneda)
se.burn <- sd(burneda)/sqrt(length(burneda))
se.unb <- sd(unburneda)/sqrt(length(unburneda))
means <- c(mburn, munb)
sterrors <- c(se.burn, se.unb)
x <- barplot(means, names=c(“Burned”,”Unburned”),
main=”Number of Thistles in Burned & Unburned Plots: Data Set A”,
col=c(“grey”,”blue”), ylab=”Number of Thistles”,
ylim=c(0,30))
arrows(x0=x, y0=means-1.58*sterrors,
x1=x, y1=means+1.58*sterrors,
code=3,angle=90,length=.1,lwd=2)

##### *** for Data Set B

burnedb <- c(22,27,17,20,19,21,18,25,15,22,25)
unburnedb <- c(14,24,16,18,15,12,21,20,15,13,19)
mburn <- mean(burnedb)
munb <- mean(unburnedb)
se.burn <- sd(burnedb)/sqrt(length(burnedb))
se.unb <- sd(unburnedb)/sqrt(length(unburnedb))
means <- c(mburn, munb)
sterrors <- c(se.burn, se.unb)
x <- barplot(means, names=c(“Burned”,”Unburned”),
main=”Number of Thistles in Burned & Unburned Plots: Data Set B”,
col=c(“grey”,”blue”), ylab=”Number of Thistles”,
ylim=c(0,30))
arrows(x0=x, y0=means-1.58*sterrors,
x1=x, y1=means+1.58*sterrors,
code=3,angle=90,length=.1,lwd=2)

Hypothesis test for lupine example:  finding a p-value given a T-value (two-tailed).     Section 3.4:     subsection on Performing the Test

2*(1-pt(2.648, 17))

***We want the probability that T is greater than 2.648.  The pt command gives the probability that T is less than 2.648.  Thus, we need to use “1-pt”.  The factor “2” denotes that this is a two-tailed test.  Note that the pt command requires both a T-value and the df.

Hypothesis test for pollution example:  finding a p-value given a T-value (1-tailed)     Section 3.4:    subsection on The Single Sample t-test with a one-sided alternative

1-pt(2.02, 9)

# R commands for Chapter 4

##### Heart-rate comparison: bar plot for two-independent sample design.    Fig 4.1.1

resting <- c(70, 71, 68, 77, 56, 67, 64, 68, 73, 60, 64)
stair <- c(96, 91, 83, 104, 72, 89, 87, 86, 99, 91, 77)
mrest <- mean(resting)
mstair <- mean(stair)
se.rest <- sd(resting)/sqrt(length(resting))
se.stair <- sd(stair)/sqrt(length(stair))
means <- c(mrest,mstair)
sterrors <- c(se.rest,se.stair)
x <- barplot(means, names=c(“resting group”,”stair-stepping group”),
main=”Mean Heart Rates for Resting Group & Stair-Stepping Group:
\n Two independent Sample Design”,
col=c(“grey”,”blue”), ylab=”Mean Heart Rate (bpm)”,
ylim=c(0,100))
arrows(x0=x, y0=means-sterrors,
x1=x, y1=means+sterrors,
code=3,angle=90,length=.1,lwd=2)

##### Heart-rate comparison: bar plot for individuals in paired sample design.    Fig 4.1.2

resting <- c(70, 71, 68, 77, 56, 67, 64, 68, 73, 60, 64)
stair <- c(96, 91, 83, 104, 72, 89, 87, 86, 99, 91, 77)
n <- length(resting)
m <- matrix(data=c(resting, stair), nrow=2, ncol=n, byrow=TRUE)
barplot(height=m, beside=TRUE, col=c(“grey”, “blue”),
main=”Heart Rate Before & After Stair Stepping: \n Paired design”,
xlab=”Subject”, ylab=”Individual Heart Rate (bpm)”,
legend.text=c(“resting HR”, “stair HR”),
names.arg=1:n, args.legend=list(x=”bottom”))

##### Heart-rate comparison: bar plot for mean in paired sample design.    Fig 4.1.3

resting <- c(70, 71, 68, 77, 56, 67, 64, 68, 73, 60, 64)
stair <- c(96, 91, 83, 104, 72, 89, 87, 86, 99, 91, 77)
diff <- stair – resting
mdiff <- mean(diff)
se.diff <- sd(diff)/sqrt(length(diff))
means <- c(mdiff)
sterrors <- se.diff
x <- barplot(means, names=c(“difference”),
main=”Mean of Heart Rate Differences \n Between After & Before Stair Stepping: \n Paired design”,
col=c(“black”), cex.main=1.7, cex.lab=1.6, cex.names=1.6,
ylab=”Mean Difference in Heart Rates (bpm)”, ylim=c(0,40))
arrows(x0=x, y0=means-sterrors,
x1=x, y1=means+sterrors,
code=3,angle=90,length=.1,lwd=2)

##### Hypothesis test for first thistle example:  2 independent-sample T-test   Section 4.2:  Step 4

burned1 <- c(29, 17 ,33 ,25 ,6 ,12 ,20 ,36 ,39, 10, 4)
unburned1 <- c(24, 30, 13, 17, 3, 11, 37, 6, 8, 22, 16)
t.test(burned1, unburned1, var.equal=TRUE)

***The default for the 2 independent-sample T-test with the “t.test” command  —  without including var.equal=TRUE  —   is an approximate procedure that does not require the variances of the two groups to be equal.  Unless the sample sizes are quite unequal — and the group variances differ substantially — we recommend the “pooled” approach using var.equal =TRUE.

##### Number of bacteria on bean plants: side-by-side stem leaf plots   Fig 4.3.1 and Fig 4.3.2

*** Here we followed the procedure of creating separate stem-leaf plots for the two bean species and putting them together “by hand”.

##### Hypothesis test for heart rate example (paired):  Paired T-test     Sec 4.4

***There are two ways to perform this test with R.  The first is to use the t-test command formally invoking the pairing.

resting <- c(70, 71, 68, 77, 56, 67, 64, 68, 73, 60, 64)
stair <- c(96, 91, 83, 104, 72, 89, 87, 86, 99, 91, 77)
t.test(stair, resting, paired=TRUE)

***The second is to calculate the difference and then to recognize that we perform a one-sample t-test on the difference.

diff=stair-resting
t.test(diff, mu=0)

***For the first approach, we would have obtained identical results had we written the t.test command as   t.test(stair, resting, mu=0,  paired=TRUE).   For the second, we could have used the command  —  t.test(diff)  —  to produce the identical result.  The  mu=0  is implicit.  However, it is often good practice to explicitly include this term.

##### Hypothesis test for germination example:  (a) finding a p-value given a $X^2$-value; (b) conducting chi-square test      Sec 4.5

(a)  1 – pchisq(3.271, 1)

***This is similar to the “pt” command.  However, since the $X^2$-value can never be negative, we do not need to multiply by 2 as we did with pt.

(b)  germdat <- c(19, 81, 30, 70)
germmat <- matrix(data=germdat, nrow=2, ncol=2)
chisq.test(germmat, correct=FALSE)

***In order to best use the chisq.test command, the data should be placed in a matrix.  The default chisq.test command uses something called the “continuity correction”.   We recommend not using this correction.  Thus, we include the “correct=FALSE” statement.

# R commands for Chapter 5

##### Reaction time example:  Box plots    Fig 5.2.1

par(mai=c(1,  1,  .8,  .4) + .02)
times1 <- c(17, 23, 14, 20)
times2 <- c(19, 26, 18, 21)
times3 <- c(28, 22, 31, 26)
agegroup <- c(“age: 18-25”,  “age: 26-33”,  “age: 34-41″)
boxplot(times1, times2, times3, names=agegroup, horizontal=FALSE,  main=”reaction time in milliseconds — by age group”, ylab=”reaction time”,  ylim=c(0, 35), cex.axis=1.8,  cex.lab=2.0,  cex.main=2.2)

*** Note that we included the “ylim” parameter so that we can easily visualize how the differences among the groups compare with differences from “0”.

##### Reaction time example — with age group: one-way ANOVA commands  Sec 5.2  subsection on Computation for One-Way Model

times1 <- c(17, 23, 14, 20)
times2 <- c(19, 26, 18, 21)
times3 <- c(28, 22, 31, 26)
reac_times <- c(times1, times2, times3)
agegrp_ind <- rep(c(1, 2, 3), each=4)
out_reac_time <- lm(reac_times ~ factor(agegrp_ind))
anova(out_reac_time)

***The reac_times variable could be created directly by the command: reac_times <- c(17, 23, … , 31, 26).  The “factor” within the “lm” command is needed to make explicit that age group is a (categorical) variable for an ANOVA-type model.  Also, the output produced by R does not provide a “Total” line for the ANOVA table we show in the primer.  The primer line was added “by hand” using straightforward addition.

##### Computing a p-value from an F-value:  using the pf command    Sec 5.2  subsection on Computation for One-Way Model

p_val <- 1 – pf(5.12,2,9)
p_val

##### Reaction time example — with age group and environment:  interaction plot   Fig 5.4.1

lwd = 2 # “lwd” in plots refers to “line width”
reaction_time <- c(17, 23, 14, 20, 19, 26, 18, 21, 28, 22, 31, 26, 17, 13, 22, 19, 22, 27, 21, 17, 27, 33, 24, 30
agegroup <- factor(x=rep(c(1,2,3,1,2,3), each=4), labels=c(“age 18-25”, “age 26-33”, “age 34-41”))
region <- factor(x=rep(c(1,2), each=12), labels=c(“urban”, “rural”))
interaction.plot(x.factor=agegroup, trace.factor=region, response=reaction_time,
ylab=”reaction time”, cex.axis=1.8, cex.lab=2.0, cex.main=2.2,
legend=FALSE, ylim=c(14, 32), lty=c(“solid”, “dashed”),
lwd=lwd, col=c(“blue”, “green”), pch=c(0, 1))
legend(x=”topleft”, legend=c(“urban”, “rural”), lty=c(“solid”, “dashed”),
lwd=lwd, col=c(“blue”, “green”), cex=1.7)
means <- tapply(X=reaction_time, INDEX=list(agegroup, region), FUN=mean)
means.urban <- means[ , “urban”]
means.rural <- means[ , “rural”]
sds <- tapply(X=reaction_time, INDEX=list(agegroup, region), FUN=sd)
lengths <- tapply(X=reaction_time, INDEX=list(agegroup, region), FUN=length)
standard.errors <- sds / sqrt(lengths)
epsilon=.005
points(x=c(1, 2, 3), y=means.urban, pch=1, col=”blue”)
points(x=c(1, 2, 3)+epsilon, y=means.rural, pch=1, col=”green”)
arrows(x0=c(c(1, 2, 3), c(1, 2, 3)+epsilon), y0=means – standard.errors,
x1=c(c(1, 2, 3), c(1, 2, 3)+epsilon), y1=means + standard.errors,
code=3, angle=90, length=.1, lwd=lwd,
col=c(“blue”, “blue”, “blue”, “green”, “green”, “green”))

***  The seemingly complicated commands beginning with “legend(x=”topleft …” are used in order to provide appropriate labels to the legend and to locate the legend box properly.  The commands beginning with “means” are used for the error bars.

##### Reaction time example — with age group and environment:  two-way ANOVA command   Table 5.4.3

reaction_time <- c(17, 23, 14, 20, 19, 26, 18, 21, 28, 22, 31, 26, 17, 13, 22, 19, 22, 27, 21, 17, 27, 33, 24, 30)
agegroup <- rep(c(1,2,3,1,2,3), each=4)
region <- rep(c(1,2), each=12)
Age <- factor(agegroup)
Environment <- factor(region)
out <- lm(reaction_time ~ Age + Environment + Age*Environment)
anova(out)

***These commands assume the data entry from the immediately previous interaction plot example.  Also, here we created the “factor variables” for age group and environment before the “out” command.

##### Beta galactosidase activity example:  Plot with raw data   Fig 5.6.1

activity_level <- c(1600, 1800, 2100, 1900, 2000, 210, 270, 140, 160, 250, 3100, 3500, 3700, 3600, 3300, 140,240, 260, 130, 240)
alpha <- rep(c(1, 2, 1, 2), each=5)
temp <- rep(c(22, 37), each=10)
temp <- jitter(temp, factor=.14)
flask <- rep(c(1, 2, 3, 4, 5), 4)
cex.lab=2.0, cex.axis=2.0,
xlim=c(20,40), ylim=c(0, 4000 ))
legend(x=”top”, legend=c(“+alpha”, “-alpha”), lty=”solid”, col=c(1, 2), cex=2.0)

*** The jitter command introduces a bit of separation along the x-axis to prevent points from overlapping each other.  The “pch” command directs R to provide a different plotting character for each different flask (rep).

##### Beta galactosidase activity example:  Interaction plot with square root transformed data   Fig 5.6.2

lwd = 2 # “lwd” in plots refers to “line width”
activity_level <- c(1600, 1800, 2100, 1900, 2000, 210, 270, 140, 160, 250, 3100, 3500, 3700, 3600, 3300, 140, 240, 260, 130, 240)
sq_rt_act_lev <- sqrt(activity_level)
temp <- factor(x=rep(c(1,2), each=10), labels=c(“22ºC”, “37ºC”))
alpha <- factor(x=rep(c(1,2,1,2), each=5), labels=c(“+alpha”, “-alpha”))
interaction.plot(x.factor=temp, trace.factor=alpha, response=sq_rt_act_lev,
ylab=”square root activity level”, xlab=”temperature”,
cex.axis=1.8, cex.lab=2.0, cex.main=2.2, lwd=lwd,
legend=FALSE, lty=c(“solid”, “dashed”),
col=c(“blue”,”green”))
legend(x=”right”, legend=c(“+alpha”, “-alpha”), lty=c(“solid”, “dashed”),
lwd=lwd, col=c(“blue”,”green”), cex=1.7)
means <- tapply(X=sq_rt_act_lev, INDEX=list(temp, alpha), FUN=mean)
means.palpha <- means[ , “+alpha”]
means.malpha <- means[ , “-alpha”]
points(x=c(1, 2), y=means.palpha, pch=16, col=”blue”, cex=0.6)
points(x=c(1, 2), y=means.malpha, pch=16, col=”green”, cex=0.6)
sds <- tapply(X=sq_rt_act_lev, INDEX=list(temp, alpha), FUN=sd)
lengths <- tapply(X=sq_rt_act_lev, INDEX=list(temp, alpha), FUN=length)
standard.errors <- sds / sqrt(lengths)
arrows(x0=c(1, 2, 1, 2), y0=means – standard.errors,
x1=c(1, 2, 1, 2), y1=means + standard.errors,
code=3, angle=90, length=.1, lwd=lwd,
col=c(“blue”, “blue”, “green”, “green”))

***  Again, the seemingly complicated commands beginning with “legend(x=”topleft …” are used in order to provide appropriate labels to the legend and to locate the legend box properly.  The commands beginning with “sds” are used for the error bars.

##### Beta galactosidase activity example:  R commands for ANOVA table   Table 5.6.5

activity_level <- c(1600, 1800, 2100, 1900, 2000, 210, 270, 140, 160, 250, 3100, 3500, 3700, 3600, 3300,140, 240, 260, 130, 240)
sq_rt_act_lev <- sqrt(activity_level)
alpha <- rep(c(1, 2, 1, 2), each=5)
temp <- rep(c(1, 2), each=10)
flask <- rep(c(1, 2, 3, 4, 5), 4)
Alpha_factor <- factor(alpha)
Temperature <- factor(temp)
out <- lm(sq_rt_act_lev ~ Reps+Alpha_factor+Temperature+Alpha_factor*Temperature)
anova(out)

# R commands for Chapter 6

Goodness-of-Fit Test for dihybrid cross example:  R commands for test   Sec 6.2

observed_counts <- c(218 ,80, 72, 30)
probabilities <- c(9/16, 3/16, 3/16, 1/16)
chisq.test(x=observed_counts,  y=NULL, correct=FALSE, p=probabilities)

*** Since this is effectively a one-sample test, as opposed to the example in Sec 4.5 which has two samples , we make this explicit by the “y=NULL” statement.  Also, you would obtain the same answer with “correct=TRUE” for this example.  (In general, for examples where the results differ, we recommend “correct=FALSE”.)

Cell shape example:  R commands for stacked bar plot   Fig 6.3.1

number_cells  <-  cbind(c(163, 234, 3), c(76, 12, 312), c(321, 76, 3), c(296, 59, 45))
barplot(height=number_cells, width=1, names.arg=c(“-alpha/30°C”, “+alpha/30°C”, “-alpha/4°C”, “+alpha/4°C”),
legend.text=c(“unbudded”, “budding”, “shmooing”),
xlim=c(.2,4.5), ylab=(“number of cells”),
args.legend=locator(1), col=c(“red”,”yellow”,”blue”))

*** The locator(1) command allows the user to place the legend at a position chosen with the cursor.

Cell shape example:  R commands for chi-square test   Sec 6.3

celldat <- c(163,76,321,296,234,12,76,59,3,312,3,45)
cellmat <- matrix(data=celldat, nrow=4, ncol=3)
chisq.test(cellmat, correct=FALSE)

*** If you wish to output a table of the observed and/or expected values, use the following commands:

out <- chisq.test(cellmat, correct=FALSE)
out$observed out$expected

# R commands for Chapter 7

##### Samara Example:  Scatterplot of terminal velocity versus square root of disk loading  Fig 7.2.1

sqrt_disk_load <- c(1.97, 2.12, 1.95, 2.38, 2.21, 2.04, 2.29, 2.07)
term_vel <- c(0.93, 1.06, 1.04, 1.13, 1.07, 1.11, 1.17, 0.98)
ylab=”terminal velocity (in cm/sec)”,
cex.axis=1.3, cex.lab=1.1, pch=19)

***The “pch” command “filled in” the points on the graph.

##### Samara Example:  R regression commands  Table 7.2.1

summary(out)

***The “lm” command is the the same one that we used for the ANOVA examples.  However, here we did not use the “factor” command with sqrt_disk_load.  The “factor” command  distinguishes the use of “lm” between ANOVA type analyses and regression analyses.

# Data Appendix

Given below is a copy of the .csv file for Number of Thistles for Data Set A:

burned unburned
29 24
17 30
33 13
25 17
6 3
12 11
20 37
36 6
39 8
10 22
4 16

This chapter is adapted from Appendix 1: R Commands in Process of Science Companion: Data Analysis, Statistics and Experimental Design by Michelle Harris, Rick Nordheim, and Janet Batzli. 