Why are statistics important for biologists (and all scientists)?
Biologists’ questions about the natural world are stimulated by careful observations, an awareness and understanding of the previous discoveries of other scientists, and (perhaps most importantly) their own innate curiosity. Biologists fill existing knowledge gaps by formulating hypotheses or predictions, making systematic observations and/or carrying out manipulative experiments, collecting data, and drawing logical conclusions using their own data as evidence.
The discipline of statistics provides biologists with a powerful and necessary tool for making informed formal scientific conclusions based on the analysis of variable data. The conceptual underpinnings of statistics can be thought of as providing a measure of confidence in answering the following question: “How can we reach informed conclusions about the real world from data that are subject to variability?” The language and methods of statistics provide scientists and researchers with a standard and well-validated approach to help address this question. The careful and proper use of statistics allows scientists to describe their data appropriately, validate conclusions, and avoid making unsubstantiated claims.
Why is a Statistics Primer important for undergraduate biology students?
The contents of this Statistics Primer are based on over five decades of supporting University of Wisconsin-Madison Biology Core Curriculum (Biocore) students as they make sense of data and decide whether the data can and should be used to inform defensible conclusions about their hypotheses and questions. In our integrative, 3-semester lab sequence in the honors undergraduate Biocore Program, students experience the scientific process first-hand through several authentic investigations of their own novel research questions. As Biocore students progress through 2-3 semesters of lab, they are introduced to formal statistical analyses through a scaffolded curriculum that begins with an emphasis on recognizing the variability inherent in biological data sets and culminates with formal hypothesis testing for different types of experimental designs used in biology research. (For more about the Biocore lab sequence, see Batzli, 2005; Remsberg et al., 2014; Batzli et al., 2018; Harris et al., 2018; Batzli et al., in review; Biocore.wisc.edu.) As students design and carry out their own investigations over multiple semesters, their understanding of the relationship between statistics and science grows, and their experimental designs improve accordingly. However, just as with biology, statistics requires careful and nuanced thinking. It cannot be viewed as just a set of rules for handling data. The strongest conclusions are made when biological researchers meld biological and statistical reasoning.
In this Primer, we focus on a few of the key tools that will be useful for most undergraduate students’ investigations of novel research questions. Although this Primer was prepared with the needs of UW-Madison Biocore students in mind, we believe that it can be useful in other data-rich biology lab courses and independent research experiences that focus on quantitative reasoning, experimental design, data analysis, and statistical hypothesis testing.
This Primer is not intended to replace a standard, comprehensive statistics textbook. Rather, it should be viewed as a presentation of the key ideas of statistics that are important for undergraduates experiencing the authentic process of science. Students who intend to continue in biology will almost certainly wish to obtain more training in statistics. However, even though the treatment here is not comprehensive, the Primer is not intended as a collection of “cook-book” instructions. Indeed, the presentation emphasizes the development of statistical thinking and reasoning rather than a step-by-step guide. Substantial emphasis will be placed on the assumptions that underlie the methods presented. Also, to the extent possible, we provide suggestions for careful evaluation of sample results and avoidance of common pitfalls.
How should this Primer be used?
Although there might be a tendency to view statistical methods as a set of tools that might be found in a toolbox in the garage (with wrenches, hammers, screwdrivers, etc.), appropriate usage of statistics is part of a logical conceptual development. Thus, it is important to gain some appreciation for statistics that transcends a toolbox point of view. In particular, in the Primer we will emphasize a “process of science” perspective that includes statistical thinking as a core component.
It is likely that most readers will approach this Primer as “a la carte” users: when they need to learn about a particular statistical procedure or experimental design, they will jump to the relevant section(s) of the Primer. It is important to realize, however, that statistical ideas build on one another. Thus, each Primer section will, at least in part, refer to material that has been described earlier in the Primer. To the extent possible, we have indicated which previous sections provide necessary “building material” by use of formal links.
In many cases we will provide equations for relatively simple calculations. We will also present commands using the statistical program R to perform many of these computations (R Core Team 2020). We include a fairly extensive appendix that provides an introduction to R as well as the commands for many of the tables and plots included in the Primer. We also include an appendix of RStudio tutorials (RStudio Team 2020) developed for Biocore students new to R. However, for the relatively simple calculations, we strongly encourage working through the equations “by hand”, at least for the first several times that an equation is used. Doing so builds an appreciation for how certain concepts relate that cannot be obtained by going straight to computer commands. Finally, we have embedded brief videos that elaborate on topics that transcend any one statistical test, such as the meaning of p-values and common issues in experimental design.
References
Batzli, J.M. 2005. A Unique Approach? Four Semesters of Biology Core Curriculum. CBE Life Sciences Education 4: 123–137.
Batzli, J.M., M.A. Harris, and S.A. McGee. 2018. It Takes Time: Learning process of science through an integrative, multi-semester lab curriculum. Tested Studies for Laboratory Teaching. Proceedings of the Association for Biology Laboratory Education 39(21).
Batzli, J.M., M.A. Harris, D. Lee, and H.A. Horn. In review. Feedback and discourse as critical skills for the development of experimentation competencies. In N.J. Pelaez, S.M. Gardner, and T.R. Anderson, editors. Trends in Teaching Experimentation in the Life Sciences: Putting research into practice to drive institutional change. Springer Nature Switzerland.
Harris, M.A., S.A. McGee, and J.M. Batzli. 2018. Uncooking Yeast: Cells signaling a rise to inquiry. Tested Studies for Laboratory Teaching. Proceedings of the Association for Biology Laboratory Education.
R Core Team. 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
RStudio Team. 2020. RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/