R – The Future of Psychology Statistics is Open Source

Posted June 21, 2018

By Robert Franklin

Four years ago, I threw out everything I knew and changed how I taught statistics. I started teaching with R, a free and open-source program for statistical analysis. Previously, my students used point-and-click programs that resembled spreadsheets, where they entered data and selected their analyses from menus. Now I was taking the plunge into teaching statistics with typed commands, with very few textbook resources to help.

The logo of the open source statistics software "R".

Why did I take the plunge into teaching R? And why should you? The reason is simple: R is the future. In this blog entry, I explain why we should teach R to undergraduate psychology statistics students, and I advocate for the development of free and open resources to help students learn with R.

The field of statistics, especially as it applies to psychology, has reinvented itself in the last decade. Data science and data analytics are growing career fields as businesses and other organizations increasingly use data to inform decisions. Interactive, informative, and beautiful data visualizations have created a sub-field of journalism, popularized by outlets like Vox, FiveThirtyEight, and The Upshot. These fields, which combine the data analytics of statistics and the theory of research methods, represent new opportunities for our students. Psychologists have been training students in statistics and research methods since long before “Data Science” became trendy, but our classes need to change to stay at the forefront of these developments.

A student-created statistics visualization representing concepts like probability distribution and regression analysis.
A 2017 Kantar Information is Beautiful Award winner created by a team of undergraduates at Brown University. [Image: Daniel Kunin, Tyler Dae Devlin, Jingru Guo, and Daniel Xiang, https://goo.gl/i4PFF5]

At the same time, a series of crises has changed how psychology research uses statistics. The basic theories taught in stats classes a generation ago, including null hypothesis testing, are increasingly under attack. In 2016, the American Statistical Association issued a statement designed to “steer research into a ‘post p < .05 era’”. In addition, the inability to reproduce basic research findings in psychology and biostatistics has led many, such as the Center for Open Science, to call for researchers to share data and findings openly, in a way that allows observers to reproduce the statistical analyses.

These issues require many answers, and embracing R is a first step in revolutionizing psychology statistics. R has many advantages. First, R is open source, so it is free for anyone to download, use, and modify. Students do not need to visit computer labs or buy student versions of other software, which are expensive and often limited in what they can do. Because R is open source and free to modify, over 10,000 free packages, or add-ons, exist to carry out nearly any statistical technique, from complex visualizations to models. Almost any new statistical technique comes with a new package in R. This flexibility and power are why graduate programs in psychology are increasingly teaching in R.

Nonetheless, R is rarely taught in undergraduate psychology statistics courses. For a poster I presented at NITOP in January 2018, I collected a sample of fifty statistics syllabi and found that only two undergraduate courses used R. At the same time, the use of R has grown exponentially in psychology research, as indicated by citations in journal articles.

So why isn’t R popular in the undergraduate classroom? R is a statistical programming language that uses typed commands to enter data and run statistical tests. The downside is that this makes it hard for students to see the data and often leads to more errors from misspelled commands. Though there are add-ins for R that provide point-and-click menus, such as R Commander, the command-line approach is very powerful because it lets students learn good data practices, including reproducible data analysis, from the very beginning of their training.

Another obstacle to adopting R is the lack of educational resources for teaching R in psychology statistics. Even though R is becoming more popular, very few of the most popular textbooks for teaching basic psychology statistics incorporate materials using R. Most of the educational resources for R are geared toward students with a stronger math background, making it hard for psychology students to apply the material. The lack of resources for teaching with R, along with the inherent difficulty of teaching a command-line approach, probably explains why R is not popular in the classroom.

There are a few good resources for learning R, such as Field’s Discovering Statistics Using R. In addition, many resources were featured in the May 2017 edition of the APS Observer. However, many of these resources are geared toward advanced students, which underscores the need to develop undergraduate instructional materials for teaching R to psychology stats students. As part of a grant from the Center of Innovation in Digital Learning at Anderson University, I developed a freely available open-source textbook using R, which is available in beta form on iBooks.

Cover of Dr. Franklin's statistics book, Statistics: The Story of Numbers.

The final reason for teaching R is anecdotal. Students feel accomplished conducting analyses in R. The initial learning curve gives way to a sense of accomplishment when students learn how to run their own command-line analyses, edit their own scripts, and then apply this knowledge. Several of our students have commented on how helpful this is in graduate work, where their knowledge of R has let them jump right into graduate research projects.

The use of open-source software in teaching statistics is both the future of psychology and in the spirit of the Noba Project and the broader open-education movement, which aim to remove barriers to college. Though R is free, and as such reduces the financial burden on students, we need additional accessible and freely available materials for using R in the classroom. I encourage other teachers of psychology stats and research methods to contribute to this effort by looking into using R in the classroom, and by creating and sharing open-source resources to teach R. I am open to any collaboration or initiative to help develop these resources, so please contact me if you have any feedback on the textbook or would like to collaborate on developing more resources.

Bio

Dr. Robert Franklin is an Assistant Professor of Psychology at Anderson University in Anderson, SC, where he teaches courses in neuroscience, statistics, and research methods. His research interests involve understanding how people read social information from faces and how aging affects these processes. His teaching interests involve collaborating with students on research and spreading the good news about R. Robert is also co-author of the Noba learning module Attraction and Beauty. You can find out more at his website: rfranklin.netlify.com