Session 0: Introduction

Content

Philosophy

I have never taken a course or workshop in using R. I’ve read a lot of books on how to program with R. To be honest, I’m not sure how much they helped. I learned R by taking a single script that I wrote to create a scatter plot and modifying it or “hacking it” to get it to do what I wanted. If I ran into a problem, I would either google the error message or the question I was trying to answer. As I asked around, I learned that most people learned R by hacking their way to success along with a lot of practice. That is the underlying philosophy of this series of lessons. Most programming books slowly build to something useful with silly examples along the way. The first code you will write in Lesson 1 will be the basis of every other piece of code we write in these tutorials. We will start with working code for a plot that could be published and hack it until we have a plot showing which taxa are associated with health or disease.

I suspect that you will understand the first chunk of code we write. We will strive for readable code that is easy to understand. That being said, just because you suspect that the geom_point function will add points to a plot doesn’t mean that you know how to use geom_point or that you would know how to make a bar chart. Calmly accept your ignorance and know that all will be explained eventually. Learning experts have found that we do not learn best by taking on a topic and beating it to death until we’ve mastered it. Rather, we learn best when we learn something partially, move on to something else that we also learn partially, and then fold in the new knowledge to improve our partial knowledge of the earlier topic. It’s kind of like taking steps forward in a dark room only to get to the end and see that you knew the path all along. This is the approach that we will be taking with these lessons. My goal is not to provide a reference on R or to document every nook and cranny of the language and its myriad packages. I will empower you to do that.

The final philosophical point I will make is that I believe it is important to eat your own dog food as an educator. Everything I teach is how I want to code and how I want those who work for me to code. There is always room for improvement, but be confident that I’m not trying to sell you on something that I do not use myself. That being said, although I don’t claim that the plots we’ll make are works of aRt, I do think that they’re pretty close to being publication quality. Why make a crappy plot when you could make a good one that puts your work in the best possible light?

If you notice a bug or something that is unclear, have an idea for a better approach, or want to see something added, please file an issue or, even better, a pull request at the project’s GitHub repository.

Why R

If you’re looking for some big “pound your chest” explanation for why you should learn R, then you’re looking in the wrong place. I know R. That’s why I teach R. Why did I learn R? There were people around me who knew R and I knew I could depend on them to help me learn R if I ran into any problems. Less important than which language you should learn is that you learn A language. Any language, really.

The way I see it there are several credible languages if you are a scientist: R, Python, C/C++, and Java. R and Python are “high level” languages that have a lot of built-in goodies to make your life easy. As you’ll see, it’s pretty easy to build a graph or to calculate a mean in R (and Python). These languages are engineered to make things easier on the programmer rather than on the person running the code. In contrast, C/C++ and Java are not as easy to program, but are far more efficient and run blazing fast. You’ll hear about others like Julia, Ruby, or Perl. These aren’t quite mainstream for biologists, aren’t fully developed yet, or are past their sell-by date. Unless you need high performance, I’d probably stay away from C/C++, and Java isn’t really all that high performance. If you need the speed of C++, you can write C++ from within R (for example, with the Rcpp package).

This leaves you to choose between R and Python. You can google “Should I learn R or Python” and you’ll get screed after screed telling you why one language is the best. Do not read these. They’re next to worthless and smack of all sorts of machismo. I block accounts on Twitter that go off on R vs. Python screeds. I know R’s warts and I know that Python could possibly cure these warts. But I also know that Python has its own warts. Rather than carry the cognitive baggage of learning both, I do what I need in R. At least a few times a year I tell myself I should learn Python to know it, but when it comes to doing it, I’m just not sold. To be honest, to really appreciate the differences between the languages you probably need a fair bit more experience than someone who is reading this. Note that someone else could/should easily rewrite this paragraph switching R and Python.

But really! What should you learn? Depends. What does your research group use? What do your collaborators use? What do the people around you use? If you have a problem, who are you going to get help from? For me, the answers to these questions were generally: R. Again, it’s more important that you learn your first language than which language you learn. Master your first language and then start noodling with others. I always cringe when I see someone encouraging a novice to learn other languages. It can only sow confusion and frustration. Since you’re here, I suspect someone has encouraged you to learn R or that your local community has some R chops. Welcome! I want to challenge you to not just use your community to help you, but to also nourish your community to help it grow.

What you need to do these tutorials…

Set up our minimalR project…

Customizing RStudio

Oversized calculator

On the left side there is a tab for the console. This is where we will be entering most of our commands. Go ahead and type 2+2 at the > prompt:

2+2
## [1] 4
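The console handles the rest of the usual calculator operations too. Here is a small sketch of a few more expressions you could type at the > prompt; the specific numbers are only for illustration, and the comment on each line shows the value R returns.

```r
7 - 3         # subtraction: 4
6 * 7         # multiplication: 42
10 / 4        # division: 2.5
2^10          # exponentiation: 1024
(2 + 2) * 3   # parentheses control the order of operations: 12
sqrt(16)      # built-in functions work here as well: 4
```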

Now type the following at the prompt (feel free to use your own name):

my_name <- "Pat Schloss"

Now look in the upper right panel. In the “Environment” tab you’ll see that there’s a new variable - my_name and the value you just assigned it. We’ll talk more about variables later, but for now, know that you can see the variables you’ve defined in this pane.
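You can also inspect variables from the console instead of the “Environment” tab. The following is a minimal sketch that reuses the my_name variable from above; what ls() lists will depend on whatever else you have defined in your own session.

```r
my_name <- "Pat Schloss"

# Typing a variable's name prints its value
my_name

# ls() lists the names of the variables currently defined
ls()

# Variables can be passed to functions; nchar() counts the characters
nchar(my_name)
```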

Go ahead and click on the “History” tab. There you’ll see the last two commands we’ve entered.

Working through tutorials

As you go through the tutorials you should be saving your code in a text file. Note that a Microsoft Word docx file is not a text file! We want a simple file that only contains text, no formatting. Go to “File -> New File -> R Script”. This will open a file called “Untitled1” in the upper left panel and it will push the “Console” panel down along the left side.

Save “Untitled1” as lesson_00.R in your minimalR directory with the Rproj file. You should now see lesson_00.R listed in the “Files” tab in the lower right corner. Go ahead and enter 2+2 in lesson_00.R.

One of the nice features of RStudio is that you can put your cursor on the line or highlight the lines you want to run in lesson_00.R and then press the “Run” button and it will copy, paste, and run the line(s) in the “Console” window.

Alternatively, you can check the “Source on Save” button and every time you save the file, it will run the code in that file. Keep in mind that it will run every command so if you have some non-R code in the file, it will likely gag and complain. I would suggest you create a separate lesson_XX.R file for each lesson that we do as we work through the lessons.
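To get a feel for what running a whole file looks like, here is a rough sketch that uses the source function from the console. It writes a tiny stand-in script to a temporary directory so that it does not touch your real lesson_00.R; the file contents and the variable name answer are made up for this illustration. Notes to yourself belong on lines that start with #, which R treats as comments and skips.

```r
# Write a small stand-in for lesson_00.R into a temporary directory
script_path <- file.path(tempdir(), "lesson_00.R")
writeLines(
  c(
    "# Lines that start with # are comments, so R skips them",
    "answer <- 2 + 2",
    "print(answer)"
  ),
  script_path
)

# Run every line in the file, from top to bottom
source(script_path)
```

In your own project you would point source at the real file instead, for example source("lesson_00.R") when your working directory is the minimalR directory.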

Installing packages

We will use several R packages throughout the lessons. The first that we’ll use is called tidyverse. We’ll be talking a lot about this package as we go along. But for now, we need to install this package. In the lower right panel of RStudio, select the “Packages” tab. You’ll see a list of the packages that are installed on your computer.

In the search window, type in “tidyverse” (without the quotes). If it isn’t already installed, you won’t see it. If it is installed, it will be listed. The package isn’t installed on my computer.

If it isn’t installed on your computer either, go ahead and click the “Install” button and type “tidyverse” into the “Packages” window.

Once you press the “Install” button, the dialog will close and RStudio will install the package. You’ll notice a couple of things have happened. In the “Packages” tab in the lower right panel, you now see that the “tidyverse” package is there. You’ll also notice in the lower left corner that R ran the command install.packages("tidyverse").

Finally, to make all of the tidyverse goodness available as we go through the tutorials, you can either click the small square next to “tidyverse” in the “Packages” tab or you can run library(tidyverse) in the console tab in the lower left panel of RStudio.
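If you prefer the console to the “Packages” tab, you can also check whether a package is installed before deciding to install it. This is a minimal sketch using the base R function requireNamespace; the variable name pkg and the wording of the messages are just for illustration.

```r
pkg <- "tidyverse"

# requireNamespace() returns TRUE if the package can be found and loaded,
# and FALSE if it is not installed
if (requireNamespace(pkg, quietly = TRUE)) {
  message(pkg, " is already installed")
} else {
  message(pkg, " is not installed yet; run install.packages(\"", pkg, "\")")
}
```

Once the package is installed, library(tidyverse) attaches it for the current session, as described above.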

My setup

If you run sessionInfo() at the console, you will see the version of R and the packages you have loaded and attached (more about what this all means later). Here’s what mine looks like.

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3     purrr_0.3.3    
##  [5] readr_1.3.1     tidyr_1.0.0     tibble_2.1.3    ggplot2_3.2.1  
##  [9] tidyverse_1.2.1 knitr_1.26      ezknitr_0.6    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3        cellranger_1.1.0  pillar_1.4.2      compiler_3.6.1   
##  [5] R.methodsS3_1.7.1 R.utils_2.9.0     tools_3.6.1       zeallot_0.1.0    
##  [9] lubridate_1.7.4   jsonlite_1.6      evaluate_0.14     lifecycle_0.1.0  
## [13] nlme_3.1-141      gtable_0.3.0      lattice_0.20-38   pkgconfig_2.0.3  
## [17] rlang_0.4.1       cli_1.1.0         rstudioapi_0.10   haven_2.1.1      
## [21] xfun_0.11         withr_2.1.2       xml2_1.2.2        httr_1.4.1       
## [25] hms_0.5.2         generics_0.0.2    vctrs_0.2.0       grid_3.6.1       
## [29] tidyselect_0.2.5  glue_1.3.1        R6_2.4.1          readxl_1.3.1     
## [33] modelr_0.1.5      magrittr_1.5      backports_1.1.5   scales_1.0.0     
## [37] rvest_0.3.4       assertthat_0.2.1  colorspace_1.4-1  stringi_1.4.3    
## [41] lazyeval_0.2.2    munsell_0.5.0     broom_0.5.2       crayon_1.3.4     
## [45] R.oo_1.22.0
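If you only want one or two pieces of this information, you don’t have to read through the whole listing. Here is a short sketch using base R; it asks for the version of the stats package because that ships with every R installation, and you could swap in "tidyverse" once you have installed it. Your versions will almost certainly differ from mine.

```r
# The version of R that is running, as a single string
R.version.string

# sessionInfo() returns an object whose pieces we can pull out by name
info <- sessionInfo()
names(info$otherPkgs)   # packages attached with library(); NULL if none

# The version of a single installed package
packageVersion("stats")
```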