R for Data Science


Content

R for Data Science continues our introductory course From Excel to R and is therefore aimed at people who already have some experience with R.

The course starts by reinforcing the core building blocks of working in R: RStudio, R scripts, R projects, and Quarto documents. The focus then shifts to advanced data wrangling with the tidyverse to prepare data for analysis, followed by exploratory data analysis (EDA) with ggplot2 to visualize patterns. Participants will also learn to perform principal component analysis (PCA) with ggfortify, automate tasks with R scripting (loops, conditionals, functions), and build basic models such as linear regression, logistic regression, and clustering. The course concludes with a full project that guides participants from data preparation and exploration to modeling, while ensuring the work is reproducible and well documented.

By the end of the course, participants will have the skills to manipulate, analyze, and visualize data in R, fit common models, and document their work effectively.

Learning Outcomes

A student who has met the objectives of the course will be able to: 

  • Explain and use the fundamental building blocks of R programming: R, RStudio, R scripts, R projects, and Quarto documents.
  • Perform advanced data wrangling/data management tasks using the tidyverse packages to organize and prepare data for analysis. 
  • Perform exploratory data analysis with the ggplot2 package to spot patterns and trends in the data.
  • Perform and visualize principal component analysis (PCA) with the ggfortify package to reveal the structure of the data.
  • Write and use R scripts that apply loops, conditionals, and functions to automate tasks and make the analysis more efficient.
  • Build and interpret models in R to analyze data, including linear regression, logistic regression, and clustering techniques. 
  • Complete a full project in R from start to finish, including preparing the data, exploring it, running PCA, modeling, and presenting the results, while ensuring the analysis is reproducible and well documented.