Introduction to R
Site: Saylor Academy
Course: BUS250: Introduction to Business Intelligence and Analytics
Book: Introduction to R
Description
Now, we will introduce the R programming language. You can also follow the links and download a copy of R to follow along and complete the examples. R is a powerful and versatile programming language primarily used for statistical computing, data analysis, and graphical visualization. It is widely used in the creation of models in BI applications. R includes a comprehensive set of tools and libraries for handling, manipulating, and analyzing data sets of various sizes and complexities. It also has extensive packages covering areas such as machine learning, time series analysis, and data visualization. These features, combined with a relatively easy-to-use interface that allows non-programmers to rapidly get up to speed, make R a popular choice for developing models in BI systems.
Introduction
Welcome to this introduction to R and R Studio for absolute beginners. This book is designed first and foremost for College of DuPage students in Sociology 1205 (Introduction to Data Science) and Sociology 2200 (Introduction to Research Methods). But it can be just as easily used by anyone looking to learn the basics of the R language in the R Studio environment. In addition, this book is written specifically for people with absolutely no experience in coding or data science. In other words, this book is the absolute starting point.
For those College of DuPage students working with this book as part of Sociology 1205 or 2200, you will encounter interactive modules to check your knowledge and understanding as we go.
I should note, though, that even though this book is designed for absolute beginners, it does not mean it's easy. Learning to code is similar to learning a new language and there is a learning curve. Like any language, R, the computational language we are going to learn here, has its own vocabulary and syntax. Moreover, you are going to learn R in the context of learning about data science and/or research methods. This means that the level of learning is doubled: this is like learning to speak and write in a new language to talk about a field in which you are completely new. So there is a double learning curve.
Because of this double learning curve, I highly recommend not trying to do all the work at once, and especially not on the day of the deadline. Instead, practice a little every day. This matters most for those of you taking the credit classes mentioned above.
The goal of this book is to get you started and ready for the rest of the classes. I recommend completing at least one chapter a day before getting to the assignment.
The best way to use this book is to follow along with the code, as opposed to passively watching the videos. As you watch the videos, open a separate browser tab, open a new script in R Studio Cloud, and type up the code from the videos. This will get you used to writing code by the time we get to the more sophisticated work. Also take notes on the things you want to remember.
Good luck as you start your journey into the world of data science and research methods.
Source: Christine Monnier, https://cod.pressbooks.pub/introduction2r/front-matter/introduction/ This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
Why R and R Studio?
What they are and what they do and why we need them
Why Code?
Why am I asking you to code in R in Sociology 1205 and 2200? Is there no software that can do all this stuff with just point-and-click interfaces without all the bother? Yes, there is BUT . . . there is value to coding and it is worth learning.
First, R and R Studio are very widely used in the fields of statistics and data science. There are good reasons for this: as mentioned below, R is open source. No company owns it, and it is free to use, whereas the main commercial statistical software packages are very expensive. Also, the R language is very efficient and, dare I say, elegant in the cleaning, exploring, plotting, and modeling of data.
Lastly, I think one of the main benefits of coding is that it forces us to think about what, exactly, we are doing, rather than just, more or less mindlessly, clicking on buttons to get an output. And since both Sociology 1205 and 2200 are introductory classes, it is even more important for us to engage with that type of thoughtful examination of what we are doing and what steps we are following as we approach our data. We need to think about what we are trying to do and how we are going to do it. To develop this habit of the mind is extremely important in introductory classes, so we can then bring those skills to more advanced classes, if you choose to pursue work in the fields of statistics or data science.
R, R Studio, and R Studio Cloud
First, a basic distinction:
R is the computational language that is built for statistics and data science. It is a relatively intuitive language but it still requires consistent practice to get started. R is also open-source so we do not have to buy additional pieces of software to run our data work. R is free, as well as maintained and developed by a very active community of developers.
R Studio is the environment in which we will create and run our R code to explore, plot, and model data. R Studio expands the basic options available in R and streamlines a lot of the operations we need to complete in our work, such as loading up code scripts, loading up datasets, or saving our work in a shareable format. R Studio makes all that much easier.
Both R and R Studio can be installed on your computer. For R, you would go to this link and click on the download link. For R Studio, you would go to this link and choose the free R Studio desktop option that matches your operating system.
However, if you are taking Sociology 1205 or 2200 at College of DuPage, we will work in the R Studio / Posit Cloud workspace specific to our class. You will find the link in your course in Blackboard.
So you do not have to install R and R Studio on your own computer. R Studio Cloud works in your browser. Just sign up for a free account using your dupage.edu email address and log in with the link posted in Blackboard. You will find your own personal workspace as well as the course workspace.
Note: since this book was written and the tutorials recorded, the company behind R Studio was renamed Posit, and R Studio Cloud is now called Posit Cloud (posit.cloud). The R Studio desktop software keeps its name. However, all the links will work just fine.
The R Studio Cloud Interface
In this chapter, you take your first steps into R Studio.
Learning Objectives
In this chapter, we will explore:
- the top menu bar;
- the console panel;
- the source or document panel;
- the files+ panel;
- the environment panel.
The top menu bar
This set of menus looks a lot like what you might find in pretty much any software you use, be it Microsoft Word or Google Docs. In the video, we will focus on one nice way to customize your interface in a way that best fits you by choosing a theme, that is, a way to change the way your R Studio Cloud environment looks. For instance, if you need a larger font or greater contrast to accommodate your vision, I will show you how to do that in the video.
The console panel
The console panel is where our code shows up once we run it, together with the output that code produces.
The source panel
The source panel is closed by default but opens when you open a script file where code has been stored and saved. In the video, I show you different ways of opening that panel and loading a basic script. It is called the source panel because this is where code is sourced from. The output of the source code shows up in the console panel.
The files+ panel
I am calling it files+ because it is not just a files menu. It also includes additional tabs such as plots, packages, help, viewer, and presentation. The video explores the main tabs we will use with their specific options.
The environment panel
The environment panel is where the datasets and objects we create or load are stored and available for viewing.
Keep in mind that this is just an overview. I will have more specific videos in the upcoming chapters on some key aspects of the interface we will use, such as installing and opening packages, uploading or opening datasets, or exporting plots.
For now, please watch the video below:
And if you are taking Sociology 1205 or 2200, please complete the quiz below.
Objects, Vectors, and Functions
Now that we are familiar with our interface, it is time to get started with the actual R language. As we know from the previous chapter, R is a sophisticated calculator, specifically appropriate for statistics and data science.
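To make the calculator idea concrete, here is a minimal sketch of R doing plain arithmetic and then storing a result for reuse. The numbers and the object names (total, average) are made up for illustration and do not come from the videos.

```r
# Basic arithmetic, typed at the console or run from a script
2 + 3      # 5
10 / 4     # 2.5
2^3        # 8

# Store a result in an object with the assignment arrow, then reuse it
total <- 12 + 8 + 10     # hypothetical values
average <- total / 3
average                  # 10
```

Typing the name of an object on its own line, as in the last line, prints its value in the console.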
I highly recommend following along with the videos below in your own R Studio Cloud Workspace.
Learning Objectives
In this chapter, we will cover the following topics:
- objects;
- vectors;
- functions.
With objects, vectors and functions, we are really getting into the R language so this is where things get serious.
For those of you taking a class, you will be provided an empty script with only the instructions. You should type the tutorial code as you follow along. Writing code yourself, as opposed to just watching the videos or just running code provided by someone else, is the best way to learn and get more comfortable.
Part 1 – Introducing objects, vectors, and functions
The first video in this chapter walks you through the basics of objects, vectors, and functions.
Objects are the basic building blocks of the R language. They are what we create, manipulate, model, and visualize in order to extract information from them. Vectors are a type of object in R. They are one-dimensional series of values of a single type: numbers, characters (text), or logical values (TRUE and FALSE). In the video below, we'll create our first objects and vectors and conduct some basic manipulation using functions.
Functions are another key building block of the R language. Think of functions as verbs that we apply to objects. Functions do something to objects. This first video also introduces you to some basic functions.
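As a rough illustration of the "objects are nouns, functions are verbs" idea, the sketch below creates one numeric vector and applies three functions to it. The vector name ages and its values are invented for this example.

```r
# Create a numeric vector (an object) with c(); the values are made up
ages <- c(19, 22, 20, 35, 22)

# Apply functions (the "verbs") to the object
length(ages)   # number of elements: 5
sum(ages)      # 118
mean(ages)     # 23.6
```

Note that the functions do not change ages itself. Each one reads the vector and returns a new value.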
Part 2 – Using functions to automatically create objects
In this section, we will expand our understanding of objects, vectors, and functions by generating vectors using three functions that automate that process based on parameters we set. It's less complicated and abstract than it sounds.
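The sketch below assumes the three functions in question are seq(), rep(), and sample(), which are the generators named in the key-functions list for this part. The parameter values and object names are hypothetical, and set.seed() is added here only so that the random draw can be reproduced.

```r
# seq(): a sequence from 0 to 100 in steps of 25
steps <- seq(from = 0, to = 100, by = 25)
steps      # 0 25 50 75 100

# rep(): repeat the pair of values 1, 2 three times
groups <- rep(c(1, 2), times = 3)
groups     # 1 2 1 2 1 2

# sample(): draw 5 values at random from 1 to 10, without repeats
set.seed(42)   # assumption: fix the seed so the same draw comes back each run
draws <- sample(1:10, size = 5)
draws
```

In each case we describe the vector we want through parameters (from, to, by, times, size) instead of typing every value by hand.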
Part 3 – Logical and character vectors
So far in this chapter, we have examined numeric vectors. In this last section, we will examine the other two types of vectors we will encounter: logical and character vectors.
We will also introduce the concepts of vector recycling and coercion.
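Here is a small sketch of these four ideas side by side. All object names and values are made up for illustration.

```r
# A logical vector holds TRUE/FALSE values
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)        # TRUE counts as 1, so this is 3

# A character vector holds text in quotes
majors <- c("Sociology", "Biology", "History")
class(majors)      # "character"

# Recycling: the shorter vector is reused to match the longer one
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24

# Coercion: mixing types converts every element to one common type
mixed <- c(1, "two", TRUE)
mixed              # "1" "two" "TRUE"
class(mixed)       # "character"
```

Coercion happens because a vector can hold only one type of value, so R converts everything to the type that can represent all the elements, here text.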
This concludes your first steps with the R language. Hopefully, you are starting to understand how it works: we create objects, such as vectors, and then, we manipulate or extract information from these objects, using the appropriate functions. This is the logic we will apply throughout the semester, albeit with more complex objects and functions, but the underlying reasoning is the same.
Key functions used in this chapter
Part 1
- c(): the function that combines values into a vector;
- sum(): the function that calculates the sum of all the elements in a numeric object;
- min(): the function that identifies the smallest value in a numeric object;
- max(): the function that identifies the largest value in a numeric object;
- mean(): the function that calculates the mean of a numeric object;
- median(): the function that calculates the median of a numeric object;
- summary(): the function that provides summary statistics for a numeric object;
- length(): the function that returns the length (number of elements) of an object;
- help(): the function that opens the help documentation on any function.
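The descriptive functions in this list can be tried on a single small vector, as in the sketch below. The vector temps and its values are hypothetical.

```r
temps <- c(61, 75, 58, 80, 75)   # made-up temperatures

min(temps)       # 58
max(temps)       # 80
median(temps)    # 75
summary(temps)   # minimum, quartiles, mean, and maximum in one call

# Open the documentation for a function; both forms do the same thing
help(median)
?median
```

In R Studio, the documentation opens in the help tab of the files+ panel.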
Part 2
- seq(): the function that generates a sequence of numbers;
- rep(): the function that generates a repeating series of values;
- sample(): the function that draws a random sample of values from a vector;
- sort(): the function that sorts a series of numbers in increasing or decreasing order.
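A short sketch of sort() in both directions, using an invented vector named waits:

```r
waits <- c(12, 5, 30, 8)     # hypothetical waiting times

ascending  <- sort(waits)                      # 5 8 12 30
descending <- sort(waits, decreasing = TRUE)   # 30 12 8 5

waits    # the original vector is unchanged: 12 5 30 8
```

Because sort() returns a new vector, the result has to be assigned to an object if we want to keep it.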
Part 3
- paste(): the function that joins character vectors.
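As an illustration of paste(), the sketch below joins two made-up character vectors element by element, then collapses one vector into a single string. The sep and collapse arguments are part of base R.

```r
first <- c("Ada", "Grace")
last  <- c("Lovelace", "Hopper")

full_names <- paste(first, last)               # "Ada Lovelace" "Grace Hopper"
user_ids   <- paste(first, last, sep = "_")    # "Ada_Lovelace" "Grace_Hopper"
one_string <- paste(first, collapse = ", ")    # "Ada, Grace"
```

By default, paste() puts a single space between the pieces. The sep argument changes that separator, and collapse merges all the elements into one string.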
Check your understanding by taking the quiz below.