We'll practice with the data frame, the usual format for storing observations of several variables. We'll practice extensions of this format later. Use the cats data frame to solve these challenges:
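# stringsAsFactors = TRUE stores coat as a factor, as in the output below;
# since R 4.0.0 the default is FALSE, which would make coat a character vector
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)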
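A data frame is a list of equal-length column vectors, so sapply() can run typeof() and class() over every column at once. The sketch below assumes coat is stored as a factor, matching the outputs in this lesson. For that reason it passes stringsAsFactors = TRUE explicitly, because R 4.0.0 and later default to FALSE. The names col_types and col_classes are only illustrative.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

# sapply() visits each column, because a data frame is a list of columns
col_types <- sapply(cats, typeof)    # storage type of each column
col_classes <- sapply(cats, class)   # class of each column
print(col_types)    # coat "integer", weight "double", likes_string "double"
print(col_classes)  # coat "factor", weight "numeric", likes_string "numeric"
print(dim(cats))    # number of rows, then number of columns: 3 3
```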
Challenge 1
There are several subtly different ways to access variables, observations and elements of a data.frame:
cats[1]
cats[[1]]
cats$coat
cats["coat"]
cats[1, 1]
cats[, 1]
cats[1, ]
Try out these examples and explain what is returned by each one.
Hint: use the function typeof() to examine what is returned in each case.
Solution
cats[1]
Output
    coat
1 calico
2  black
3  tabby
We can think of a data frame as a list of vectors. The single brackets [1] return the first slice of the list as another list. Here that is a data frame containing only the first column, so typeof() reports "list".
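This short sketch shows that single brackets with one index keep the data-frame container, even when several columns are selected. It re-creates cats (with stringsAsFactors = TRUE) so it runs on its own. The names first_col and first_two are illustrative.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

# Single brackets with one index return a data frame, not the bare column
first_col <- cats[1]
first_two <- cats[c(1, 2)]   # a vector of indices selects several columns
print(class(first_col))      # class is "data.frame"
print(typeof(first_col))     # storage type is "list"
print(names(first_two))      # "coat" "weight"
```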
cats[[1]]
Output
[1] calico black  tabby
Levels: black calico tabby
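The double brackets [[1]] return the contents of the list item. In this case that is the contents of the first column: a factor. typeof() reports "integer" for it, because a factor stores each value as an integer code that points into its levels.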
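This minimal sketch uses base R's levels(), as.integer() and as.character() to show the integer codes behind the factor. coat is again created with stringsAsFactors = TRUE, and the variable name coat_factor is illustrative.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

coat_factor <- cats[[1]]
print(class(coat_factor))         # class is "factor"
print(typeof(coat_factor))        # storage type is "integer"
print(levels(coat_factor))        # levels sorted alphabetically: "black" "calico" "tabby"
print(as.integer(coat_factor))    # 2 1 3: position of each label in the levels
print(as.character(coat_factor))  # the labels: "calico" "black" "tabby"
```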
cats$coat
Output
[1] calico black  tabby
Levels: black calico tabby
This example uses the $ operator to address an item by name. coat is the first column of the data frame, so again the result is a factor.
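$ needs the column name typed literally. That is why [[ ]] is the usual choice when the name is held in a variable. The sketch below uses the same cats data frame, and col is a hypothetical variable name. The last line returns NULL rather than the weight column because cats has no column named col, and col is not a prefix of any column name.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

# $ with a literal name gives the same vector as [[ ]] with that name
print(identical(cats$coat, cats[["coat"]]))  # TRUE

# When the column name is stored in a variable, use [[ ]]
col <- "weight"
print(cats[[col]])         # 2.1 5.0 3.2
print(is.null(cats$col))   # TRUE: $ looks for a column literally named "col"
```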
cats["coat"]
Output
    coat
1 calico
2  black
3  tabby
Here we use single brackets, ["coat"], replacing the index number with the column name. As with cats[1], the returned object is a one-column data frame, which is a list.
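Inside single brackets, a column name works anywhere a column index does, including a vector of several names. This sketch uses the same setup, and by_name and two_cols are illustrative names.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

by_name <- cats["coat"]                        # one column, kept as a data frame
two_cols <- cats[c("coat", "likes_string")]    # several columns by name
print(identical(by_name, cats[1]))    # TRUE: name and position select the same column
print(names(two_cols))                # "coat" "likes_string"
print(is.data.frame(cats[["coat"]]))  # FALSE: [[ ]] returns the column itself
```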
cats[1, 1]
Output
[1] calico
Levels: black calico tabby
This example uses single brackets, but this time we provide row and column coordinates. The returned object is the value in row 1, column 1. Underneath, this value is an integer. Because it is an element of a factor, R displays the label "calico" associated with that integer code, along with the factor's levels.
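Rows and columns can each be given by position or by name. A single element taken from a factor also keeps the full set of levels. This sketch uses the same cats data frame.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

# Row and column coordinates can be positions or names
print(cats[1, "coat"])      # same element as cats[1, 1]
print(cats[2, "weight"])    # 5
print(cats[2, 2])           # 5, by position this time

# One element of a factor still carries all three levels
print(nlevels(cats[1, 1]))  # 3
```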
cats[, 1]
Output
[1] calico black  tabby
Levels: black calico tabby
As in the previous example, we use single brackets and provide row and column coordinates. This time the row coordinate is left empty. R interprets the missing value as all the rows, so the whole first column is returned as a factor.
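A one-column selection becomes a plain vector because of the drop argument of [, which defaults to TRUE. Passing drop = FALSE keeps a data frame instead. In this sketch, col_vec and col_df are illustrative names.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

col_vec <- cats[, 1]                # drop = TRUE by default: a bare factor
col_df  <- cats[, 1, drop = FALSE]  # keep the data-frame container
print(class(col_vec))               # class is "factor"
print(class(col_df))                # class is "data.frame"
print(identical(col_vec, cats[[1]]))  # TRUE: same result as double brackets
```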
cats[1, ]
Output
    coat weight likes_string
1 calico    2.1            1
Again we use single brackets with row and column coordinates, but now the column coordinate is left empty. The result is a one-row data frame (a list) containing all the values in the first row.
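To compare the Challenge 1 answers side by side, this sketch evaluates each expression and records its typeof() and class(). The names results, types and classes are illustrative. cats is re-created with stringsAsFactors = TRUE so the factor results match the outputs above.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)

# Evaluate each Challenge 1 expression once and keep the result under its label
results <- list(
  "cats[1]"      = cats[1],
  "cats[[1]]"    = cats[[1]],
  "cats$coat"    = cats$coat,
  'cats["coat"]' = cats["coat"],
  "cats[1, 1]"   = cats[1, 1],
  "cats[, 1]"    = cats[, 1],
  "cats[1, ]"    = cats[1, ]
)

# Storage type and (first) class of each result, one row per expression
types <- sapply(results, typeof)
classes <- sapply(results, function(x) class(x)[1])
print(data.frame(typeof = types, class = classes))
```

The pattern is consistent: every data-frame result has storage type "list", and every factor result has storage type "integer".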
Challenge 2
Create a list of length two containing a character vector for each of the sections in this part of the workshop:
- Data types
- Data structures
Populate each character vector with the names of the data types and data structures we've seen so far.
Solution
dataTypes <- c('double', 'complex', 'integer', 'character', 'logical')
dataStructures <- c('data.frame', 'vector', 'factor', 'list', 'matrix')
answer <- list(dataTypes, dataStructures)
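One possible variation on the solution, not part of the original, is to name the two list elements. Each vector can then be retrieved with $ or [[ ]]. In this sketch, the element names simply reuse the variable names.

```r
dataTypes <- c('double', 'complex', 'integer', 'character', 'logical')
dataStructures <- c('data.frame', 'vector', 'factor', 'list', 'matrix')

# Naming the elements lets us pull each vector out by name later
answer <- list(dataTypes = dataTypes, dataStructures = dataStructures)
print(length(answer))             # 2
print(answer$dataStructures[1])   # "data.frame"
print(sapply(answer, typeof))     # both elements are character vectors
```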
Note: it's helpful to write all of these types and structures in large letters on the board, or on a sheet taped to the wall. Leave it up for the rest of the workshop as a reminder of how important these basics are.
Source: The Carpentries, https://swcarpentry.github.io/r-novice-gapminder/04-data-structures-part1/index.html#vectors-and-type-coercion
This work is licensed under a Creative Commons Attribution 4.0 License.