Factors
Creating factors
Imagine that you have a variable that records month:
x1 <- c("Dec", "Apr", "Jan", "Mar")
Using a string to record this variable has two problems:
- There are only twelve possible months, and there's nothing saving you from typos:
x2 <- c("Dec", "Apr", "Jam", "Mar")
- It doesn't sort in a useful way:
sort(x1)
#> [1] "Apr" "Dec" "Jan" "Mar"
You can fix both of these problems with a factor. To create a factor, start by creating a character vector of the valid levels:
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
Now you can create a factor:
y1 <- factor(x1, levels = month_levels)
y1
#> [1] Dec Apr Jan Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
sort(y1)
#> [1] Jan Mar Apr Dec
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
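One way to see why sort() now gives a useful order: a factor stores each value as the integer position of its level. The following is a minimal base R sketch that repeats the definitions above so it runs on its own.

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y1 <- factor(x1, levels = month_levels)

# Each value is stored as the position of its level in month_levels
as.integer(y1)
#> [1] 12  4  1  3

# sort() orders by those positions, not by the text of the labels
as.character(sort(y1))
#> [1] "Jan" "Mar" "Apr" "Dec"
```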
Any values not in the set of levels will be silently converted to NA:
y2 <- factor(x2, levels = month_levels)
y2
#> [1] Dec  Apr  <NA> Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
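Because the conversion is silent, it can help to check which inputs were lost. The sketch below uses only base R. The name `dropped` is illustrative and does not come from the original text.

```r
x2 <- c("Dec", "Apr", "Jam", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y2 <- factor(x2, levels = month_levels)

# Inputs that were not missing but became NA in the factor
dropped <- x2[is.na(y2) & !is.na(x2)]
dropped
#> [1] "Jam"

# A similar check on the raw strings, before any conversion
# (assumes the input itself contains no NA values)
setdiff(x2, month_levels)
#> [1] "Jam"
```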
If you want a warning, you can use readr::parse_factor():
y2 <- parse_factor(x2, levels = month_levels)
#> Warning: 1 parsing failure.
#> row col           expected actual
#>   3  -- value in level set    Jam
If you omit the levels, they'll be taken from the data in alphabetical order:
factor(x1)
#> [1] Dec Apr Jan Mar
#> Levels: Apr Dec Jan Mar
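As a quick check of that default, the levels chosen by factor() for a character vector match the sorted unique values. This is a base R sketch. Note that string sorting in R depends on the locale, so the order could differ for other data, though not for these four abbreviations in common locales.

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")

# Default levels are the unique values in sorted order
default_levels <- levels(factor(x1))
default_levels
#> [1] "Apr" "Dec" "Jan" "Mar"

identical(default_levels, sort(unique(x1)))
#> [1] TRUE
```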
Sometimes you'd prefer that the order of the levels match the order of the first appearance in the data. You can do that when creating the factor by setting levels to unique(x), or after the fact, with fct_inorder():
f1 <- factor(x1, levels = unique(x1))
f1
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar
f2 <- x1 %>% factor() %>% fct_inorder()
f2
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar
If you ever need to access the set of valid levels directly, you can do so with levels():
levels(f2)
#> [1] "Dec" "Apr" "Jan" "Mar"
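The set of levels can be larger than the set of values that actually occur. As a small base R sketch, using the month factor from earlier in this section, levels() still reports all twelve months even though only four appear in the data:

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y1 <- factor(x1, levels = month_levels)

# All twelve levels are valid, whether or not they occur in the data
nlevels(y1)
#> [1] 12

# "Feb" is a valid level ...
"Feb" %in% levels(y1)
#> [1] TRUE

# ... but it is not one of the observed values
"Feb" %in% as.character(y1)
#> [1] FALSE
```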