Factors

Creating factors

Imagine that you have a variable that records month:

x1 <- c("Dec", "Apr", "Jan", "Mar")

Using a string to record this variable has two problems:

  1. There are only twelve possible months, and there's nothing saving you from typos:

    x2 <- c("Dec", "Apr", "Jam", "Mar")

  2. It sorts alphabetically, which is not a useful order for months:

    sort(x1)
    #> [1] "Apr" "Dec" "Jan" "Mar"

You can fix both of these problems with a factor. To create a factor, start by creating a character vector of the valid levels:

month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
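For this particular set of levels, base R already provides the English month abbreviations as the constant month.abb. The sketch below only checks that the hand-written vector matches it; the name same_as_builtin is illustrative and not part of the original example:

```r
# The levels spelled out by hand, as in the text
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

# month.abb is a constant that ships with base R
same_as_builtin <- identical(month_levels, month.abb)
same_as_builtin
#> [1] TRUE
```

Writing the levels out by hand remains the more general approach, since most categorical variables have no built-in constant.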

Now you can create a factor:

y1 <- factor(x1, levels = month_levels)
y1
#> [1] Dec Apr Jan Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
sort(y1)
#> [1] Jan Mar Apr Dec
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
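Sorting works because a factor stores each value as an integer position within its levels. A minimal base R sketch that exposes those positions with as.integer(); the name codes is illustrative:

```r
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels)

# Each value is stored as its position in month_levels
codes <- as.integer(y1)
codes
#> [1] 12  4  1  3

# sort() orders by those positions, not by the labels
as.character(sort(y1))
#> [1] "Jan" "Mar" "Apr" "Dec"
```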

Any values not in the set of levels are silently converted to NA:

y2 <- factor(x2, levels = month_levels)
y2
#> [1] Dec  Apr  <NA> Mar 
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
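Because the conversion is silent, it can help to check which inputs were dropped. One possible base R check, assuming the original character vector is still available; the name bad is illustrative:

```r
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)

# Inputs that were not missing but became NA in the factor
bad <- x2[is.na(y2) & !is.na(x2)]
bad
#> [1] "Jam"
```

The `!is.na(x2)` condition keeps values that were already missing in the input from being reported as typos.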

If you want a warning, you can use readr::parse_factor():

y2 <- readr::parse_factor(x2, levels = month_levels)
#> Warning: 1 parsing failure.
#> row col           expected actual
#>   3  -- value in level set    Jam

If you omit the levels, they'll be taken from the data in alphabetical order:

factor(x1)
#> [1] Dec Apr Jan Mar
#> Levels: Apr Dec Jan Mar

Sometimes you'd prefer that the order of the levels match the order of first appearance in the data. You can do that when creating the factor by setting levels to unique(x1), or after the fact with forcats::fct_inorder():

f1 <- factor(x1, levels = unique(x1))
f1
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar

f2 <- x1 %>% factor() %>% fct_inorder()
f2
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar
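The unique() approach matters most when values repeat, because each level must appear only once. A short base R sketch with hypothetical data (x3 and f3 are not part of the original example):

```r
# Hypothetical data in which some months occur more than once
x3 <- c("Dec", "Apr", "Dec", "Jan", "Apr")

# unique() keeps the first occurrence of each value, in order
f3 <- factor(x3, levels = unique(x3))
levels(f3)
#> [1] "Dec" "Apr" "Jan"
```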

If you ever need to access the set of valid levels directly, you can do so with levels():

levels(f2)
#> [1] "Dec" "Apr" "Jan" "Mar"
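Note that levels() returns every valid level, including ones that never occur in the data. A brief base R sketch, rebuilding y1 from the earlier example so that it runs on its own:

```r
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels)

# All twelve levels are kept, although only four appear in the data
nlevels(y1)
#> [1] 12

# "Feb" is a valid level ...
"Feb" %in% levels(y1)
#> [1] TRUE

# ... but it is not an observed value
"Feb" %in% as.character(y1)
#> [1] FALSE
```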