Factors
Modifying factor order
It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions:
relig_summary <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    age = mean(age, na.rm = TRUE),
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )
#> `summarise()` ungrouping output (override with `.groups` argument)

ggplot(relig_summary, aes(tvhours, relig)) + geom_point()

It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of relig using fct_reorder(). fct_reorder() takes three arguments:
- f, the factor whose levels you want to modify.
- x, a numeric vector that you want to use to reorder the levels.
- Optionally, fun, a function that's used if there are multiple values of x for each value of f. The default value is median.
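To see what the three arguments do without drawing a plot, here is a minimal sketch on a made-up factor (the vectors f and x below are invented for illustration and are not part of gss_cat). It assumes a recent forcats, where the arguments are spelled .f, .x and .fun; f and x are passed by position here so the spelling only matters for the summary function.

```r
library(forcats)

# Hypothetical toy data: three groups with two numeric values each.
f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(10, 12, 1, 3, 5, 100)

# Default summary is the median: a = 11, b = 2, c = 52.5,
# so the levels are sorted b, a, c.
by_median <- fct_reorder(f, x)
levels(by_median)

# Using the minimum instead: a = 10, b = 1, c = 5,
# so the levels are sorted b, c, a.
by_min <- fct_reorder(f, x, .fun = min)
levels(by_min)
```

Only the level order changes; the values stored in the factor stay where they were.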
ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) + geom_point()

Reordering religion makes it much easier to see that people in the "Don't know" category watch much more TV, and people in the Hinduism and Other Eastern religions categories watch much less.
As you start making more complicated transformations, I'd recommend moving them out of aes() and into a separate mutate() step. For example, you could rewrite the plot above as:
relig_summary %>%
  mutate(relig = fct_reorder(relig, tvhours)) %>%
  ggplot(aes(tvhours, relig)) +
  geom_point()
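One side benefit of the mutate() step is that you can inspect the reordered factor before plotting it. The sketch below uses a small invented tibble (toy, group and value are hypothetical names) rather than relig_summary, and shows that the rows stay in place while the levels change.

```r
library(dplyr)
library(forcats)

# Hypothetical summary table with one row per group.
toy <- tibble(
  group = factor(c("x", "y", "z")),
  value = c(3, 1, 2)
)

reordered <- toy %>%
  mutate(group = fct_reorder(group, value))

levels(toy$group)             # original, alphabetical levels
levels(reordered$group)       # levels now sorted by value: y, z, x
as.character(reordered$group) # row order is untouched: x, y, z
```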
What if we create a similar plot looking at how average age varies across reported income level?
rincome_summary <- gss_cat %>%
  group_by(rincome) %>%
  summarise(
    age = mean(age, na.rm = TRUE),
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )
#> `summarise()` ungrouping output (override with `.groups` argument)

ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) + geom_point()

Here, arbitrarily reordering the levels isn't a good idea! That's because rincome already has a principled order that we shouldn't mess with. Reserve fct_reorder() for factors whose levels are arbitrarily ordered.
However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use fct_relevel(). It takes a factor, f, and then any number of levels that you want to move to the front of the line.
ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) + geom_point()
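As a quick check of what fct_relevel() does to the levels themselves, here is a sketch on a made-up income factor (the labels "high" and "low" are invented stand-ins, not the real rincome levels). It also uses the after argument, which places the named levels somewhere other than the front.

```r
library(forcats)

# Hypothetical factor with an explicit level order.
income <- factor(
  c("low", "high", "Not applicable", "low"),
  levels = c("high", "low", "Not applicable")
)

# Move "Not applicable" to the front; the other levels keep their order.
front <- fct_relevel(income, "Not applicable")
levels(front)

# after = Inf moves the named level to the end instead.
back <- fct_relevel(income, "high", after = Inf)
levels(back)
```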

Why do you think the average age for "Not applicable" is so high?
Another type of reordering is useful when you are colouring the lines on a plot. fct_reorder2() reorders the factor by the y values associated with the largest x values. This makes the plot easier to read because the line colours line up with the legend.
by_age <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

ggplot(by_age, aes(age, prop, colour = marital)) +
  geom_line(na.rm = TRUE)

ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
  geom_line() +
  labs(colour = "marital")
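The rule fct_reorder2() applies is easier to follow on a tiny example. The sketch below uses three invented series (g, x and y are hypothetical) measured at x = 1 and x = 2, and relies on the function's defaults: each level is summarised by its y value at the largest x, and levels are sorted from the highest of those values to the lowest.

```r
library(forcats)

# Hypothetical data: three series, each observed at x = 1 and x = 2.
g <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(1, 2, 1, 2, 1, 2)
y <- c(5, 1, 1, 9, 3, 4)

# y at the largest x: a = 1, b = 9, c = 4.
# Levels are sorted in descending order of that value: b, c, a.
g2 <- fct_reorder2(g, x, y)
levels(g2)
```

Because the legend lists levels in order, the first legend entry then matches the line that finishes highest at the right-hand edge of the plot.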
Finally, for bar plots, you can use fct_infreq() to order levels in decreasing frequency: this is the simplest type of reordering because it doesn't need any extra variables. Combine it with fct_rev() if you want the levels in increasing frequency instead, so that the tallest bar ends up on the right.
gss_cat %>%
  mutate(marital = marital %>% fct_infreq() %>% fct_rev()) %>%
  ggplot(aes(marital)) +
  geom_bar()
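To confirm which direction each function sorts in, here is a small sketch on an invented factor (the values in v are hypothetical). fct_infreq() puts the most common level first, and fct_rev() flips whatever order it is given.

```r
library(forcats)

# Hypothetical factor: "y" appears 4 times, "z" 3 times, "x" once.
v <- factor(c("x", "y", "y", "z", "z", "z", "y", "y"))

# Most frequent level first: y, z, x.
most_first <- fct_infreq(v)
levels(most_first)

# Reversed, so the least frequent level comes first: x, z, y.
least_first <- fct_rev(fct_infreq(v))
levels(least_first)
```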
