Factor reordering in R

Posted on 5th March 2018
by Rupert Overall

This short post answers a question I recently got from a student and have encountered several times before. The problem is confusing, but the answer is easy. It is described (better and in more detail) elsewhere on the web, but I will add to the info-smog anyway…

“How can I reorder a factor in R?”

In R, a factor is a way of storing a data vector that contains categories. The categories are represented as character labels, and each value is stored as an integer index pointing to one of these labels. For example, consider a factor containing data about pets:

pets <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"))

There are 6 pets, which fall into 3 categories: “rabbit”, “cat”, and “dog”. In R, these categories are called levels. By default, the levels are ordered alphabetically:

levels(pets)
[1] "cat"    "dog"    "rabbit"

Note how the values are stored in the factor object as indices pointing to the levels:

as.numeric(pets)
[1] 3 1 1 2 1 2
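
To see that these numbers really are positions in the level vector, the labels can be rebuilt by hand. This is a minimal sketch using only base R; the names codes and rebuilt are just illustrative.

```r
pets <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"))

# The integer codes stored inside the factor
codes <- as.integer(pets)

# Indexing the levels by those codes gives back the original labels
rebuilt <- levels(pets)[codes]

# This is the same result that as.character() returns
identical(rebuilt, as.character(pets))  # TRUE
```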

Now back to the question: how can the order of the factor levels be changed so that, for example, “rabbit” is the first level? This is valuable if you want to plot the data in a certain order or if you want to do something like a Dunnett test, where the control group must be the first level.

The order of the levels does not depend on how your data are organised, but on how the factor is organised. You need to change the factor so that the levels are in the order you want (i.e. with the control group first). By default, R puts them in alphabetical order.
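
A quick way to convince yourself of this is to shuffle the data and look at the levels again. The sketch below uses only base R; pets_sorted and counts are illustrative names, not anything from the original question.

```r
pets <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"))

# Sort the data in reverse alphabetical order; the levels are not touched
pets_sorted <- pets[order(as.character(pets), decreasing = TRUE)]
levels(pets_sorted)

# Summaries follow the level order, not the order of the data
counts <- table(pets_sorted)
names(counts)
```

Both calls still report the levels as cat, dog, rabbit, even though the data now start with “rabbit”.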

There are 2 easy ways to reorder factor levels:

1. Use the function relevel. This lets you select one of your levels to be the control (i.e. to go first in the list); the remaining levels keep their relative order. This is sufficient for Dunnett.

pets1 <- relevel(pets, "rabbit")
pets1
[1] rabbit cat    cat    dog    cat    dog  
Levels: rabbit cat dog

2. Specify the level order when you create the factor (see the parameter ‘levels’ in factor). This lets you change the level order completely. I do this a lot so that boxplots come out in the correct order, for example.

pets2 <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"), levels=c("rabbit", "dog", "cat"))
pets2
[1] rabbit cat    cat    dog    cat    dog  
Levels: rabbit dog cat
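
If the factor already exists, the same approach should work without retyping the data: pass the factor itself back through factor() with the desired levels. This is a small sketch; pets2b is a name made up for this example.

```r
pets <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"))

# Rebuild the existing factor with a new level order;
# values are matched to the levels by label, so the data stay the same
pets2b <- factor(pets, levels = c("rabbit", "dog", "cat"))

levels(pets2b)
as.character(pets2b)
```

Take care with spelling here: any value whose label does not appear in levels becomes NA.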

Caution! You cannot reorder the levels by assigning a new vector to levels() directly. What you are doing in that case is actually renaming the levels, which silently changes your data.

pets3 <- pets
levels(pets3) <- c("rabbit", "dog", "cat") # Wrong!
pets3 # See how the categories' names have changed.
[1] cat    rabbit rabbit dog    rabbit dog  
Levels: rabbit dog cat

Note that with both correct methods (pets1 and pets2) the labels of the data remain unchanged, but the indices pointing to the levels are different.

pets
[1] rabbit cat    cat    dog    cat    dog  
Levels: cat dog rabbit
as.numeric(pets)
[1] 3 1 1 2 1 2
pets1
[1] rabbit cat    cat    dog    cat    dog  
Levels: rabbit cat dog
as.numeric(pets1)
[1] 1 2 2 3 2 3
pets2
[1] rabbit cat    cat    dog    cat    dog  
Levels: rabbit dog cat
as.numeric(pets2)
[1] 1 3 3 2 3 2
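
A simple sanity check after any reordering is to compare the labels before and after. The sketch below rebuilds all the objects from this post so that it runs on its own; pets2 is built here from the existing factor rather than from the raw vector, which gives the same result.

```r
pets  <- factor(c("rabbit", "cat", "cat", "dog", "cat", "dog"))
pets1 <- relevel(pets, "rabbit")
pets2 <- factor(pets, levels = c("rabbit", "dog", "cat"))

pets3 <- pets
levels(pets3) <- c("rabbit", "dog", "cat")  # renames the levels, does not reorder them

# Reordering keeps the labels of the data intact...
identical(as.character(pets), as.character(pets1))  # TRUE
identical(as.character(pets), as.character(pets2))  # TRUE

# ...whereas renaming changes them
identical(as.character(pets), as.character(pets3))  # FALSE
```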