This section gives more examples of building histograms in base R. When two or more samples have different total counts, convert the vertical axis from counts to density so the distributions can be compared on the same plot.
Problem
You want to make a histogram or density plot.
Solution
Some sample data: two vectors of 200 data points each.
set.seed(1234)
rating <- rnorm(200)
head(rating)
#> [1] -1.2070657 0.2774292 1.0844412 -2.3456977 0.4291247 0.5060559
rating2 <- rnorm(200, mean=.8)
head(rating2)
#> [1] 1.2852268 1.4967688 0.9855139 1.5007335 1.1116810 1.5604624
When plotting multiple groups of data, some graphing routines require a data frame with one column for the grouping variable and one for the measure variable.
# Make a column to indicate which group each value is in
cond <- factor( rep(c("A","B"), each=200) )
data <- data.frame(cond, rating = c(rating,rating2))
head(data)
#> cond rating
#> 1 A -1.2070657
#> 2 A 0.2774292
#> 3 A 1.0844412
#> 4 A -2.3456977
#> 5 A 0.4291247
#> 6 A 0.5060559
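Going the other way is sometimes useful too: routines that expect one vector per group can be fed from the long-format data frame with split(). The following is a minimal sketch that rebuilds the sample data so it runs on its own; the name groups is only illustrative.

```r
set.seed(1234)
rating  <- rnorm(200)
rating2 <- rnorm(200, mean = 0.8)
data <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                   rating = c(rating, rating2))

# split() returns a named list with one numeric vector per level of cond
groups <- split(data$rating, data$cond)
names(groups)
sapply(groups, length)
```

A list like this has one element per group, in the order of the factor levels.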
# Histogram
hist(rating)
# Ask for about 8 bins (breaks=8 is only a suggestion; hist() places boundaries on nice round numbers)
# Make it light blue (#CCCCFF)
# Instead of showing counts, scale the bars so their total area is 1 (freq=FALSE)
hist(rating, breaks=8, col="#CCCCFF", freq=FALSE)
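To see what freq=FALSE does, the histogram can be computed without drawing it and the bar areas added up. This is a small sketch, not part of the original recipe; the names h, widths, area and total are illustrative.

```r
set.seed(1234)
rating <- rnorm(200)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(rating, breaks = 8, plot = FALSE)

widths <- diff(h$breaks)          # width of each bin
area   <- sum(h$density * widths) # total bar area on the density scale
total  <- sum(h$counts)           # total number of observations

area   # should be 1, up to floating-point rounding
total  # every observation falls in exactly one bin
```

Because each density value is the bin count divided by the sample size and the bin width, the heights no longer depend on how many points the sample contains.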
# Put breaks at every 0.6
boundaries <- seq(-3, 3.6, by=.6)
boundaries
#> [1] -3.0 -2.4 -1.8 -1.2 -0.6 0.0 0.6 1.2 1.8 2.4 3.0 3.6
hist(rating, breaks=boundaries)
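When samples have different sizes, combining explicit breaks with the density scale puts them on a common footing. The sketch below overlays two histograms in base R; the samples small and large, the bin width of 0.5 and the colors are assumptions chosen for illustration.

```r
set.seed(1234)
small <- rnorm(50)               # hypothetical sample of 50 points
large <- rnorm(500, mean = 0.8)  # hypothetical sample of 500 points

# Shared breaks so the bars of both groups line up
breaks <- seq(floor(min(small, large)), ceiling(max(small, large)), by = 0.5)

h_small <- hist(small, breaks = breaks, plot = FALSE)
h_large <- hist(large, breaks = breaks, plot = FALSE)
ymax <- max(h_small$density, h_large$density)

# Semi-transparent fills keep both groups visible where they overlap
blue <- rgb(0, 0, 1, 0.4)
red  <- rgb(1, 0, 0, 0.4)
plot(h_small, freq = FALSE, col = blue, ylim = c(0, ymax),
     main = "", xlab = "rating")
plot(h_large, freq = FALSE, col = red, add = TRUE)
legend("topright", c("small (n = 50)", "large (n = 500)"),
       fill = c(blue, red))
```

On the count scale the larger sample would dwarf the smaller one; on the density scale both histograms have total area 1.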
# Kernel density plot
plot(density(rating))
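A density curve can also be drawn on top of a histogram, provided the histogram uses the density scale. The sketch below does this and also shows the adjust argument of density(), which multiplies the default bandwidth; the value 2 is an arbitrary choice for illustration.

```r
set.seed(1234)
rating <- rnorm(200)

d        <- density(rating)              # default bandwidth
d_smooth <- density(rating, adjust = 2)  # twice the default bandwidth

# Compute the histogram first so the y-axis can hold both bars and curves
h <- hist(rating, plot = FALSE)
ymax <- max(h$density, d$y, d_smooth$y)

plot(h, freq = FALSE, col = "#CCCCFF", ylim = c(0, ymax), main = "")
lines(d, lwd = 2)          # default smoothing, solid line
lines(d_smooth, lty = 2)   # heavier smoothing, dashed line
```

A larger adjust value gives a smoother curve that hides more detail; a smaller one follows the data more closely.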
Multiple groups can be overlaid as kernel density plots with a small helper function:
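plot.multi.dens <- function(s)
{
    # Estimate each density once and reuse it for the ranges and the drawing
    dens <- lapply(s, density)
    xr <- range(sapply(dens, function(d) range(d$x)))
    yr <- range(sapply(dens, function(d) range(d$y)))
    # Draw the first group with axes wide enough to hold every curve
    plot(dens[[1]], xlim = xr, ylim = yr, main = "")
    # Add the remaining groups; color i identifies group i
    # (xlim and ylim are not passed to lines(), which does not accept them)
    for (i in seq_along(dens)[-1]) {
        lines(dens[[i]], col = i)
    }
}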
# The argument must be a list of numeric vectors, one per group
plot.multi.dens( list(rating, rating2))
The sm package also provides a way to draw multiple density plots. It takes a vector of values and a grouping vector, here the two columns of the data frame.
library(sm)
sm.density.compare(data$rating, data$cond)
# Add a legend (the color numbers start from 2 and go up, one per group)
legend("topright", levels(data$cond), fill=1+seq_len(nlevels(data$cond)))
Source: W. Chung, http://www.cookbook-r.com/Graphs/Histogram_and_density_plot/
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.