How to use barplot() to get two variables side by side?

Time:10-03

My data frame is number of gifts for boys and number of gifts for girls according to each state.

I'm trying to plot two variables side by side using the barplot() function.

I can't figure out how to get both on the plot. The way I'm doing it now assigns two colours, but to the same variable.

Essentially, what I want is two bars for each region (so for New Hampshire it plots df$GB and df$GG, for Massachusetts it plots df$GB and df$GG, etc.).

Any thoughts?

y <- as.matrix(df$GB, df$GG)

barplot(y, main='Gifts', col=c('green', 'red'), 
        beside=TRUE, legend=c('G', 'H'), xaxt='n')

axis(1, at=1: nrow(df), labels=df$Region, las=2)

CodePudding user response:

You will get a faster response with code adapted to your data if you provide a reproducible sample using something like dput(head(df, 5)). Without that I'll provide some data that may resemble yours:

df <- structure(list(Region = c("A", "B", "C", "D", "E"), GB = c(1L, 
5L, 10L, 8L, 2L), GG = c(4L, 2L, 9L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-5L))
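
As a small illustration of the dput() suggestion, the sketch below uses a made-up three-row data frame (not the asker's data) to show that the text dput() prints can be evaluated again to recreate the object. The names toy, txt and toy2 are hypothetical.

```r
# Made-up data for illustration only
toy <- data.frame(Region = c("A", "B", "C"),
                  GB = c(1L, 5L, 10L),
                  GG = c(4L, 2L, 9L))

# dput() writes an R expression that rebuilds the object;
# capture.output() collects that text instead of only printing it
txt <- paste(capture.output(dput(toy)), collapse = "\n")
cat(txt, "\n")

# Evaluating the text gives back an equivalent data frame
toy2 <- eval(parse(text = txt))
all.equal(toy, toy2)
```

This is why pasting dput() output into a question lets others rebuild the same data with a single assignment.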

Now the barplot:

# rows of t(df[, 2:3]) are GB (boys) and GG (girls), in that order
barplot(t(df[, 2:3]), beside=TRUE, names.arg=df$Region, col=c("black", "gray"))
legend("topleft", c("Boys", "Girls"), fill=c("black", "gray"))


CodePudding user response:

You are right that barplot likes matrices (or vectors) as input, where each column is one unit on the x-axis (here a region) and each row is one series within that unit (here GB and GG).

So you were close; you just had not yet built the matrix that barplot needs. Note that an as.<something> function takes a single object to coerce, whereas you passed two columns as separate arguments, so the second one was ignored. What you want is a subset of the data frame, as.matrix(df[c('GB', 'GG')]), and you also want to transpose it: t(as.matrix(df[c('GB', 'GG')])). In fact t(df[c('GB', 'GG')]) is sufficient, since t automatically coerces a data frame to a matrix.
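
To make the difference concrete, here is a minimal sketch comparing the shape of the two matrices. It rebuilds the sample data from the Data section of this answer; the names m_wrong and m_right are hypothetical and only used for illustration.

```r
# Sample data, same values as in the Data section of this answer
df <- data.frame(GB = c(6L, 2L, 5L, 10L, 3L, 1L),
                 GG = c(8L, 5L, 4L, 10L, 2L, 6L),
                 region = c("Connecticut", "Maine", "Massachusetts",
                            "New Hampshire", "Rhode Island", "Vermont"))

# The attempt from the question: only df$GB is coerced, df$GG is ignored
m_wrong <- as.matrix(df$GB, df$GG)
dim(m_wrong)       # 6 1  -> a single column, so barplot sees one group

# Subset both columns, then transpose: one row per series, one column per region
m_right <- t(df[c("GB", "GG")])
dim(m_right)       # 2 6
rownames(m_right)  # "GB" "GG"
```

With beside=TRUE, barplot draws one group of bars per column, so the 2 x 6 matrix yields six regions with two bars each.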

barplot() is rich in arguments, so it's definitely worth reading ?barplot to see what is possible. For example, you may pass additional arguments for the legend as a list in args.legend (read ?legend for the options) and define names.arg for the x-axis labels. For the latter I'd use a named vector nm that assigns abbreviations to the regions; the abbreviations can be explained in the caption below the graphic. (In this case that may be a better idea than rotating long labels, although it's nice to know how to do that too; see the bottom of my answer.)

(nm <- setNames(c("CT", "ME", "MA", "NH", "RI", "VT"), df$region))
# Connecticut         Maine Massachusetts New Hampshire  Rhode Island       Vermont 
#        "CT"          "ME"          "MA"          "NH"          "RI"          "VT" 

So we could draw the barplot like so.

barplot(t(df[c('GB', 'GG')]), main='Gifts', col=c('green', 'red'), 
        beside=TRUE, legend.text=c('Boys', 'Girls'),  # row order is GB, GG
        names.arg=nm[df$region], ylim=c(0, max(df[c('GB', 'GG')]) + 1),
        args.legend=list(title='Sex', cex=.9))


If you rely on the long names, note that barplot invisibly returns the midpoints of the bars, which can be used to position the labels. You can also enlarge the bottom margin (the first element of mar) a little to make room for them.

op <- par(mar=c(6, 4, 4, 2))  ## set par

b <- barplot(t(df[c('GB', 'GG')]), beside=TRUE)
axis(side=1, at=colMeans(b), labels=df$region, las=2)

par(op)  ## restore old par
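
To see why colMeans(b) gives the label positions, the sketch below inspects the value barplot returns. It rebuilds the sample data from the Data section and uses plot=FALSE so that only the midpoints are computed; the name centers is hypothetical.

```r
# Sample data, same values as in the Data section of this answer
df <- data.frame(GB = c(6L, 2L, 5L, 10L, 3L, 1L),
                 GG = c(8L, 5L, 4L, 10L, 2L, 6L),
                 region = c("Connecticut", "Maine", "Massachusetts",
                            "New Hampshire", "Rhode Island", "Vermont"))

# plot=FALSE returns the bar midpoints without drawing anything
b <- barplot(t(df[c("GB", "GG")]), beside = TRUE, plot = FALSE)
dim(b)  # 2 6 -> one row per series, one column per region

# Averaging each column gives the centre of each pair of bars,
# i.e. one x position per region for the axis labels
centers <- colMeans(b)
length(centers) == nrow(df)
```

Because there is exactly one centre per region, labels=df$region lines up with at=colMeans(b) in the axis() call.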



Data:

df <- structure(list(GB = c(6L, 2L, 5L, 10L, 3L, 1L), GG = c(8L, 5L, 
4L, 10L, 2L, 6L), region = c("Connecticut", "Maine", "Massachusetts", 
"New Hampshire", "Rhode Island", "Vermont")), row.names = c(NA, 
-6L), class = "data.frame")