I am having trouble plotting my dataframe. My sample matrix dataframe is attached below; I have confirmed that its class is data.frame.
Dataframe (10 rows, 59 columns originally)
I have originally tried to create a plot using facet_grid, with poor results.
(Error: At least one layer must contain all faceting variables: x.
- Plot is missing x
- Layer 1 is missing x)
Instead, I decided to simplify it and create a box plot with everything on 1 graph. However, my graph looked like this: ugly graph
My simple plot code is below; does anyone know why things are plotted poorly? Any insight is helpful. This is my first post, so I hope things are formatted correctly.
ggplot(newdf, aes(x, y, fill = x)) + geom_boxplot()
The end goal is a boxplot for each gene, with the observations of the individuals shown within each box. I am following this example: example
CodePudding user response:
First of all, when following an example, make sure to replace the variable names it uses with your own.
In ggplot(newdf, aes(x, y, fill = x)) + geom_boxplot(), the errors probably come from the fact that newdf has neither an x nor a y column.
Or, if it does, those columns don't hold what you need for your goal.
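A quick way to confirm this is to compare the names used in aes() with the actual column names. This is only a sketch: the gene1 to gene3 columns are made-up stand-ins for your real data, and has_xy is just an illustrative name.

```r
# Made-up stand-in for the real data (assumption: one column per gene)
newdf <- data.frame(gene1 = runif(10), gene2 = runif(10), gene3 = runif(10))

# Which columns does the data.frame actually have?
print(names(newdf))

# Are the variables used in aes(x, y, ...) present as columns?
has_xy <- c("x", "y") %in% names(newdf)
print(has_xy)
```

If either value is FALSE, ggplot cannot find that variable in the data, which would also explain the facet_grid message about x being missing.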
Since the end goal is "a boxplot shown for each gene with the observations of the individuals within each box", your x variable needs to be a column holding the gene names, and y a column holding the observed values for those genes.
In other words, you need to reshape your data.frame into long format (right now it is in what's called wide format).
There are many ways to do so. A simple one in base R is to build the gene column by repeating each column name once per row, and to build the value column by unlisting the data.frame, like so:
# I needed a data.frame, so I made a fake one to illustrate:
newdf <- data.frame(gene1 = runif(10), gene2 = runif(10), gene3 = runif(10))
# this converts your data.frame from wide to long
newdf.long <- data.frame(
  gene = rep(names(newdf), each = nrow(newdf)),  # each column name repeated once per row
  value = unlist(newdf),                         # all columns concatenated in column order
  row.names = NULL
)
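As an alternative sketch, base R's stack() does the same wide-to-long reshaping in one call, assuming every column is a numeric gene column. The names stacked and newdf.long2 are only illustrative, and the data is the same kind of fake data as above.

```r
set.seed(1)  # only so the fake data is reproducible
newdf <- data.frame(gene1 = runif(10), gene2 = runif(10), gene3 = runif(10))

# stack() returns a data.frame with two columns:
#   values: the stacked observations
#   ind:    a factor holding the name of the source column
stacked <- stack(newdf)

# Rename to the gene/value layout used for plotting
newdf.long2 <- data.frame(gene = stacked$ind, value = stacked$values)

str(newdf.long2)
```

Either way you end up with one row per observation, which is the layout ggplot expects here.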
Then make sure to use the right variable names when building your ggplot (and note the + between the layers):
ggplot(newdf.long, aes(x = gene, y = value, fill = gene)) + geom_boxplot()
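To also show the individual observations within each box, as described in the question, one option is to overlay a jittered point layer. This is a sketch on fake data, assuming ggplot2 is installed; the jitter width and point settings are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # only so the fake data and the jitter are reproducible
newdf <- data.frame(gene1 = runif(10), gene2 = runif(10), gene3 = runif(10))

# Wide to long: one row per (gene, observation) pair
newdf.long <- data.frame(
  gene = rep(names(newdf), each = nrow(newdf)),
  value = unlist(newdf),
  row.names = NULL
)

p <- ggplot(newdf.long, aes(x = gene, y = value, fill = gene)) +
  # hide the boxplot's own outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # one point per individual; jitter horizontally only so values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.7)

print(p)
```

With 59 genes on one axis the plot may get crowded; since the long data now has a gene column, adding + facet_wrap(~ gene) is one way to get a separate panel per gene.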