How to group datasets in a boxplot?

Time:09-25

I have been trying to figure out how to group 9 datasets into 3 different groups (1, 2, and 3). I have 3 different data frames that look like this:

ID1       ID2        dN      dS      Omega   Label_ID1  Label_ID2  Group
QJY77946  NP_073551  0.0293  0.0757  0.3872  229E-CoV   229E-CoV   Intra
QJY77954  NP_073551  0.0273  0.0745  0.3668  229E-CoV   229E-CoV   Intra
...

So, the only columns I'm interested in are three: dN, dS, and Omega. My main goal is to take these three columns from my data frames and plot them in a boxplot using RStudio. To do that, I first extract the 3 columns from each data frame with these lines:

dN_1 <- df_1$dN
dS_1 <- df_1$dS
Omega_1 <- df_1$Omega

Then, to generate the plot I use this line (option 1):

boxplot(dN_S, dS_S, Omega_S, dN_M, dS_M, Omega_M, dN_E, dS_E, Omega_E,
        main = "Test",
        xlab = "Frames",
        ylab = "Distribution",
        col = "red")


My goal is to group these 9 boxes into 3 separate groups.

I know that ggplot2 could make this easier, so as option 2 I tried these lines:

df_1 %>%
  ggplot(aes(y = dN_S)) +
  geom_boxplot(
    color = "blue",
    fill = "blue",
    alpha = 0.2,
    notch = TRUE,
    notchwidth = 0.8)

However, as you can see, I couldn't find a way to plot all groups in the same plot. So how can I group my data in the boxplot using option 1 or option 2? The second option is less developed, but perhaps someone could help with that too.

CodePudding user response:

One way to accomplish this is by providing ggplot() another aesthetic, like fill. Here's a small reproducible example:

library(tidyverse)
df <- tibble(category = rep(letters[1:4], 5),
             time = c(rep("before", 10), rep("after", 10)),
             num = rnorm(20))

df %>%
  ggplot() +
  geom_boxplot(aes(x = category, y = num, fill = time))
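As a rough sketch of how the same fill idea might carry over to the three data frames in the question: the data below are random stand-ins, and make_df, frame, and measure are hypothetical names chosen only for illustration. The three frames are stacked with an ID column, reshaped to long form, and then plotted with one x position per frame and one fill colour per measure.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(123)

# Hypothetical helper: builds a stand-in data frame with random values
# (assumption: the real frames have numeric dN, dS and Omega columns)
make_df <- function(n = 20) {
  data.frame(dN = runif(n), dS = runif(n), Omega = runif(n))
}

# Stack the three frames; the list names end up in the "frame" column
long <- bind_rows(list(S = make_df(), M = make_df(), E = make_df()),
                  .id = "frame") %>%
  pivot_longer(c(dN, dS, Omega), names_to = "measure", values_to = "value")

# One group of boxes per frame, one coloured box per measure
p <- ggplot(long, aes(x = frame, y = value, fill = measure)) +
  geom_boxplot()
print(p)
```

With real data, the make_df() calls would be replaced by the actual data frames, keeping only the dN, dS, and Omega columns.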


Let me know if you're looking for something else.

CodePudding user response:

library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)

set.seed(123)

df_s <- data.frame(dN = runif(20), 
                   dS = runif(20),
                   Omega = runif(20))

df_m <- data.frame(dN = runif(20), 
                   dS = runif(20),
                   Omega = runif(20))

df_e <- data.frame(dN = runif(20), 
                   dS = runif(20),
                   Omega = runif(20))

df <- 
  list(df_s, df_m, df_e) %>% 
  set_names(c("S", "M", "E")) %>% 
  map_dfr(bind_rows, .id = "df") %>% 
  pivot_longer(-df)

ggplot(df) +
  geom_boxplot(aes(x = name, y = value)) +
  facet_wrap(~df, nrow = 1)

Created on 2021-09-24 by the reprex package (v2.0.0)
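
For option 1 (base R), one possible approach is to pass explicit box positions through the at argument of boxplot(), leaving a gap between each set of three. This is only a sketch under assumptions: the data are random stand-ins, make_df is a hypothetical helper, and the colours and positions are arbitrary choices.

```r
set.seed(123)

# Hypothetical helper: builds a stand-in data frame with random values
make_df <- function(n = 20) {
  data.frame(dN = runif(n), dS = runif(n), Omega = runif(n))
}
frames <- list(S = make_df(), M = make_df(), E = make_df())

# Flatten into a list of 9 numeric vectors named S.dN, S.dS, S.Omega, M.dN, ...
values <- unlist(
  lapply(frames, function(d) as.list(d[c("dN", "dS", "Omega")])),
  recursive = FALSE
)

# Skipping positions 4 and 8 leaves a visible gap between the three groups
positions <- c(1:3, 5:7, 9:11)

bp <- boxplot(values,
              at = positions,
              names = rep(c("dN", "dS", "Omega"), 3),
              col = rep(c("red", "green", "blue"), 3),
              main = "Test",
              xlab = "",
              ylab = "Distribution")

# Label each group under the centre of its three boxes
mtext(c("S", "M", "E"), side = 1, line = 3, at = c(2, 6, 10))

# Optional dashed separators in the gaps
abline(v = c(4, 8), lty = 2)
```

The same positions vector could be reused with the original dN_S, dS_S, ... vectors by passing them directly to boxplot() in place of values.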
