I observe a conflict between numbering the values and ordering the group in Ggplot

Time:03-02

Dears,

Here is a sample of my dataset, from dput(IP[1:10, ]):

structure(list(accession = c("AT5G23310", "ATCG00740", "AT4G20130", 
"AT5G51100", "AT3G06730", "AT2G28000", "AT2G24020", "AT1G73990", 
"AT5G20720", "AT5G45390"), name = c("FSD3 / PAP4", "RPOA", "PTAC14 / PAP7", 
"FSD2 / PAP9", "CITRX / PAP10", "CPN60A1", "STIC2", "SPPA", "CPN20", 
"CLPP4"), description = c("Fe superoxide dismutase 3", "RNA polymerase subunit alpha", 
"plastid transcriptionally active 14", "Fe superoxide dismutase 2", 
"Thioredoxin z", "chaperonin-60alpha", "Uncharacterised BCR, YbaB family COG0718", 
"signal peptide peptidase", "chaperonin 20", "CLP protease P4"
), class = c("int_D", "int_D", "int_D", "int_D", "int_D", "int_D", 
"int_D", "int_D", "int_D", "int_D"), FC = c(10.8808319521963, 
10.8048308965242, 10.4457101811235, 10.399581594615, 9.76710767914034, 
8.40981567320428, 8.09336699899536, 7.39700419044091, 7.36589576056924, 
7.24457380682909), iBAQ = c(0.12855586361859, 0.595067840872386, 
0.403067430310179, 0.371518817592689, 0.584834508323074, 0.0271550563144128, 
0.0088451761756162, 0.00151518236884624, 0.0104882385666527, 
0.00327673100220722), thylakoid = c("thylakoid", "thylakoid", 
"thylakoid", "thylakoid", "thylakoid", "thylakoid", "thylakoid", 
"thylakoid", "thylakoid", "thylakoid")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

I am trying to generate a violin plot and boxplot with grouped values. I can show the number of values for each group (script 1), but then the order of the groups is not respected. The call mutate(class = fct_relevel(class,"int_D", "prox_D","int_L","prox_L")) %>% has no effect in that script:

Script 1 : the order of the group is not respected but I can number the values for each class

library(tidyverse)   # dplyr, forcats, ggplot2
library(hrbrthemes)  # theme_ipsum()

# sample size
sample_size = IP %>% group_by(class) %>% summarize(num=n())

IP %>%
  left_join(sample_size) %>%
  mutate(class = fct_relevel(class,"int_D", "prox_D","int_L","prox_L"))%>%
  mutate(class = paste0(class, "\n", "n=", num)) %>%
  ggplot( aes(x=class, y=FC, fill = class)) +
  geom_violin(trim = FALSE, width=0.5, color="grey", size=0.1) +
  geom_boxplot(width=0.1, fill="white", alpha=1) +
  scale_fill_manual(values=c("gold3","gold3","green4","green4")) +
  ylim(0,15) +
  theme_ipsum() +
  theme(legend.position="none",  plot.title = element_text(size=11)) +
  ggtitle("thylakoid") +
  xlab("") 

If I remove the mutate call that pastes the count onto class, the order of the groups is respected, but I lose the counts.

Script 2: the order of the groups is respected but I lose the counts

# sample size
sample_size = IP %>% group_by(class) %>% summarize(num=n())


IP %>%
  left_join(sample_size) %>%
  mutate(class = fct_relevel(class,"int_D", "prox_D","int_L","prox_L"))%>%
  ggplot( aes(x=class, y=FC, fill = class)) +
  geom_violin(trim = FALSE, width=0.5, color="grey", size=0.1) +
  geom_boxplot(width=0.1, fill="white", alpha=1) +
  scale_fill_manual(values=c("gold3","gold3","green4","green4")) +
  ylim(0,15) +
  theme_ipsum() +
  theme(legend.position="none",  plot.title = element_text(size=11)) +
  ggtitle("thylakoid") +
  xlab("")

Is there a way to get both the per-class counts and the right group order?

All the best!

CodePudding user response:

The problem is that after the paste, class is no longer a factor: paste0() returns a character vector, so the level order set by fct_relevel() is lost and ggplot falls back to alphabetical order. You could mutate class to add the number, store the result, get the new levels with unique(IP$class), sort them however you want, and convert class back to a factor using these new levels.
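A minimal sketch of that approach in base R, using a small made-up data frame (IP_toy, with hypothetical values) rather than the real IP. The names class_order, n_by_class and class_n are illustrative and do not come from the original scripts.

```r
# Toy stand-in for IP (hypothetical values; only class and FC matter here)
IP_toy <- data.frame(
  class = c("int_L", "int_D", "prox_L", "prox_D", "int_D", "int_L", "prox_D", "int_D"),
  FC    = c(3.1, 10.9, 2.4, 7.2, 9.8, 4.0, 6.5, 8.4),
  stringsAsFactors = FALSE
)

# Desired order of the groups on the x axis
class_order <- c("int_D", "prox_D", "int_L", "prox_L")

# paste0() always returns character, so a releveled factor loses its order
f <- factor(IP_toy$class, levels = class_order)
is.factor(paste0(f, "\nn=1"))  # FALSE

# Count rows per class, stored as a named vector in the desired order
n_by_class <- setNames(as.integer(table(IP_toy$class)[class_order]), class_order)

# Build the labels in the desired order: these become the factor levels
new_levels <- paste0(class_order, "\nn=", n_by_class)

# Label every row, then convert back to a factor with the ordered levels
IP_toy$class_n <- factor(
  paste0(IP_toy$class, "\nn=", n_by_class[IP_toy$class]),
  levels = new_levels
)

levels(IP_toy$class_n)
```

With the real data, class_n would then take the place of class in aes(x = ..., fill = ...). Because it is a factor, the axis follows its level order rather than alphabetical order.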

CodePudding user response:

In addition to @Josep Puyo's answer, note that you can supply calculated axis labels directly. For example by adding this axis declaration to script 2:

## some ggplot instructions +
    scale_x_discrete(
        ## look the counts up by name so each label matches its axis break
        labels = function(x) paste(
            x,
            table(IP$class)[x],
            sep = '\n'
        )
    ) + ## some more ggplot instructions

(In this context, 'class' is a somewhat unfortunate variable name, as it collides with the base R function class().)
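One thing to check with calculated labels is that names and counts line up with the axis order: unique() returns values in order of first appearance, table() sorts them, and the axis follows the factor levels. The sketch below uses a made-up vector (cls, hypothetical values) and an illustrative helper, label_with_n, to compare positional pairing with a lookup by name.

```r
# Toy stand-in for IP$class (hypothetical values), deliberately not sorted
cls <- c("int_L", "int_D", "prox_L", "prox_D", "int_D", "int_L", "prox_D", "int_D")

# Order of the axis breaks, as set by fct_relevel() in script 2
axis_order <- c("int_D", "prox_D", "int_L", "prox_L")

# Positional pairing: unique() is in order of appearance, table() is sorted,
# so names and counts can end up mismatched
positional <- paste(unique(cls), table(cls), sep = "\n")

# Lookup by name: one label per break, whatever order the breaks arrive in
label_with_n <- function(breaks) {
  n <- table(cls)
  paste(breaks, as.integer(n[breaks]), sep = "\n")
}
by_name <- label_with_n(axis_order)

positional
by_name
```

Since the labels argument of scale_x_discrete() also accepts a function of the breaks, a helper like this can be passed directly as labels = label_with_n.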
