How to choose same colors for same categories from two different dataframes while plotting them with

Time:06-26

I have two dataframes that I plot with geom_area and geom_line respectively. Both contain the same categories (topics); only the numerical values differ.

Below are my sample dataframes:

#df_one, for geom_area()
   Timestamp      Topic           Value_A
  01/01/2019     News           10
  02/01/2019     Sports         11
  03/01/2019     Entertainment  12
   ...
  01/01/2020     Weather        5
  02/01/2020     News           6
  03/01/2020     Business       7
   ...
  01/01/2021     Sports         8
  02/01/2021     Business       4
  03/01/2021     News           9
   ...
  29/12/2021     Entertainment  12
  30/12/2021     News           13
  31/12/2021     Sports         14

And this is the second one

#df_two, for line plot
  Timestamp      Topic         Value_B
  01/01/2019     Weather       1.0
  02/01/2019     Business      1.1
  03/01/2019     News          1.2
   ...
  01/01/2020     Entertainment  5.0
  02/01/2020     Sports         6.5
  03/01/2020     Business       7.3
   ...
  01/01/2021     Sports         8.8
  02/01/2021     Business       4.2
  03/01/2021     Sports         9.2
   ...
  29/12/2021     Business       1.2
  30/12/2021     News           1.3
  31/12/2021     Weather        1.4

I am doing the following steps:

#convert date column into proper format (the dates are dd/mm/yyyy)
df_one$Timestamp <- as.Date(df_one$Timestamp, format = "%d/%m/%Y")

#sort according to dates
df_one <- df_one[order(df_one$Timestamp), ]


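library(dplyr)
library(purrr)
library(ggplot2)
library(gridExtra)
library(randomcoloR)
n <- length(unique(df_one$Topic)) #one color per topic
my_cols_one <- distinctColorPalette(n)

names(my_cols_one) <- unique(df_one$Topic) #I will use this for both since Topics are common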

list_one <- 
  df_one %>%
  ## create year variable by which you split into a list
  mutate(year = lubridate::year(Timestamp)) %>%
  split(.$year) %>%
  ## pass this list to a loop function to create three separate plots 
  map(~ggplot(data = .x, aes(x=Timestamp, y=Value_A, fill=Topic)) +
        scale_x_date(date_breaks = '1 month', date_labels = "%b-%y") +
        geom_area(alpha=0.6 , size=1, colour="black", position = position_fill()) +
        theme(legend.position="bottom", legend.box = "horizontal") +
        ggtitle("Reliable") +
        guides(fill = guide_legend(nrow = 2, label.position = "bottom")) +
        scale_fill_manual(NULL, values = my_cols_one, limits = unique(.x$Topic))
  )

#now for df_two

#convert date column into proper format (the dates are dd/mm/yyyy)
df_two$Timestamp <- as.Date(df_two$Timestamp, format = "%d/%m/%Y")

#sort according to dates
df_two <- df_two[order(df_two$Timestamp), ]


df_two <- df_two %>% 
  ## bin the dates into 15-day periods and average Value_B per period and topic
  group_by(created_at = lubridate::floor_date(Timestamp, "15 days"), Topic) %>% 
  dplyr::summarise(Average_Value = mean(Value_B), .groups = "drop")


list_two <- 
  df_two %>%
  ## create year variable by which you split into a list
  mutate(year = lubridate::year(created_at)) %>%
  split(.$year) %>%
  ## pass this list to a loop function to create three separate plots 
  map(~ggplot(data = .x, aes(x=created_at, y=Average_Value, color=Topic)) +
        scale_x_date(date_breaks = '1 month', date_labels = "%b-%y") +
        geom_line() +
        theme(legend.position="bottom", legend.box = "horizontal", plot.background = element_blank()) +
        ggtitle("Title") +
        guides(fill = guide_legend(nrow = 2, label.position = "bottom")) +
        ## you will need to set the limits to the unique values in each plot
        ## I am also removing the guide title because of the visual crowding
        scale_fill_manual(NULL, values = my_cols_one, limits = unique(.x$Topic)) +
        labs(title = '',
             x = 'Date',
             y = 'Average Value',
             color=""))

Now finally to plot these together

do.call("grid.arrange", c(list_one, list_two, ncol=2, nrow=2))

The idea is to stack the area plot and the line plot for each year on top of each other, with every topic drawn in the same color in both. In my output, however, the colors differ between the two plots.

Any help please?

CodePudding user response:

I found the solution:

I was using scale_fill_manual for df_two, whereas it should have been scale_color_manual: geom_line maps Topic to the color aesthetic, not fill, so the manual fill scale was never applied to the lines.

So I changed

scale_fill_manual(NULL, values = my_cols_one, limits = unique(.x$Topic))

to

scale_color_manual(NULL, values = my_cols_one, limits = unique(.x$Topic))

and it's working as expected.
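For reference, here is a minimal sketch of the same idea on toy data. The dataframes `area_df` and `line_df`, the palette name `topic_cols`, and the use of `hcl.colors()` in place of `distinctColorPalette()` are assumptions made for illustration, not part of the original code. The point is that one named color vector is passed to the fill scale of the area plot and to the colour scale of the line plot, so each topic gets the same color in both.

```r
library(ggplot2)

# Hypothetical toy data standing in for df_one and df_two
topics <- c("Business", "Entertainment", "News", "Sports", "Weather")
dates  <- seq(as.Date("2021-01-01"), by = "month", length.out = 6)

set.seed(1)
area_df <- expand.grid(Timestamp = dates, Topic = topics,
                       stringsAsFactors = FALSE)
area_df$Value_A <- runif(nrow(area_df), 1, 15)

# The line data deliberately contains only a subset of the topics
line_df <- expand.grid(Timestamp = dates, Topic = topics[1:3],
                       stringsAsFactors = FALSE)
line_df$Value_B <- runif(nrow(line_df), 1, 10)

# One named palette covering every topic seen in either dataframe
all_topics <- sort(union(area_df$Topic, line_df$Topic))
topic_cols <- setNames(grDevices::hcl.colors(length(all_topics), "Dark 3"),
                       all_topics)

# geom_area uses the fill aesthetic, so it needs scale_fill_manual
p_area <- ggplot(area_df, aes(Timestamp, Value_A, fill = Topic)) +
  geom_area(alpha = 0.6, colour = "black", position = position_fill()) +
  scale_fill_manual(name = NULL, values = topic_cols,
                    limits = unique(area_df$Topic))

# geom_line uses the colour aesthetic, so it needs scale_colour_manual
p_line <- ggplot(line_df, aes(Timestamp, Value_B, colour = Topic)) +
  geom_line() +
  scale_colour_manual(name = NULL, values = topic_cols,
                      limits = unique(line_df$Topic))
```

Because the vector is named, ggplot2 matches colors to topics by name rather than by position, so a plot that shows only some of the topics still uses the same color for each of them. Building the palette from the union of both Topic columns also covers the case where one dataframe contains a topic the other lacks.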
