I am looking for colour frequency by month. I would like to make a line graph with each colour's percentage for each month. This is my data:
ID color1 color2 color3 date
55 red blue NA 2020-03-15
67 yellow NA NA 2020-05-02
83 blue yellow NA 2020-05-17
78 red yellow blue 2020-05-15
43 green NA NA 2021-01-27
29 yellow green NA 2021-01-03
I need something like this to graph. I need the month's article count as the denominator. So if the ID
has multiple colours (for example, all IDs
in 03/2020
are blue and red), the total percentage can be higher than 100.
Month n freq_blue freq_red freq_yellow freq_green %_blue %_red _yellow %_green
03-2020 1 1 1 0 0 100 100 0 0
04-2020 0 0 0 0 0 0 0 0 0
05-2020 3 2 1 3 0 66.7 33.3 100 0
06-2020 0 0 0 0 0 0 0 0 0
07-2020 0 0 0 0 0 0 0 0 0
08-2020 0 0 0 0 0 0 0 0 0
09-2020 0 0 0 0 0 0 0 0 0
10-2020 0 0 0 0 0 0 0 0 0
11-2020 0 0 0 0 0 0 0 0 0
12-2020 0 0 0 0 0 0 0 0 0
01-2021 2 0 0 1 2 0 0 50 100
CodePudding user response:
As suggested, it is helpful if you could edit your original post/question with your code instead of in the comments below.
Based on what you have (and your prior question) this may be helpful.
Consider creating a new column month_total
that you can use for percentage calculation - it appears you want to know the number of IDs for a given month (it was unclear if a color could appear more than once in a row).
After you determine the frequencies and percentages, and use complete
to fill in missing months and colors, you can use fill
to include the monthly totals as well.
library(tidyverse)
library(lubridate)
df$date <- as.Date(df$date)
df %>%
mutate(month = month(date), year = year(date)) %>%
pivot_longer(cols = starts_with("color")) %>%
filter(!is.na(value)) %>%
group_by(month, year) %>%
mutate(month_total = n_distinct(ID)) %>%
group_by(value, month_total, .add = T) %>%
summarise(freq = n(), percent = freq/month_total[1] * 100) %>%
ungroup() %>%
complete(year, month = 1:12, value = c("blue", "red", "yellow", "green"), fill = list(freq = 0, percent = 0)) %>%
group_by(year, month) %>%
fill(month_total, .direction = "updown") %>%
pivot_wider(id_cols = c(month, year, month_total), names_from = value, values_from = c(freq, percent)) %>%
replace_na(list(month_total = 0))
Output
month year month_total freq_blue freq_green freq_red freq_yellow percent_blue percent_green percent_red percent_yellow
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2020 0 0 0 0 0 0 0 0 0
2 2 2020 0 0 0 0 0 0 0 0 0
3 3 2020 1 1 0 1 0 100 0 100 0
4 4 2020 0 0 0 0 0 0 0 0 0
5 5 2020 3 2 0 1 3 66.7 0 33.3 100
6 6 2020 0 0 0 0 0 0 0 0 0
7 7 2020 0 0 0 0 0 0 0 0 0
8 8 2020 0 0 0 0 0 0 0 0 0
9 9 2020 0 0 0 0 0 0 0 0 0
10 10 2020 0 0 0 0 0 0 0 0 0
# … with 14 more rows