I am fairly new to R and happen to be stuck with the following problem: I want to fill in an empty data frame based on multiple criteria and sum the variable of interest. The data frames look like this:
The empty matrix, that I want to fill:
dat <- data.frame(year = 2021,
month = 1:12,
green = "N/A",
yellow = "N/A",
red = "N/A",
blue = "N/A",
purple = "N/A")
dat
year month green yellow red blue purple
1 2021 1 N/A N/A N/A N/A N/A
2 2021 2 N/A N/A N/A N/A N/A
3 2021 3 N/A N/A N/A N/A N/A
4 2021 4 N/A N/A N/A N/A N/A
5 2021 5 N/A N/A N/A N/A N/A
6 2021 6 N/A N/A N/A N/A N/A
7 2021 7 N/A N/A N/A N/A N/A
8 2021 8 N/A N/A N/A N/A N/A
9 2021 9 N/A N/A N/A N/A N/A
10 2021 10 N/A N/A N/A N/A N/A
11 2021 11 N/A N/A N/A N/A N/A
12 2021 12 N/A N/A N/A N/A N/A
And data frame with the original data:
dat_org <- data.frame(year = c(2020,2020,2020,2020,2020,2021,2021,2021,2021,2021,2021,2021,2021),
month = c(1,4,6,9,12,1,1,1,5,5,5,10,10),
height = c(23,35,76,87,55,43,21,12,45,67,78,23,34),
colour = c("green", "green", "red", "yellow", "purple", "green", "blue", "blue", "blue", "red", "yellow", "yellow", "red"))
dat_org
year month height colour
1 2020 1 23 green
2 2020 4 35 green
3 2020 6 76 red
4 2020 9 87 yellow
5 2020 12 55 purple
6 2021 1 43 green
7 2021 1 21 blue
8 2021 1 12 blue
9 2021 5 45 blue
10 2021 5 67 red
11 2021 5 78 yellow
12 2021 10 23 yellow
13 2021 10 34 red
Now, I want to transfer the "height" values in dat_org to dat, based on the year, month and colour. If no values in dat_org based on these criteria exist, I want to replace N/A with 0. If there are several rows, that meet the conditions, I want to sum up the variable (as it happend for year = 2021, month = 1 and colour ="blue").
The result would look like this:
dat
year month green yellow red blue purple
1 2021 1 43 0 0 33 0
2 2021 2 0 0 0 0 0
3 2021 3 0 0 0 0 0
4 2021 4 0 0 0 0 0
5 2021 5 0 78 67 45 0
6 2021 6 0 0 0 0 0
7 2021 7 0 0 0 0 0
8 2021 8 0 0 0 0 0
9 2021 9 0 0 0 0 0
10 2021 10 0 23 34 0 0
11 2021 11 0 0 0 0 0
12 2021 12 0 0 0 0 0
I was first thinking to use "aggregate" or "group by", but then it only lists the existing months of df_org (here 1,5,10), but I also want to fill in N/A in df with a zero, if the conditions are not fulfilled.
The logic is similar to sumifs in Excel, but I am not sure how to apply this in R.
Thanks in advance for your help!
CodePudding user response:
@stefan, thanks for the quick response. If've tried your suggestion and it gives me the following results:
dat_org %>%
group_by(year, month, colour) %>% summarise(height = sum(height), .groups = "drop") %>%
pivot_wider(names_from = colour, values_from = height, values_fill = 0)
year month green red yellow purple blue
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020 1 23 0 0 0 0
2 2020 4 35 0 0 0 0
3 2020 6 0 76 0 0 0
4 2020 9 0 0 87 0 0
5 2020 12 0 0 0 55 0
6 2021 1 43 0 0 0 33
7 2021 5 0 67 78 0 45
8 2021 10 0 34 23 0 0
Now, what is still missing, are the months, where no values are given in dat_org, such as month 2,3,4,6,7,8,9,11,12 in 2021.
CodePudding user response:
This is basically the code from my comment which first aggregates the data and after that reshape to wide format. However, I added two pieces. First, I use tidyr::complete
to add all months from 1 to 12 and finally I filter
for year 2021
dat_org <- data.frame(year = c(2020,2020,2020,2020,2020,2021,2021,2021,2021,2021,2021,2021,2021),
month = c(1,4,6,9,12,1,1,1,5,5,5,10,10),
height = c(23,35,76,87,55,43,21,12,45,67,78,23,34),
colour = c("green", "green", "red", "yellow", "purple", "green", "blue", "blue", "blue", "red", "yellow", "yellow", "red"))
library(dplyr)
library(tidyr)
colors <- unique(dat_org$colour)
dat_org %>%
group_by(year, month, colour) %>%
summarise(height = sum(height), .groups = "drop") %>%
complete(year, month = 1:12, colour, fill = list(height = 0)) %>%
pivot_wider(names_from = colour, values_from = height, values_fill = 0) %>%
filter(year == 2021)
#> # A tibble: 12 × 7
#> year month blue green purple red yellow
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2021 1 33 43 0 0 0
#> 2 2021 2 0 0 0 0 0
#> 3 2021 3 0 0 0 0 0
#> 4 2021 4 0 0 0 0 0
#> 5 2021 5 45 0 0 67 78
#> 6 2021 6 0 0 0 0 0
#> 7 2021 7 0 0 0 0 0
#> 8 2021 8 0 0 0 0 0
#> 9 2021 9 0 0 0 0 0
#> 10 2021 10 0 0 0 34 23
#> 11 2021 11 0 0 0 0 0
#> 12 2021 12 0 0 0 0 0