I have a data frame (below) that I want to summarise by column.
sample <- tibble(Scenario = c("Aggressive","Aggressive","Conservative","Aggressive","Likely","Aggressive","Conservative","Likely","Likely","Aggressive","Conservative","Conservative"),
`Jan 2022` = c(5.5,15,15.77,45.2,NA,NA,NA,NA,NA,NA,NA,NA),
`Feb 2022` = c(NA,NA,NA,NA,20.5,11.1,14.4,55.5,NA,NA,NA,NA),
`Mar 2022` = c(NA,NA,NA,NA,NA,NA,NA,NA,88.5,9.5,18.9,25.5))
This is what the output should look like:
# A tibble: 3 × 4
# Groups: Scenario [3]
Scenario `Feb 2022` `Jan 2022` `Mar 2022`
<chr> <dbl> <dbl> <dbl>
1 Aggressive 11.1 65.7 9.5
2 Conservative 14.4 15.8 44.4
3 Likely 76 0 88.5
Below is the code I used to get this output. As you see, I used pivot_longer and then applied my group_by and summarise to get the desired output. Then I used pivot_wider to restore it to the desired wide format.
sample %>%
pivot_longer(cols = c(`Jan 2022`:`Mar 2022`), names_to = "Date", values_to = "Hours") %>%
group_by(Scenario, Date) %>%
summarise(Hours = sum(Hours, na.rm = T)) %>%
pivot_wider(names_from = Date, values_from = Hours)
I hope to find a more efficient way to do this without the need to use pivot_longer. I tried running the code below on the original data frame, but obviously, it doesn't work as intended:
sample %>%
group_by(Scenario) %>%
summarise(Hours = lapply(X = c(`Jan 2022`:`Mar 2022`), FUN = function(x){sum(x, na.rm = T)}))
Here are some of the warnings and errors I'm getting:
Error: Problem with `summarise()` column `Hours`.
ℹ `Hours = lapply(...)`.
x NA/NaN argument
ℹ The error occurred in group 1: Scenario = "Aggressive".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In `Jan 2022`:`Mar 2022` :
numerical expression has 5 elements: only the first used
2: In `Jan 2022`:`Mar 2022` :
numerical expression has 5 elements: only the first used
I figure there's a way to do this with an apply function but am open to any suggestions. The fewer lines of code required, the better.
CodePudding user response:
With the tidyverse, use across to loop over the columns instead of lapply:
library(dplyr)
sample %>%
group_by(Scenario) %>%
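# A lambda is used because passing extra arguments (na.rm) through across() is deprecated in dplyr >= 1.1.0
summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')
Output: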
# A tibble: 3 × 4
Scenario `Jan 2022` `Feb 2022` `Mar 2022`
<chr> <dbl> <dbl> <dbl>
1 Aggressive 65.7 11.1 9.5
2 Conservative 15.8 14.4 44.4
3 Likely 0 76 88.5
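If you would rather select the columns by range, as the attempt in the question does, across also accepts tidyselect ranges. The following is a minimal sketch rather than a definitive version: it assumes dplyr 1.0.0 or later, rebuilds the data from the question, and stores the summary in a variable named result, which is introduced here for illustration only.

```r
library(dplyr)
library(tibble)

# Same data as in the question
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

# `result` is a hypothetical name; the column range is resolved by tidyselect
# inside across(), so no pivoting is needed
result <- sample %>%
  group_by(Scenario) %>%
  summarise(across(`Jan 2022`:`Mar 2022`, ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

result
```

Unlike the lapply attempt, where `Jan 2022`:`Mar 2022` is evaluated as a numeric sequence between two column vectors, the range inside across is interpreted as a column selection, and the original column order is kept.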
CodePudding user response:
With data.table you can do this:
data.table::setDT(sample)[, lapply(.SD, sum, na.rm=T), by=Scenario]
Output:
Scenario Jan 2022 Feb 2022 Mar 2022
1: Aggressive 65.70 11.1 9.5
2: Conservative 15.77 14.4 44.4
3: Likely 0.00 76.0 88.5
CodePudding user response:
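Another option with data.table, restricting the sum to the numeric columns via .SDcols:
library(data.table)
# `sample` is the data frame from the question; setDT converts it to a data.table by reference
setDT(sample)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario, .SDcols = is.numeric]
Output: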
Scenario Jan 2022 Feb 2022 Mar 2022
1: Aggressive 65.70 11.1 9.5
2: Conservative 15.77 14.4 44.4
3: Likely 0.00 76.0 88.5