How can I apply a function across these columns without doing tedious transformations?

Time:05-26

I have a data frame (below) that I want to summarise by column.

sample <- tibble(Scenario = c("Aggressive","Aggressive","Conservative","Aggressive","Likely","Aggressive","Conservative","Likely","Likely","Aggressive","Conservative","Conservative"),
           `Jan 2022` = c(5.5,15,15.77,45.2,NA,NA,NA,NA,NA,NA,NA,NA),
           `Feb 2022` = c(NA,NA,NA,NA,20.5,11.1,14.4,55.5,NA,NA,NA,NA),
           `Mar 2022` = c(NA,NA,NA,NA,NA,NA,NA,NA,88.5,9.5,18.9,25.5))

This is what the output should look like:

# A tibble: 3 × 4
# Groups:   Scenario [3]
  Scenario     `Feb 2022` `Jan 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         11.1       65.7        9.5
2 Conservative       14.4       15.8       44.4
3 Likely             76          0         88.5

Below is the code I used to get this output. As you see, I used pivot_longer and then applied my group_by and summarise to get the desired output. Then I used pivot_wider to restore it to the desired wide format.

sample %>% 
  pivot_longer(cols = c(`Jan 2022`:`Mar 2022`), names_to = "Date", values_to = "Hours") %>% 
  group_by(Scenario, Date) %>% 
  summarise(Hours = sum(Hours, na.rm = T)) %>% 
  pivot_wider(names_from = Date, values_from = Hours)

I hope to find a more efficient way to do this without the need to use pivot_longer. I tried running the below code on the original data frame, but obviously, it doesn't work as intended:

sample %>%
  group_by(Scenario) %>%
  summarise(Hours = lapply(X = c(`Jan 2022`:`Mar 2022`), FUN = function(x){sum(x, na.rm = T)}))

Here are some of the warnings and errors I'm getting:

 Error: Problem with `summarise()` column `Hours`.
ℹ `Hours = lapply(...)`.
x NA/NaN argument
ℹ The error occurred in group 1: Scenario = "Aggressive".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used
2: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used

I figure there's a way to do this with an apply function but am open to any suggestions. The fewer lines of code required, the better.

CodePudding user response:

In the tidyverse, use across() inside summarise() to loop over the columns, instead of lapply():

library(dplyr)
sample %>%
   group_by(Scenario) %>%
   summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')

Output:

# A tibble: 3 × 4
  Scenario     `Jan 2022` `Feb 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         65.7       11.1        9.5
2 Conservative       15.8       14.4       44.4
3 Likely              0         76         88.5
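As for why the lapply() attempt in the question fails: inside summarise(), `Jan 2022`:`Mar 2022` is evaluated as the ordinary base R sequence operator on the two column vectors, not as a column range, which is where the "only the first used" warnings and the NA/NaN error come from. Inside across(), the same expression is treated as a column selection. A minimal sketch, assuming the data from the question and that the month columns are contiguous (the name by_range is only illustrative):

```r
library(dplyr)

# Same data as in the question, rebuilt here so the example is self-contained.
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

# Inside across(), first:last selects every column from `Jan 2022`
# through `Mar 2022` by position; the lambda sums each one per group.
by_range <- sample %>%
  group_by(Scenario) %>%
  summarise(across(`Jan 2022`:`Mar 2022`, ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(by_range)
```

The range form is handy when the table also holds other numeric columns that should not be summed; where(is.numeric) would pick those up too.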

CodePudding user response:

With data.table you can do this:

data.table::setDT(sample)[, lapply(.SD, sum, na.rm=T), by=Scenario]

Output:

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5
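One thing to be aware of: setDT() converts its argument to a data.table by reference, so the original object is changed in place. If that is not wanted, as.data.table() makes a copy instead. A small sketch, assuming the same data held in a plain data frame (sample_df, dt and res are names made up for this example):

```r
library(data.table)

# Same data as in the question; check.names = FALSE keeps the
# column names with spaces intact.
sample_df <- data.frame(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5),
  check.names = FALSE
)

# as.data.table() returns a new data.table and leaves sample_df untouched.
dt <- as.data.table(sample_df)

# .SD holds every non-grouping column, so each month is summed per Scenario.
res <- dt[, lapply(.SD, sum, na.rm = TRUE), by = Scenario]

print(res)
```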

CodePudding user response:

Another option, also with data.table, restricts the sum to the numeric columns through .SDcols:

library(data.table)

setDT(sample)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario, .SDcols = is.numeric]

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5