Home > Software engineering >  How to: condittional summation in multiple columns in R?
How to: condittional summation in multiple columns in R?

Time:12-02

I have a dataset consisting on a first column with dates and many other columns mean daily temperature at different sites (depths in this case). Looks like this:

Date       DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28       19.9        19.9        20.0   20.2          20.0          20.2          20.2          20.1        20.5          20.6   
2 2019-12-29        8.76        8.68        8.26          8.30          7.97          8.10          8.07          7.94          8.12          8.17  
3 2019-12-30        2.20        1.78       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.61         -0.414 
4 2019-12-31        3.76        3.31       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.602        -0.414 
5 2020-01-01        4.34        3.91       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.579        -0.398 
6 2020-01-02        3.52        3.45       -0.215        -0.255        -0.706        -0.649        -0.694        -0.818        -0.548        -0.367

What i need to calculate using R is the monthly sumation of the positive temperatures, and the negative temperatures... for each column.

I know how to do it column by column using this piece of code:

MonthIndexes <- DayStats2 %>%
  group_by(Year, Month) %>%
  summarize(MonthFDD = sum(DayMean[which(DayMean<0)]), MonthTDD = sum(DayMean[which(DayMean>0)]))

it works nicely, but i would like to do it for all the columns at once, independently of the number of columns contained in the dataset. If fact i do other calculations in that fashion with this piece of code:

MonthStats <- data %>% 
  group_by(Year, Month) %>% 
  summarise(across(
    .cols = is.numeric, 
    .fns = list(MonthMean=mean, MonthMax=max, MonthMin=min, MonthSD=sd), na.rm = TRUE, 
    .names = "{fn}_{col}"
  ))

However, my problems it that i do not know how to adapt the last piece of code to calculate the conditional summation. I tried this without good results:

MonthIndexes <- DayStats2 %>% 
  group_by(Year, Month) %>% 
  summarise(across(
    .cols = is.numeric, 
    .fns = list(MonthFDD = sum({col}[which({col}<0)]), MonthTDD=sum({col}[which({col}>0)])), na.rm = TRUE, 
    .names = "{fn}_{col}"
  ))

The error returned in RStudio is

Error: Problem with summarise() input ..1. i ..1 = across(...). x comparison (3) is possible only for atomic and list types

Any help will be appreciated. Thanks in advance.

CodePudding user response:

You're very close. The {.col} syntax is used for labels only, not function arguments. Instead you can use purrr-style anonymous functions using ~ to begin the function and .x to refer to the argument, e.g., MonthFDD = ~sum(.x[which(.x < 0)])

See more info and examples on the ?across help page.

data %>%
  group_by(Year, Month) %>% 
  summarise(across(
    .cols = where(is.numeric), 
    .fns = list(MonthFDD = ~sum(.x[which(.x<0)], na.rm = T), MonthTDD= ~sum(.x[which(.x>0)], na.rm = TRUE)), 
    .names = "{fn}_{col}"
  ))

# # A tibble: 2 × 22
# # Groups:   Year [2]
# Year Month MonthFDD_DayMean_2cm MonthTDD_DayMean_2cm MonthFDD_DayMean_5cm MonthTDD_DayMean_5cm MonthFDD_DayMea…
# <int> <int>                <dbl>                <dbl>                <dbl>                <dbl>            <dbl>
#   1  2019    12                    0                34.6                     0                33.7             -0.43
# 2  2020     1                    0                 7.86                    0                 7.36            -0.43
# # … with 15 more variables: MonthTDD_DayMean_80cm <dbl>, MonthFDD_DayMean_120cm <dbl>,
# #   MonthTDD_DayMean_120cm <dbl>, MonthFDD_DayMean_160cm <dbl>, MonthTDD_DayMean_160cm <dbl>,
# #   MonthFDD_DayMean_200cm <dbl>, MonthTDD_DayMean_200cm <dbl>, MonthFDD_DayMean_250cm <dbl>,
# #   MonthTDD_DayMean_250cm <dbl>, MonthFDD_DayMean_300cm <dbl>, MonthTDD_DayMean_300cm <dbl>,
# #   MonthFDD_DayMean_350cm <dbl>, MonthTDD_DayMean_350cm <dbl>, MonthFDD_DayMean_450cm <dbl>,
# #   MonthTDD_DayMean_450cm <dbl>

Using this input:

data = read.table(text = "Date       DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28       19.9        19.9        20.0   20.2          20.0          20.2          20.2          20.1        20.5          20.6   
2 2019-12-29        8.76        8.68        8.26          8.30          7.97          8.10          8.07          7.94          8.12          8.17  
3 2019-12-30        2.20        1.78       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.61         -0.414 
4 2019-12-31        3.76        3.31       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.602        -0.414 
5 2020-01-01        4.34        3.91       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.579        -0.398 
6 2020-01-02        3.52        3.45       -0.215        -0.255        -0.706        -0.649        -0.694        -0.818        -0.548        -0.367", header = T) %>% 
  mutate(Date = ymd(Date), Year = year(Date), Month = month(Date)) 

CodePudding user response:

A solution based on data.table:

library(data.table)
library(magrittr)
library(dplyr)
library(lubridate)

data = read.table(text = "Date       DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28       19.9        19.9        20.0   20.2          20.0          20.2          20.2          20.1        20.5          20.6   
2 2019-12-29        8.76        8.68        8.26          8.30          7.97          8.10          8.07          7.94          8.12          8.17  
3 2019-12-30        2.20        1.78       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.61         -0.414 
4 2019-12-31        3.76        3.31       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.602        -0.414 
5 2020-01-01        4.34        3.91       -0.215        -0.294        -0.706        -0.657        -0.725        -0.818        -0.579        -0.398 
6 2020-01-02        3.52        3.45       -0.215        -0.255        -0.706        -0.649        -0.694        -0.818        -0.548        -0.367", header = T) %>% 
  mutate(Date = ymd(Date), Year = year(Date), Month = month(Date)) 

myfuncs <- function(x) list(sum(x[x < 0]), sum(x[x > 0]))

setDT(data) %>% 
  .[, lapply(.SD, myfuncs), .SDcol=-1, .(Year, Month)] %>% 
  .[,y := rep(c("MonthFDD", "MonthTDD"), .N/2)] %>% 
  dcast(Year Month ~ y, value.var = names(.)[-c(1:2, ncol(.))])

#>    Year Month DayMean_2cm_MonthFDD DayMean_2cm_MonthTDD DayMean_5cm_MonthFDD
#> 1: 2019    12                    0                34.62                    0
#> 2: 2020     1                    0                 7.86                    0
#>    DayMean_5cm_MonthTDD DayMean_80cm_MonthFDD DayMean_80cm_MonthTDD
#> 1:                33.67                 -0.43                 28.26
#> 2:                 7.36                 -0.43                     0
#>    DayMean_120cm_MonthFDD DayMean_120cm_MonthTDD DayMean_160cm_MonthFDD
#> 1:                 -0.588                   28.5                 -1.412
#> 2:                 -0.549                      0                 -1.412
#>    DayMean_160cm_MonthTDD DayMean_200cm_MonthFDD DayMean_200cm_MonthTDD
#> 1:                  27.97                 -1.314                   28.3
#> 2:                      0                 -1.306                      0
#>    DayMean_250cm_MonthFDD DayMean_250cm_MonthTDD DayMean_300cm_MonthFDD
#> 1:                  -1.45                  28.27                 -1.636
#> 2:                 -1.419                      0                 -1.636
#>    DayMean_300cm_MonthTDD DayMean_350cm_MonthFDD DayMean_350cm_MonthTDD
#> 1:                  28.04                 -1.212                  28.62
#> 2:                      0                 -1.127                      0
#>    DayMean_450cm_MonthFDD DayMean_450cm_MonthTDD
#> 1:                 -0.828                  28.77
#> 2:                 -0.765                      0
  • Related