I have a dataset consisting on a first column with dates and many other columns mean daily temperature at different sites (depths in this case). Looks like this:
Date DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28 19.9 19.9 20.0 20.2 20.0 20.2 20.2 20.1 20.5 20.6
2 2019-12-29 8.76 8.68 8.26 8.30 7.97 8.10 8.07 7.94 8.12 8.17
3 2019-12-30 2.20 1.78 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.61 -0.414
4 2019-12-31 3.76 3.31 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.602 -0.414
5 2020-01-01 4.34 3.91 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.579 -0.398
6 2020-01-02 3.52 3.45 -0.215 -0.255 -0.706 -0.649 -0.694 -0.818 -0.548 -0.367
What i need to calculate using R is the monthly sumation of the positive temperatures, and the negative temperatures... for each column.
I know how to do it column by column using this piece of code:
MonthIndexes <- DayStats2 %>%
group_by(Year, Month) %>%
summarize(MonthFDD = sum(DayMean[which(DayMean<0)]), MonthTDD = sum(DayMean[which(DayMean>0)]))
it works nicely, but i would like to do it for all the columns at once, independently of the number of columns contained in the dataset. If fact i do other calculations in that fashion with this piece of code:
MonthStats <- data %>%
group_by(Year, Month) %>%
summarise(across(
.cols = is.numeric,
.fns = list(MonthMean=mean, MonthMax=max, MonthMin=min, MonthSD=sd), na.rm = TRUE,
.names = "{fn}_{col}"
))
However, my problems it that i do not know how to adapt the last piece of code to calculate the conditional summation. I tried this without good results:
MonthIndexes <- DayStats2 %>%
group_by(Year, Month) %>%
summarise(across(
.cols = is.numeric,
.fns = list(MonthFDD = sum({col}[which({col}<0)]), MonthTDD=sum({col}[which({col}>0)])), na.rm = TRUE,
.names = "{fn}_{col}"
))
The error returned in RStudio is
Error: Problem with
summarise()
input..1
. i..1 = across(...)
. x comparison (3) is possible only for atomic and list types
Any help will be appreciated. Thanks in advance.
CodePudding user response:
You're very close. The {.col}
syntax is used for labels only, not function arguments. Instead you can use purrr-style anonymous functions using ~
to begin the function and .x
to refer to the argument, e.g., MonthFDD = ~sum(.x[which(.x < 0)])
See more info and examples on the ?across
help page.
data %>%
group_by(Year, Month) %>%
summarise(across(
.cols = where(is.numeric),
.fns = list(MonthFDD = ~sum(.x[which(.x<0)], na.rm = T), MonthTDD= ~sum(.x[which(.x>0)], na.rm = TRUE)),
.names = "{fn}_{col}"
))
# # A tibble: 2 × 22
# # Groups: Year [2]
# Year Month MonthFDD_DayMean_2cm MonthTDD_DayMean_2cm MonthFDD_DayMean_5cm MonthTDD_DayMean_5cm MonthFDD_DayMea…
# <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2019 12 0 34.6 0 33.7 -0.43
# 2 2020 1 0 7.86 0 7.36 -0.43
# # … with 15 more variables: MonthTDD_DayMean_80cm <dbl>, MonthFDD_DayMean_120cm <dbl>,
# # MonthTDD_DayMean_120cm <dbl>, MonthFDD_DayMean_160cm <dbl>, MonthTDD_DayMean_160cm <dbl>,
# # MonthFDD_DayMean_200cm <dbl>, MonthTDD_DayMean_200cm <dbl>, MonthFDD_DayMean_250cm <dbl>,
# # MonthTDD_DayMean_250cm <dbl>, MonthFDD_DayMean_300cm <dbl>, MonthTDD_DayMean_300cm <dbl>,
# # MonthFDD_DayMean_350cm <dbl>, MonthTDD_DayMean_350cm <dbl>, MonthFDD_DayMean_450cm <dbl>,
# # MonthTDD_DayMean_450cm <dbl>
Using this input:
data = read.table(text = "Date DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28 19.9 19.9 20.0 20.2 20.0 20.2 20.2 20.1 20.5 20.6
2 2019-12-29 8.76 8.68 8.26 8.30 7.97 8.10 8.07 7.94 8.12 8.17
3 2019-12-30 2.20 1.78 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.61 -0.414
4 2019-12-31 3.76 3.31 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.602 -0.414
5 2020-01-01 4.34 3.91 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.579 -0.398
6 2020-01-02 3.52 3.45 -0.215 -0.255 -0.706 -0.649 -0.694 -0.818 -0.548 -0.367", header = T) %>%
mutate(Date = ymd(Date), Year = year(Date), Month = month(Date))
CodePudding user response:
A solution based on data.table
:
library(data.table)
library(magrittr)
library(dplyr)
library(lubridate)
data = read.table(text = "Date DayMean_2cm DayMean_5cm DayMean_80cm DayMean_120cm DayMean_160cm DayMean_200cm DayMean_250cm DayMean_300cm DayMean_350cm DayMean_450cm
1 2019-12-28 19.9 19.9 20.0 20.2 20.0 20.2 20.2 20.1 20.5 20.6
2 2019-12-29 8.76 8.68 8.26 8.30 7.97 8.10 8.07 7.94 8.12 8.17
3 2019-12-30 2.20 1.78 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.61 -0.414
4 2019-12-31 3.76 3.31 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.602 -0.414
5 2020-01-01 4.34 3.91 -0.215 -0.294 -0.706 -0.657 -0.725 -0.818 -0.579 -0.398
6 2020-01-02 3.52 3.45 -0.215 -0.255 -0.706 -0.649 -0.694 -0.818 -0.548 -0.367", header = T) %>%
mutate(Date = ymd(Date), Year = year(Date), Month = month(Date))
myfuncs <- function(x) list(sum(x[x < 0]), sum(x[x > 0]))
setDT(data) %>%
.[, lapply(.SD, myfuncs), .SDcol=-1, .(Year, Month)] %>%
.[,y := rep(c("MonthFDD", "MonthTDD"), .N/2)] %>%
dcast(Year Month ~ y, value.var = names(.)[-c(1:2, ncol(.))])
#> Year Month DayMean_2cm_MonthFDD DayMean_2cm_MonthTDD DayMean_5cm_MonthFDD
#> 1: 2019 12 0 34.62 0
#> 2: 2020 1 0 7.86 0
#> DayMean_5cm_MonthTDD DayMean_80cm_MonthFDD DayMean_80cm_MonthTDD
#> 1: 33.67 -0.43 28.26
#> 2: 7.36 -0.43 0
#> DayMean_120cm_MonthFDD DayMean_120cm_MonthTDD DayMean_160cm_MonthFDD
#> 1: -0.588 28.5 -1.412
#> 2: -0.549 0 -1.412
#> DayMean_160cm_MonthTDD DayMean_200cm_MonthFDD DayMean_200cm_MonthTDD
#> 1: 27.97 -1.314 28.3
#> 2: 0 -1.306 0
#> DayMean_250cm_MonthFDD DayMean_250cm_MonthTDD DayMean_300cm_MonthFDD
#> 1: -1.45 28.27 -1.636
#> 2: -1.419 0 -1.636
#> DayMean_300cm_MonthTDD DayMean_350cm_MonthFDD DayMean_350cm_MonthTDD
#> 1: 28.04 -1.212 28.62
#> 2: 0 -1.127 0
#> DayMean_450cm_MonthFDD DayMean_450cm_MonthTDD
#> 1: -0.828 28.77
#> 2: -0.765 0