How to properly sum rows based on a specific date column range?

Time:09-17

The idea is to compute, for each row, the sum of the columns whose names fall between 01/01/2021 and 01/08/2021 (the names are dd/mm/YYYY, so January through August):

library(lubridate)  # %m-% and months() periods come from lubridate

# define range parameters (start and end)
first_date <- format(Sys.Date(), "01/01/%Y")

actual_date <- format(Sys.Date() %m-% months(1), "01/%m/%Y")

# get the sum of the rows between first_date and actual_date
df$ytd <- rowSums(df[as.character(seq(first_date,
                                       actual_date))])

However, running it raises the following error:

Error in seq.default(first_date, to_date) : 'from' must be a finite number
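The error can be reproduced in isolation with base R only. In this sketch the two bounds are hard-coded to the range stated above instead of being derived from Sys.Date(). The point is that format() returns character strings, and seq() on character input falls through to seq.default(), which needs numbers:

```r
# Bounds as plain character strings, which is what format() returns
first_date  <- "01/01/2021"
actual_date <- "01/08/2021"
print(class(first_date))  # "character"

# seq() has no method for character, so seq.default() runs; it cannot turn
# "01/01/2021" into a finite number and stops with an error
msg <- tryCatch(
  suppressWarnings(seq(first_date, actual_date)),
  error = conditionMessage
)
print(msg)  # "'from' must be a finite number"
```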

The expected output is a new column holding, for each row, the sum of the columns within the specified range.

Data:

df <- structure(list(country = c("Mexico", "Mexico", "Mexico", "Mexico"
), `01/01/2021` = c(12, 23, 13, 12), `01/02/2021` = c(12, 23, 
13, 12), `01/03/2021` = c(12, 23, 13, 12), `01/04/2021` = c(12, 
23, 13, 12), `01/05/2021` = c(12, 23, 13, 12), `01/06/2021` = c(12, 
23, 13, 12), `01/07/2021` = c(12, 23, 13, 12), `01/08/2021` = c(12, 
23, 13, 12), `01/09/2021` = c(12, 23, 13, 12), `01/10/2021` = c(12, 
23, 13, 12), `01/11/2021` = c(12, 23, 13, 12), `01/12/2021` = c(12, 
23, 13, 12)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

How could I properly apply a function to get this output?

Answer:

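The format and seq combination doesn't work: seq needs numbers or Date objects, but format returns a character string, so seq(first_date, actual_date) falls through to seq.default and fails. Since the dates are already column names, use the range operator (:) inside across or select instead: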

library(dplyr)
out <- df %>% 
    mutate(ytd = rowSums(across(all_of(first_date):all_of(actual_date)))) 

Output:

> out$ytd
[1]  96 184 104  96
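Alternatively, the original seq idea can be kept if the bounds stay as Date objects until the very end: seq() then dispatches to seq.Date(), and the resulting months are formatted back into column names. This is a minimal base R sketch; it assumes the dd/mm/YYYY column naming of the sample data, rebuilds that data compactly, and hard-codes the bounds rather than deriving them from Sys.Date():

```r
# Rebuild the sample data: twelve monthly columns "01/01/2021" ... "01/12/2021",
# each holding c(12, 23, 13, 12)
month_starts <- seq(as.Date("2021-01-01"), by = "month", length.out = 12)
df <- data.frame(country = rep("Mexico", 4))
for (nm in format(month_starts, "%d/%m/%Y")) df[[nm]] <- c(12, 23, 13, 12)

# Keep the bounds as Date objects (fixed here; the question uses Sys.Date())
first_date  <- as.Date("2021-01-01")
actual_date <- as.Date("2021-08-01")

# One Date per month in the range, formatted to match the column names
cols <- format(seq(first_date, actual_date, by = "month"), "%d/%m/%Y")

df$ytd <- rowSums(df[cols])
df$ytd
# [1]  96 184 104  96
```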

Answer:

A base R approach: use match to find the positions of the first and last columns, then sum the columns in between:

df$ytd <- rowSums(df[match(first_date, names(df)):match(actual_date, names(df))])
df$ytd
#[1]  96 184 104  96
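Both answers take a positional slice of columns, so they rely on the columns being in date order with no gaps. A different technique, sketched here under the assumption that every date column is named dd/mm/YYYY, is to parse each column name as a Date and keep the columns that fall inside the range. The sample data is rebuilt compactly and the bounds are hard-coded for reproducibility:

```r
# Rebuild the sample data (dd/mm/YYYY column names)
month_starts <- seq(as.Date("2021-01-01"), by = "month", length.out = 12)
df <- data.frame(country = rep("Mexico", 4))
for (nm in format(month_starts, "%d/%m/%Y")) df[[nm]] <- c(12, 23, 13, 12)

# Range bounds as Dates (fixed here; the question derives them from Sys.Date())
first_date  <- as.Date("2021-01-01")
actual_date <- as.Date("2021-08-01")

# Parse every column name; names that are not dates, such as "country", become NA
col_dates <- as.Date(names(df), format = "%d/%m/%Y")
in_range  <- !is.na(col_dates) & col_dates >= first_date & col_dates <= actual_date

# Logical column selection, then row sums over the selected columns
df$ytd <- rowSums(df[in_range])
df$ytd
# [1]  96 184 104  96
```

Because selection is by date value rather than by position, a missing month or reordered columns would not shift the range.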