Home > Enterprise >  R dataframe has date as part of a column name and I want specify the most recent two columns without
R dataframe has date as part of a column name and I want specify the most recent two columns without

Time:09-30

I am trying to automate a report that I am running manually each week. I have a data frame called p_wide where each week a new column is being added and column names contain dates that represent when each of those columns are added, for example:

id   col_2022_09_04   col_2022_09_11   col_2022_09_18   col_2022_09_25
---  --------------   --------------   --------------   --------------
01        0.3               0.8              0.9              0.1
02        0.6               0.1              0.4              0.5
03        0.2               0.1              0.3              0.4
04        0.1               0.7              0.4              0.9

In my report, I create a subset of p_wide dataframe using a filter. For example:

p_mover <- p_wide %>% filter(abs(col_2022_09_18 - col_2022_09_25) > .33)

Is there a way that I can specify those two columns above without manually typing each week?

CodePudding user response:

If the column names are always of the form col_yyyy_mm_dd then here's some code that extracts the dates, finds the last and last but one and uses these with !! to do the filtering. The dates are assumed to be in order with the most recent as the last one.

library(stringr)
library(dplyr)

p_wide <- read.table(text="id   col_2022_09_04   col_2022_09_11   col_2022_09_18   col_2022_09_25
01        0.3               0.8              0.9              0.1
02        0.6               0.1              0.4              0.5
03        0.2               0.1              0.3              0.4
04        0.1               0.7              0.4              0.9", header = T)

# Get the names of the columns
column_names <- names(p_wide)
# Create a regex to look for dates of the form yyyy_mm_dd
dates_regex <- regex('[0-9]{4}_[0-9]{2}_[0-9]{2}')
# Find the dates
dates <- str_match(string = column_names, dates_regex)[,1] %>% na.omit() 
# Find the last date and make a symbol for use later
last_date_column <- sym(paste0('col_', max(dates)))
# Find the last but one date - again as a symbol
last_date_but_one_column <- sym(paste0('col_', head(tail(dates, 2), 1)))

# Filter using the calculated variables
p_wide %>% filter(abs(!!last_date_but_one_column - !!last_date_column) > .33)

# id col_2022_09_04 col_2022_09_11 col_2022_09_18 col_2022_09_25
# 1  1            0.3            0.8            0.9            0.1
# 2  4            0.1            0.7            0.4            0.9
  • Related