Home > Software engineering >  How to make R loop that concatenates multiple times across columns based on a condition
How to make R loop that concatenates multiple times across columns based on a condition

Time:09-01

I have a dataframe similar to the following:

    df <- data.frame(name = c("john", "sara", "bill"),
                 join_0_year = c(1990,1991,1992),
                 join_0_month = c(1,2,3),
                 join_0_day = c(20,21,22),
                 join_0_team = c("tigers", "lions", "bears"),
                 join_1_year = c(2000, 2001, 2002),
                 join_1_month = c(8,9,10),
                 join_1_day = c(14,15,16),
                 join_1_team = c("pirates", "colts", "panthers"))
df

  name join_0_year join_0_month join_0_day join_0_team join_1_year join_1_month join_1_day join_1_team
1 john        1990            1         20      tigers        2000            8         14     pirates
2 sara        1991            2         21       lions        2001            9         15       colts
3 bill        1992            3         22       bears        2002           10         16    panthers

My goal is to concatenate the date columns into a single column. My df has hundreds of columns ("join_0" through "join_400") so manually calling columns by name is out of the question. I imagine I need to make some kind of for-loop, which I struggle with. My desired outcome is:

  name join_0_date join_0_team join_1_date join_1_team
1 john   1990-1-20      tigers   2000-8-14     pirates
2 sara   1991-2-21       lions   2001-9-15       colts
3 bill   1992-3-22       bears  2002-10-16    panthers

Any help is greatly appreciated.

CodePudding user response:

Making the sets of variables and then looping over and concatenating:

s <- rep(0:1,each=3)
vars <- split(sprintf("join_%d_%s", s, c("year","month","day")), s)
df[sprintf("join_%d_date", 0:1)] <- lapply(vars, 
    \(x) as.Date(do.call(paste, c(df[x], sep="-"))) 
)
df[unlist(vars)] <- NULL

#  name join_0_team join_1_team join_0_date join_1_date
#1 john      tigers     pirates  1990-01-20  2000-08-14
#2 sara       lions       colts  1991-02-21  2001-09-15
#3 bill       bears    panthers  1992-03-22  2002-10-16

CodePudding user response:

Here is a tidyverse option:

library(tidyverse)
df %>%
    pivot_longer(matches("year|month|day"),  names_to = c("var", "quant"), names_pattern = "(join_\\d)_(. )") %>%
    group_by(across(-c(quant, value))) %>%
    summarise(value = str_c(value, collapse = "-"), .groups = "drop") %>%
    pivot_wider(names_from = var, names_glue = "{var}_date")
## A tibble: 3 × 5
#  name  join_0_team join_1_team join_0_date join_1_date
#  <chr> <chr>       <chr>       <chr>       <chr>      
#1 bill  bears       panthers    1992-3-22   2002-10-16 
#2 john  tigers      pirates     1990-1-20   2000-8-14  
#3 sara  lions       colts       1991-2-21   2001-9-15  

Explanation: The idea is to reshape date-related data from wide to long, then combine year/month/day into a date, and then reshape again from long to wide.

CodePudding user response:

This works but it is much worse than the other answers. Just found the task fun.

library(tidyverse) 

df %>% {lapply(seq(sum(str_detect(string = names(df), pattern = "day"))), 
                   function(y) y %>% {mutate(df, y=apply(df[(2 ((.-1)*4)):(.*4)], 1, paste0, collapse = '-'))})} %>% 
      lapply(function(x) cbind(x$y)) %>% 
      cbind(df, .) %>% 
      dplyr::select(-which(str_detect(string = names(.), pattern = "year|month|day"))) %>% 
      setNames(ifelse(str_detect(names(.), pattern = "join_|name"), names(.), 
                      paste0("join_", seq(str_detect(names(.), pattern = "join_"))-length(.) 1, "_date")))

  name join_0_team join_1_team join_0_date join_1_date
1 john      tigers     pirates   1990-1-20   2000-8-14
2 sara       lions       colts   1991-2-21   2001-9-15
3 bill       bears    panthers   1992-3-22  2002-10-16
  • Related