I have a dataframe similar to the following:
df <- data.frame(name = c("john", "sara", "bill"),
join_0_year = c(1990,1991,1992),
join_0_month = c(1,2,3),
join_0_day = c(20,21,22),
join_0_team = c("tigers", "lions", "bears"),
join_1_year = c(2000, 2001, 2002),
join_1_month = c(8,9,10),
join_1_day = c(14,15,16),
join_1_team = c("pirates", "colts", "panthers"))
df
name join_0_year join_0_month join_0_day join_0_team join_1_year join_1_month join_1_day join_1_team
1 john 1990 1 20 tigers 2000 8 14 pirates
2 sara 1991 2 21 lions 2001 9 15 colts
3 bill 1992 3 22 bears 2002 10 16 panthers
My goal is to concatenate the date columns into a single column. My df has hundreds of columns ("join_0" through "join_400") so manually calling columns by name is out of the question. I imagine I need to make some kind of for-loop, which I struggle with. My desired outcome is:
name join_0_date join_0_team join_1_date join_1_team
1 john 1990-1-20 tigers 2000-8-14 pirates
2 sara 1991-2-21 lions 2001-9-15 colts
3 bill 1992-3-22 bears 2002-10-16 panthers
Any help is greatly appreciated.
CodePudding user response:
Making the sets of variables and then looping over and concatenating:
s <- rep(0:1,each=3)
vars <- split(sprintf("join_%d_%s", s, c("year","month","day")), s)
df[sprintf("join_%d_date", 0:1)] <- lapply(vars,
\(x) as.Date(do.call(paste, c(df[x], sep="-")))
)
df[unlist(vars)] <- NULL
# name join_0_team join_1_team join_0_date join_1_date
#1 john tigers pirates 1990-01-20 2000-08-14
#2 sara lions colts 1991-02-21 2001-09-15
#3 bill bears panthers 1992-03-22 2002-10-16
CodePudding user response:
Here is a tidyverse
option:
library(tidyverse)
df %>%
pivot_longer(matches("year|month|day"), names_to = c("var", "quant"), names_pattern = "(join_\\d)_(. )") %>%
group_by(across(-c(quant, value))) %>%
summarise(value = str_c(value, collapse = "-"), .groups = "drop") %>%
pivot_wider(names_from = var, names_glue = "{var}_date")
## A tibble: 3 × 5
# name join_0_team join_1_team join_0_date join_1_date
# <chr> <chr> <chr> <chr> <chr>
#1 bill bears panthers 1992-3-22 2002-10-16
#2 john tigers pirates 1990-1-20 2000-8-14
#3 sara lions colts 1991-2-21 2001-9-15
Explanation: The idea is to reshape date-related data from wide to long, then combine year/month/day into a date, and then reshape again from long to wide.
CodePudding user response:
This works but it is much worse than the other answers. Just found the task fun.
library(tidyverse)
df %>% {lapply(seq(sum(str_detect(string = names(df), pattern = "day"))),
function(y) y %>% {mutate(df, y=apply(df[(2 ((.-1)*4)):(.*4)], 1, paste0, collapse = '-'))})} %>%
lapply(function(x) cbind(x$y)) %>%
cbind(df, .) %>%
dplyr::select(-which(str_detect(string = names(.), pattern = "year|month|day"))) %>%
setNames(ifelse(str_detect(names(.), pattern = "join_|name"), names(.),
paste0("join_", seq(str_detect(names(.), pattern = "join_"))-length(.) 1, "_date")))
name join_0_team join_1_team join_0_date join_1_date
1 john tigers pirates 1990-1-20 2000-8-14
2 sara lions colts 1991-2-21 2001-9-15
3 bill bears panthers 1992-3-22 2002-10-16