Home > Net >  R Joining two columns inside a dataframe using values of a third column
R Joining two columns inside a dataframe using values of a third column

Time:03-15

I have a similar dataframe as follows with 2 participants (ID1 and ID2), and their variables var1 and var2, and the moment when these variables where measured (time_weeks_var1 and 2):

df <- data.frame (ID = c (1, 1, 1, 1, 2, 2, 2),
                  time_weeks_var1 = c (10, 12, 14, 17, 5, 9, 13),
                  var1 = c (14, 143, 190, 402, 16, 55, 75),
                  time_weeks_var2 = c(2,8,12,13,5,7,19),
                  var2 = c(154, NA, 142, 132, 54, 58, 39))

  ID time_weeks_var1 var1 time_weeks_var2 var2
1  1              10   14               2  154
2  1              12  143               8   NA
3  1              14  190              12  142
4  1              17  402              13  132
5  2               5   16               5   54
6  2               9   55               7   58
7  2              13   75              19   39

I need to obtain the following df, by joining the columns time_weeks while keeping the grouping by ID and the var1 and var2 at the appropriate rows.

   ID time_weeks var1 var2
1   1          2   NA  154
2   1          8   NA   NA
3   1         10   14   NA
4   1         12  143  142
5   1         13   NA  132
6   1         14  190   NA
7   1         17  402   NA
8   2          5   16   54
9   2          7   NA   NA
10  2          9   55   58
11  2         13   75   NA
12  2         19   NA   39

How shall I proceed?

CodePudding user response:

A tidyverse way:

df %>%
  pivot_longer(cols = starts_with("time_weeks"), names_prefix = "time_weeks_", values_to = "time_weeks") %>% 
  mutate(var1 = ifelse(name == "var1", var1, NA), 
         var2 = ifelse(name == "var2", var2, NA)) %>% 
  select(ID, time_weeks, var1, var2) %>%
  group_by(ID, time_weeks) %>%
  summarise(across(c(var1, var2), ~ .[which.min(is.na(.))])) %>% 
  arrange(ID, time_weeks)

# A tibble: 12 x 4
# Groups:   ID [2]
      ID time_weeks  var1  var2
   <dbl>      <dbl> <dbl> <dbl>
 1     1          2    NA   154
 2     1          8    NA    NA
 3     1         10    14    NA
 4     1         12   143   142
 5     1         13    NA   132
 6     1         14   190    NA
 7     1         17   402    NA
 8     2          5    16    54
 9     2          7    NA    58
10     2          9    55    NA
11     2         13    75    NA
12     2         19    NA    39

CodePudding user response:

We may use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df %>%
   rename_with(~ str_c(.x, "_", .x), starts_with("var")) %>% 
   pivot_longer(cols = -ID, names_to = c(".value", "grp"), 
      names_pattern = "(.*)_(var.*)") %>% 
   select(-grp)

-output

# A tibble: 14 × 4
      ID time_weeks  var1  var2
   <dbl>      <dbl> <dbl> <dbl>
 1     1         10    14    NA
 2     1          2    NA   154
 3     1         12   143    NA
 4     1          8    NA    NA
 5     1         14   190    NA
 6     1         12    NA   142
 7     1         17   402    NA
 8     1         13    NA   132
 9     2          5    16    NA
10     2          5    NA    54
11     2          9    55    NA
12     2          7    NA    58
13     2         13    75    NA
14     2         19    NA    39
  • Related