Simple pivot_longer
question. Given:
library(tidyverse)
df <- data.frame(id = c(1),
test_date = as.Date("01/01/1998", "%d/%m/%Y"),
test_result = c("yes"),
exit_date = as.Date("01/02/1998", "%d/%m/%Y"),
dob_date = as.Date("01/01/1997", "%d/%m/%Y"),
exit_result_1 = c(10),
exit_result_2 = c(4),
exit_result_3 = c(2))
df
# id test_date test_result exit_date dob_date exit_result_1 exit_result_2 exit_result_3
# 1 1 1998-01-01 yes 1998-02-01 1997-01-01 10 4 2
I want:
# id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates
# 1 1 yes NA NA NA test_date 1998-01-01
# 2 1 <NA> 10 4 2 exit_date 1998-02-01
# 3 1 <NA> NA NA NA dob_date 1997-01-01
Something along the lines of
pivot_longer(df, cols = matches('date'),
names_to = "date_name", values_to = "dates")
but I want to return NA
s. Any advice? Thanks
CodePudding user response:
You can set ".value"
in names_to
to indicate the corresponding component of the column name that defines the name of the output column. In this case one of names_sep
or names_pattern
must be supplied to specify how the column names should be split.
df %>%
pivot_longer(
-id,
names_to = c("date_name", ".value"),
names_pattern = "([^_] )_(. )"
)
# # A tibble: 3 × 7
# id date_name date result result_1 result_2 result_3
# <dbl> <chr> <date> <chr> <dbl> <dbl> <dbl>
# 1 1 test 1998-01-01 yes NA NA NA
# 2 1 exit 1998-02-01 NA 10 4 2
# 3 1 dob 1997-01-01 NA NA NA NA
For comparison, the original dataset looks like
df
# id test_date test_result exit_date dob_date exit_result_1 exit_result_2 exit_result_3
# 1 1 1998-01-01 yes 1998-02-01 1997-01-01 10 4 2
CodePudding user response:
What you're asking for is outside the intent of pivoting/reshaping operations. The result
pivot_longer(df, cols = matches('date'), names_to = "date_name", values_to = "dates")
# # A tibble: 3 × 7
# id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates
# <dbl> <chr> <dbl> <dbl> <dbl> <chr> <date>
# 1 1 yes 10 4 2 test_date 1998-01-01
# 2 1 yes 10 4 2 exit_date 1998-02-01
# 3 1 yes 10 4 2 dob_date 1997-01-01
is correct in that the non-date
values are carried down for every observation in the original frame.
What you need is to change the variables based on date_name
, I think.
pivot_longer(df, cols = matches('date'),
names_to = "date_name", values_to = "dates") %>%
mutate(
test_result = if_else(date_name == "test_date", test_result, test_result[NA]),
across(starts_with("exit"),
~ if_else(date_name == "exit_date", ., .[NA]))
)
# # A tibble: 3 × 7
# id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates
# <dbl> <chr> <dbl> <dbl> <dbl> <chr> <date>
# 1 1 yes NA NA NA test_date 1998-01-01
# 2 1 NA 10 4 2 exit_date 1998-02-01
# 3 1 NA NA NA NA dob_date 1997-01-01
I use var[NA]
instead of NA_character_
to get the correct class of NA
values, since otherwise if_else
complains
! `false` must be a character vector, not a logical vector.