Home > OS >  pivot_longer and return NAs
pivot_longer and return NAs

Time:01-12

Simple pivot_longer question. Given:

library(tidyverse)
df <- data.frame(id = c(1),
                 test_date = as.Date("01/01/1998", "%d/%m/%Y"),
                 test_result = c("yes"),
                 exit_date = as.Date("01/02/1998", "%d/%m/%Y"),
                 dob_date =  as.Date("01/01/1997", "%d/%m/%Y"),
                 exit_result_1 = c(10),
                 exit_result_2 = c(4),
                 exit_result_3 = c(2))
df
#   id  test_date test_result  exit_date   dob_date exit_result_1 exit_result_2 exit_result_3
# 1  1 1998-01-01         yes 1998-02-01 1997-01-01            10             4             2

I want:

#   id test_result exit_result_1 exit_result_2 exit_result_3 date_name      dates
# 1  1         yes            NA            NA            NA test_date 1998-01-01
# 2  1        <NA>            10             4             2 exit_date 1998-02-01
# 3  1        <NA>            NA            NA            NA  dob_date 1997-01-01

Something along the lines of

pivot_longer(df, cols = matches('date'), 
             names_to = "date_name", values_to = "dates")

but I want to return NAs. Any advice? Thanks

CodePudding user response:

You can set ".value" in names_to to indicate the corresponding component of the column name that defines the name of the output column. In this case one of names_sep or names_pattern must be supplied to specify how the column names should be split.

df %>%
  pivot_longer(
    -id,
    names_to = c("date_name", ".value"),
    names_pattern = "([^_] )_(. )"
  )

# # A tibble: 3 × 7
#      id date_name date       result result_1 result_2 result_3
#   <dbl> <chr>     <date>     <chr>     <dbl>    <dbl>    <dbl>
# 1     1 test      1998-01-01 yes          NA       NA       NA
# 2     1 exit      1998-02-01 NA           10        4        2
# 3     1 dob       1997-01-01 NA           NA       NA       NA

For comparison, the original dataset looks like

df
#   id  test_date test_result  exit_date   dob_date exit_result_1 exit_result_2 exit_result_3
# 1  1 1998-01-01         yes 1998-02-01 1997-01-01            10             4             2

CodePudding user response:

What you're asking for is outside the intent of pivoting/reshaping operations. The result

pivot_longer(df, cols = matches('date'), names_to = "date_name", values_to = "dates")
# # A tibble: 3 × 7
#      id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates     
#   <dbl> <chr>               <dbl>         <dbl>         <dbl> <chr>     <date>    
# 1     1 yes                    10             4             2 test_date 1998-01-01
# 2     1 yes                    10             4             2 exit_date 1998-02-01
# 3     1 yes                    10             4             2 dob_date  1997-01-01

is correct in that the non-date values are carried down for every observation in the original frame.

What you need is to change the variables based on date_name, I think.

pivot_longer(df, cols = matches('date'),
             names_to = "date_name", values_to = "dates") %>%
  mutate(
    test_result = if_else(date_name == "test_date", test_result, test_result[NA]),
    across(starts_with("exit"),
           ~ if_else(date_name == "exit_date", ., .[NA]))
  )
# # A tibble: 3 × 7
#      id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates     
#   <dbl> <chr>               <dbl>         <dbl>         <dbl> <chr>     <date>    
# 1     1 yes                    NA            NA            NA test_date 1998-01-01
# 2     1 NA                     10             4             2 exit_date 1998-02-01
# 3     1 NA                     NA            NA            NA dob_date  1997-01-01

I use var[NA] instead of NA_character_ to get the correct class of NA values, since otherwise if_else complains

! `false` must be a character vector, not a logical vector.
  • Related