Home > Mobile >  Wide to long with multiple columns
Wide to long with multiple columns

Time:12-21

I'm trying to convert my dataset from wide format to long format, but it's not working as intended. My dataset has columns rowid, arrest1, arrest2, ..., arrest10, lien1, lien2, ..., lien10 and looks something like this:

rowid   arrest1   arrest2   ...   lien1     lien2   ...
1       1/1/2008  NA              2/2/2009  NA

I'm trying to get a long dataset where I have a time variable that takes on the values 1-10 and separate variables arrest and lien that contain the dates. I tried the following code, but my time variable takes on values 0-9 and in addition to the arrest and lien variables, there are arrest1 and lien2. There's definitely something wrong with the names_pattern argument.

df_long <- df_wide %>%
  select(rowid, lien1:lien10, arrest1:arrest10) %>%
  pivot_longer(-rowid,
               names_to = c(".value", "time"),
               names_pattern =  "(\\w ).*?(\\d{1,2})")

Here's some sample data:

structure(list(rowid = c(9317L, 31447L, 37939L, 40198L, 19346L
), arrest1 = structure(c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), class = "Date"), arrest2 = structure(c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), arrest3 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien1 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien2 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien3 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

Use the names_sep (with regex lookaround - names_sep = "(?<=\\D)(?=\\d)") or capture as groups in names_pattern (names_pattern = "(\\D )(\\d )" - here we are capturing one or more non-digits (\\D ) as a group ((...)) followed by one or more digits (\\d ) separately that corresponds to the vector passed in names_to i.e. ".value" will the value of the columns that for 'arrest', 'lien' and the "grp" will create the new column with the suffix digits from the column names)

library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"), 
     names_pattern = "(\\D )(\\d )")

-output

# A tibble: 50 × 4
   rowid grp   arrest lien  
   <int> <chr> <date> <date>
 1  9317 1     NA     NA    
 2  9317 2     NA     NA    
 3  9317 3     NA     NA    
 4  9317 4     NA     NA    
 5  9317 5     NA     NA    
 6  9317 6     NA     NA    
 7  9317 7     NA     NA    
 8  9317 8     NA     NA    
 9  9317 9     NA     NA    
10  9317 10    NA     NA    
# … with 40 more rows
  • Related