Home > Blockchain >  How to use regex with names_pattern from pivot_longer
How to use regex with names_pattern from pivot_longer

Time:09-30

I have df like this

ID <- c("A01","B20","C3","D4")
Nb_data <- c(2,2,2,3)
Weight_t1 <- c(70,44,98,65)
Weight_t2 <- c(75,78,105,68)
Weight_t3 <- c(72,52,107,NA)
year1 <- c(20,28,32,50)
year2 <- c(28,32,35,60)
year3 <- c(29,35,38,NA)
LENGTHt1 <- c(175,155,198,165)
LENGTHt2 <- c(175,155,198,163)
LENGTHt3 <- c(176,154,198,NA)
df <- data.frame(ID,Nb_data,Weight_t1,Weight_t2,Weight_t3,year1,year2,year3,LENGTHt1,LENGTHt2,LENGTHt3)

weight/year and length : t1 to t28

I want to tidy my data like :

ID Nb_data Weigth Year Length
A01 3 70 20 175
A01 3 75 28 175
A01 3 72 29 176
B20 3 44 28 155
B20 3 78 32 155
B20 3 52 35 154

I try

df1 <- df %>% 
  pivot_longer(cols = -c('ID','Nb_data'), 
               names_to = c('Weight','Year','Length' ), 
               names_pattern = '(Weight_t[0-9]*|year[0-9]*|LENGTHt[0-9]*)' ,
               values_drop_na = TRUE) 

or names_pattern = '(.t[0-9])(.t[0-9])(.t[0-9])'

I have some difficulties to use regex or maybe pivot_longer are not suitable...

CodePudding user response:

You need to extract the common timepoint information from the variable names. Make this information consistent first, with a clear separator (_ in this case), then it becomes much easier.

I would do something like this

library(tidyr)
library(dplyr)
df1 <- df
names(df1) <- gsub("year", "Year_t", names(df1))
names(df1) <- gsub("LENGTH", "Length_", names(df1))
df1 %>% 
  pivot_longer(cols = -c('ID','Nb_data'),
               names_to = c("name", "timepoint"), 
               names_sep = "_",
               values_drop_na = TRUE) %>% 
  pivot_wider(names_from = name, values_from = value)

Output:

   ID    Nb_data timepoint Weight  Year Length
   <chr>   <dbl> <chr>      <dbl> <dbl>  <dbl>
 1 A01         2 t1            70    20    175
 2 A01         2 t2            75    28    175
 3 A01         2 t3            72    29    176
 4 B20         2 t1            44    28    155
 5 B20         2 t2            78    32    155
 6 B20         2 t3            52    35    154
 7 C3          2 t1            98    32    198
 8 C3          2 t2           105    35    198
 9 C3          2 t3           107    38    198
10 D4          3 t1            65    50    165
11 D4          3 t2            68    60    163

CodePudding user response:

You could directly use pivot_longer though with abit of complex regex as follows

df %>%
   pivot_longer(matches("\\d $"), names_to = c(".value", "grp"),
                names_pattern = "(.*?)[_t]{0,2}(\\d $)",
                values_drop_na = TRUE)

# A tibble: 11 × 6
   ID    Nb_data grp   Weight  year LENGTH
   <chr>   <dbl> <chr>  <dbl> <dbl>  <dbl>
 1 A01         2 1         70    20    175
 2 A01         2 2         75    28    175
 3 A01         2 3         72    29    176
 4 B20         2 1         44    28    155
 5 B20         2 2         78    32    155
 6 B20         2 3         52    35    154
 7 C3          2 1         98    32    198
 8 C3          2 2        105    35    198
 9 C3          2 3        107    38    198
10 D4          3 1         65    50    165
11 D4          3 2         68    60    163
  • Related