I have df like this
ID <- c("A01","B20","C3","D4")
Nb_data <- c(2,2,2,3)
Weight_t1 <- c(70,44,98,65)
Weight_t2 <- c(75,78,105,68)
Weight_t3 <- c(72,52,107,NA)
year1 <- c(20,28,32,50)
year2 <- c(28,32,35,60)
year3 <- c(29,35,38,NA)
LENGTHt1 <- c(175,155,198,165)
LENGTHt2 <- c(175,155,198,163)
LENGTHt3 <- c(176,154,198,NA)
df <- data.frame(ID,Nb_data,Weight_t1,Weight_t2,Weight_t3,year1,year2,year3,LENGTHt1,LENGTHt2,LENGTHt3)
weight/year and length : t1 to t28
I want to tidy my data like :
ID | Nb_data | Weigth | Year | Length |
---|---|---|---|---|
A01 | 3 | 70 | 20 | 175 |
A01 | 3 | 75 | 28 | 175 |
A01 | 3 | 72 | 29 | 176 |
B20 | 3 | 44 | 28 | 155 |
B20 | 3 | 78 | 32 | 155 |
B20 | 3 | 52 | 35 | 154 |
I try
df1 <- df %>%
pivot_longer(cols = -c('ID','Nb_data'),
names_to = c('Weight','Year','Length' ),
names_pattern = '(Weight_t[0-9]*|year[0-9]*|LENGTHt[0-9]*)' ,
values_drop_na = TRUE)
or names_pattern = '(.t[0-9])(.t[0-9])(.t[0-9])'
I have some difficulties to use regex or maybe pivot_longer are not suitable...
CodePudding user response:
You need to extract the common timepoint information from the variable names. Make this information consistent first, with a clear separator (_
in this case), then it becomes much easier.
I would do something like this
library(tidyr)
library(dplyr)
df1 <- df
names(df1) <- gsub("year", "Year_t", names(df1))
names(df1) <- gsub("LENGTH", "Length_", names(df1))
df1 %>%
pivot_longer(cols = -c('ID','Nb_data'),
names_to = c("name", "timepoint"),
names_sep = "_",
values_drop_na = TRUE) %>%
pivot_wider(names_from = name, values_from = value)
Output:
ID Nb_data timepoint Weight Year Length
<chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 A01 2 t1 70 20 175
2 A01 2 t2 75 28 175
3 A01 2 t3 72 29 176
4 B20 2 t1 44 28 155
5 B20 2 t2 78 32 155
6 B20 2 t3 52 35 154
7 C3 2 t1 98 32 198
8 C3 2 t2 105 35 198
9 C3 2 t3 107 38 198
10 D4 3 t1 65 50 165
11 D4 3 t2 68 60 163
CodePudding user response:
You could directly use pivot_longer
though with abit of complex regex as follows
df %>%
pivot_longer(matches("\\d $"), names_to = c(".value", "grp"),
names_pattern = "(.*?)[_t]{0,2}(\\d $)",
values_drop_na = TRUE)
# A tibble: 11 × 6
ID Nb_data grp Weight year LENGTH
<chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 A01 2 1 70 20 175
2 A01 2 2 75 28 175
3 A01 2 3 72 29 176
4 B20 2 1 44 28 155
5 B20 2 2 78 32 155
6 B20 2 3 52 35 154
7 C3 2 1 98 32 198
8 C3 2 2 105 35 198
9 C3 2 3 107 38 198
10 D4 3 1 65 50 165
11 D4 3 2 68 60 163