Here a very simple problem that is driving me crazy. I have a data frame with duplicate IDs associated with different names. What I want is to create categorial columns "Names_1", "Names_2", "Names_3"... (actually "Names_3" is enough as each ID has no more than 3 names) in order to remove duplicated IDs.
Here what I have :
ID <- c(1,2,3,3,4,4,5,6,6,6)
Names <- c("a","b","c","d","e","f","j","h","i","j")
df <- data.frame(ID,Names)
ID Names
1 1 a
2 2 b
3 3 c
4 3 d
5 4 e
6 4 f
7 5 j
8 6 h
9 6 i
10 6 j
Here what I want :
ID <- c(1,2,3,4,5,6)
Names_1 <- c("a","b","c","e","j","h")
Names_2 <- c("","","d","f","","i")
Names_3 <- c("","","","","","j")
df <- data.frame(ID,Names_1,Names_2,Names_3)
ID Names_1 Names_2 Names_3
1 1 a
2 2 b
3 3 c d
4 4 e f
5 5 j
6 6 h i j
Thanks ! Valentin
CodePudding user response:
We can use dplyr's group_by(ID)
to create an index, then tidyr's pivot_wider
to pivot with the argument names_from = index
library(dplyr)
library(tidyr)
df %>% group_by(ID)%>%
mutate(index = row_number())%>%
pivot_wider(names_from=index, values_from = Names, names_prefix = 'Names_')
# A tibble: 6 × 4
# Groups: ID [6]
ID Names_1 Names_2 Names_3
<dbl> <chr> <chr> <chr>
1 1 a NA NA
2 2 b NA NA
3 3 c d NA
4 4 e f NA
5 5 j NA NA
6 6 h i j
If we really want these NAs to be empty characters instead, just add the argument values_fill = ''
to the call to pivot_wider, as in:
df %>% group_by(ID)%>%
mutate(index = row_number())%>%
pivot_wider(names_from=index, values_from = Names, names_prefix = 'Names_', values_fill = '')
# A tibble: 6 × 4
# Groups: ID [6]
ID Names_1 Names_2 Names_3
<dbl> <chr> <chr> <chr>
1 1 a "" ""
2 2 b "" ""
3 3 c "d" ""
4 4 e "f" ""
5 5 j "" ""
6 6 h "i" "j"