Join rows with the same name by removing NA in R

Time:07-28

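Hello, I need help merging rows that share the same name into a single row while dropping the NA values. When a column holds more than one non-NA value for the same name, either a new column with a numeric suffix should be created, or the values should be combined into one cell separated by a comma.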

I have this example dataframe:

name<-c("John","John","John","Luis","Luis")
may<-c("a",NA,NA,"a",NA)
june<-c(NA,"b",NA,NA,"a")
july<-c("d",NA,"c",NA,NA)
df<-data.frame(name,may,june,july)

which gives the following data frame:

  name  may june july
1 John    a <NA>    d
2 John <NA>    b <NA>
3 John <NA> <NA>    c
4 Luis    a <NA> <NA>
5 Luis <NA>    a <NA>

I expect a result like the following:

  name may  june  july  july.2
1 John   a    b    c      d
2 Luis   a    a   <NA>   <NA>

or like the following:

  name  may june  july
1 John   a    b   c,d
2 Luis   a    a   <NA>

CodePudding user response:

We can use summarize() from dplyr to concatenate the strings within each "name" group.

Inside summarize(), each column is handled per group: if every value in that column is NA for the group, the result is NA. Otherwise, the non-NA values are sorted and pasted together with commas.

library(dplyr)

df %>% 
  group_by(name) %>% 
  # all-NA columns stay NA; sort() drops NA, so only non-NA values are joined
  summarize(across(everything(), ~ if (all(is.na(.x))) NA_character_ else paste(sort(.x), collapse = ",")))

# A tibble: 2 × 4
  name  may   june  july 
  <chr> <chr> <chr> <chr>
1 John  a     b     c,d  
2 Luis  a     a     NA  
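
If dplyr is not available, the same per-group logic can be sketched in base R with aggregate(). This is only a minimal sketch under the assumptions of the example data above; the helper name collapse_non_na is hypothetical and not part of any package.

```r
# Example data from the question
df <- data.frame(
  name = c("John", "John", "John", "Luis", "Luis"),
  may  = c("a", NA, NA, "a", NA),
  june = c(NA, "b", NA, NA, "a"),
  july = c("d", NA, "c", NA, NA)
)

# Hypothetical helper: sort and join the non-NA values of one column,
# or return NA when the group has no non-NA value at all
collapse_non_na <- function(x) {
  x <- sort(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = ",")
}

# Apply the helper to every column except "name", one group per name
result <- aggregate(df[-1], by = list(name = df$name), FUN = collapse_non_na)
print(result)
```

The data-frame method of aggregate() is used here rather than the formula interface, because the formula interface drops rows containing NA by default (na.action = na.omit), which would discard every row of this example.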