Hello, I need help merging rows that share the same name and removing the NA values. When a group has more than one non-NA value in the same column, I would like either a new column with a numeric suffix, or the values combined into one cell separated by a comma.
I have this example dataframe:
name<-c("John","John","John","Luis","Luis")
may<-c("a",NA,NA,"a",NA)
june<-c(NA,"b",NA,NA,"a")
july<-c("d",NA,"c",NA,NA)
df<-data.frame(name,may,june,july)
which gives the following data frame:
  name  may june july
1 John    a <NA>    d
2 John <NA>    b <NA>
3 John <NA> <NA>    c
4 Luis    a <NA> <NA>
5 Luis <NA>    a <NA>
I expect a result like the following:
  name may june july july.2
1 John   a    b    c      d
2 Luis   a    a <NA>   <NA>
or like the following:
  name may june july
1 John   a    b  c,d
2 Luis   a    a <NA>
CodePudding user response:
We can use summarize() to concatenate the strings that share the same "name". Inside summarize(), if every value of a column within a group is NA, the result for that group is NA. Otherwise, the non-NA values are sorted and concatenated with a comma.
library(dplyr)

df %>%
  group_by(name) %>%
  # sort() drops NA by default, so only the non-NA values are pasted together
  summarize(across(everything(), ~ if (all(is.na(.x))) NA_character_ else paste(sort(.x), collapse = ",")))
# A tibble: 2 × 4
  name  may   june  july
  <chr> <chr> <chr> <chr>
1 John  a     b     c,d
2 Luis  a     a     NA
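
The answer above covers the comma-separated form. For the other form in the question (an extra column such as july.2), one possible approach is to reshape to long format, number the values within each name and month, and reshape back to wide. This is only a sketch and not part of the original answer: it assumes the tidyr package is available, and the names wide, month, value and col are illustrative. The columns come out in order of first appearance, so they may need reordering afterwards.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  name = c("John", "John", "John", "Luis", "Luis"),
  may  = c("a", NA, NA, "a", NA),
  june = c(NA, "b", NA, NA, "a"),
  july = c("d", NA, "c", NA, NA),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # One row per non-NA cell: name, month, value
  pivot_longer(-name, names_to = "month", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(name, month) %>%
  mutate(
    # Sort values within each name/month pair
    value = sort(value),
    # First value keeps the month name; later ones get ".2", ".3", ...
    col = ifelse(row_number() == 1, month, paste0(month, ".", row_number()))
  ) %>%
  ungroup() %>%
  select(name, col, value) %>%
  # Back to one row per name; missing combinations become NA
  pivot_wider(names_from = col, values_from = value)

print(wide)
```

Luis has no july values, so both july and july.2 are NA for him.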