When concatenating strings with dplyr (group_by followed by summarize or mutate with paste(..., collapse = ...)), NA values become the string "NA". How can I avoid this?
See my example below:
ID <- c(1,1,2,3)
string <- c(' asfdas ', 'sdf', NA, 'NA')
df <- data.frame(ID, string)
Both,
df_conca <- df %>%
  group_by(ID) %>%
  summarize(string = paste(string, collapse = "; ")) %>%
  distinct_all()
and
df_conca <- df %>%
  group_by(ID) %>%
  dplyr::mutate(string = paste(string, collapse = "; ")) %>%
  distinct_all()
result in:
ID string
1 1 " asfdas ; sdf"
2 2 "NA"
3 3 "NA"
but I would like to keep the NA values as such:
ID string
1 1 " asfdas ; sdf"
2 2 NA
3 3 "NA"
Ideally, I would like to stay within the dplyr workflow.
Answer:
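We can use str_c from the stringr package. Unlike paste, which converts NA to the string "NA" before concatenating, str_c propagates missing values, so a group whose only value is NA yields NA.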
library(dplyr)
library(stringr)
df %>%
  group_by(ID) %>%
  summarize(string = str_c(string, collapse = "; "))
# ID string
# <dbl> <chr>
#1 1 " asfdas ; sdf"
#2 2 NA
#3 3 "NA"
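One caveat: because str_c propagates missing values, a group that mixes NA with real strings also collapses to NA as a whole. If such groups should keep their non-missing strings instead, a minimal sketch that stays within the dplyr workflow could look like the one below. The helper name collapse_keep_na and the extra mixed group (ID 4) are assumptions added for illustration; they are not part of dplyr, stringr, or the original data.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or stringr): collapse the
# non-missing strings, returning NA only when every value is missing.
collapse_keep_na <- function(x, sep = "; ") {
  x <- as.character(x)
  if (all(is.na(x))) {
    NA_character_
  } else {
    paste(x[!is.na(x)], collapse = sep)
  }
}

# Data from the question, plus one mixed group (ID 4) added as an
# assumption to show where this differs from str_c.
df2 <- data.frame(
  ID = c(1, 1, 2, 3, 4, 4),
  string = c(" asfdas ", "sdf", NA, "NA", "abc", NA)
)

df_conca <- df2 %>%
  group_by(ID) %>%
  summarize(string = collapse_keep_na(string))
```

With this sketch, ID 2 stays a real NA, ID 3 keeps the literal string "NA", and ID 4 becomes "abc" rather than NA. Whether a mixed group should drop its missing values or become NA entirely depends on what NA means in your data, so the str_c version may still be the better fit.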