Concatenating strings / rows using dplyr, group_by & collapse or summarize, but maintain NA values

Time:09-24

When concatenating strings with dplyr (group_by plus summarize or mutate, using paste(..., collapse = ...)), NA values turn into the string "NA". How can I avoid that?

See my example below:

library(dplyr)

ID <- c(1, 1, 2, 3)
string <- c(' asfdas ', 'sdf', NA, 'NA')
df <- data.frame(ID, string)

Both

df_conca <- df %>%
  group_by(ID) %>%
  summarize(string = paste(string, collapse = "; ")) %>%
  distinct_all()

and

df_conca <- df %>%
  group_by(ID) %>%
  dplyr::mutate(string = paste(string, collapse = "; ")) %>%
  distinct_all()

result in:

     ID string               
1     1 " asfdas ; sdf"
2     2 "NA"           
3     3 "NA" 

But I would like to keep the NA values as real missing values:

     ID string             
1     1 " asfdas ; sdf"
2     2 NA           
3     3 "NA" 

Ideally, I would like to stay within the dplyr workflow.
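For reference, the conversion already happens in base R's paste(), independent of dplyr. A minimal sketch (the names x and y are only for illustration): paste() renders a missing value as the text "NA" before collapsing, so the result for ID 2 is an ordinary string that cannot be told apart from the literal "NA" of ID 3.

```r
# paste() renders a missing value as the text "NA" before collapsing,
# so an all-NA group becomes an ordinary string.
x <- paste(NA_character_, collapse = "; ")
# The literal string "NA" collapses to exactly the same result.
y <- paste("NA", collapse = "; ")

print(x)                # [1] "NA"
print(is.na(x))         # [1] FALSE
print(identical(x, y))  # [1] TRUE
```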

Answer:

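We may use str_c() from the stringr package instead of paste(). str_c() keeps missing values missing, so a group containing only NA collapses to NA rather than to the string "NA".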

library(dplyr)
library(stringr)

df %>%
  group_by(ID) %>%
  summarize(string = str_c(string, collapse = "; "))

#     ID string         
#  <dbl> <chr>          
#1     1 " asfdas ; sdf"
#2     2  NA            
#3     3 "NA"           
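One caveat: str_c() treats NA as contagious, so a group that mixes real strings with NA also collapses to NA, and its non-missing values are lost. If that is not wanted, a hedged alternative that stays within dplyr is to drop the NAs inside each group and return NA_character_ only when every value is missing. The df_mixed data below is hypothetical, made up only to show a mixed group.

```r
library(dplyr)
library(stringr)

# Hypothetical data: group 1 mixes a real value with NA,
# group 2 is all NA, group 3 holds the literal string "NA".
df_mixed <- data.frame(ID = c(1, 1, 2, 3),
                       string = c("a", NA, NA, "NA"),
                       stringsAsFactors = FALSE)

# str_c(): a single NA in a group makes the collapsed result NA.
with_str_c <- df_mixed %>%
  group_by(ID) %>%
  summarize(string = str_c(string, collapse = "; "))

# Alternative: paste only the non-missing values and return NA
# only when the whole group is missing.
keep_values <- df_mixed %>%
  group_by(ID) %>%
  summarize(string = if (all(is.na(string))) NA_character_
                     else paste(string[!is.na(string)], collapse = "; "))

print(with_str_c)
print(keep_values)
```

With the alternative, ID 1 collapses to "a", ID 2 stays NA, and ID 3 keeps the string "NA"; with str_c(), ID 1 becomes NA as well.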