Concatenate two columns only if both contain a string value, not an NA value, in R

Time:11-17

I searched and found solutions for Python and SQL, but not for R.

Here is an example data frame, called df1, to make the problem easier to understand.

Genus          Species    Genusspecie
Escherichia    coli       Escherichia coli
Campylobacter  NA         NA
Shigella       sonnei     Shigella sonnei

Any NA in df1 occurs only in the Species column.

If Species is NA, I want the complete species name (a new variable called Genusspecie) to be NA. If Genus and Species are both present, I want to obtain the complete species name.

I tried paste, but then I would need to convert the cells that end up containing the genus followed by NA into a plain NA, without the genus information.


df1$Genusspecie <- paste(df1$Genus, df1$Species)

Thanks in advance for your help.

CodePudding user response:

You can use ifelse. If Species is NA, return NA, otherwise paste the two columns together.

within(df1, GenusSpecies <- ifelse(is.na(Species), NA, paste(Genus, Species)))
#>           Genus Species     GenusSpecies
#> 1   Escherichia    coli Escherichia coli
#> 2 Campylobacter    <NA>             <NA>
#> 3      Shigella  sonnei  Shigella sonnei

Data from question in reproducible format

df1 <- data.frame(Genus   = c("Escherichia", "Campylobacter", "Shigella"), 
                  Species = c("coli", NA, "sonnei"))
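For context on why plain paste falls short here: paste converts NA to the literal string "NA" before joining, so no real missing value survives. A minimal sketch on bare vectors (the names genus, species, pasted and fixed are only for illustration), using NA_character_ so the result is explicitly a character vector:

```r
genus   <- c("Escherichia", "Campylobacter", "Shigella")
species <- c("coli", NA, "sonnei")

# paste() coerces NA to the string "NA", so is.na() finds nothing missing
pasted <- paste(genus, species)
is.na(pasted)

# Guard with is.na() and return a typed missing value instead
fixed <- ifelse(is.na(species), NA_character_, paste(genus, species))
is.na(fixed)
```

The second element of pasted is the string "Campylobacter NA", whereas the second element of fixed is a genuine NA that is.na and friends will recognise.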

CodePudding user response:

This is a good use case for str_c from the stringr package. Unlike paste, str_c returns NA if any of its inputs is NA:

library(dplyr)
library(stringr)
df1 %>% 
  mutate(GenusSpecies = str_c(Genus, Species, sep = " "))
          Genus Species     GenusSpecies
1   Escherichia    coli Escherichia coli
2 Campylobacter    <NA>             <NA>
3      Shigella  sonnei  Shigella sonnei
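If dplyr is not otherwise needed, the same idea should work with stringr alone by assigning the column directly. A short sketch, rebuilding the example data so it runs on its own:

```r
library(stringr)

df1 <- data.frame(Genus   = c("Escherichia", "Campylobacter", "Shigella"),
                  Species = c("coli", NA, "sonnei"))

# str_c() propagates NA: any missing input makes that element of the result NA
df1$GenusSpecies <- str_c(df1$Genus, df1$Species, sep = " ")
df1
```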