I searched and found solutions for Python and SQL, but not for R.
Here is an example data frame called df1 to make the problem easier to understand.
Genus          Species  Genusspecie
Escherichia    coli     Escherichia coli
Campylobacter  NA       NA
Shigella       sonnei   Shigella sonnei
Any NA in df1 occurs only in the Species column.
If Species is NA, I want the complete species name (a new variable called Genusspecie) to be NA as well. If both Genus and Species are filled in, I want the complete species name.
I tried paste, but then I would need to convert the cells whose string contains NA into cells containing only NA, without the genus information.
df1$Genusspecie <- paste(df1$Genus, df1$Species)
Thanks in advance for your help.
CodePudding user response:
You can use ifelse. If Species is NA, return NA; otherwise paste the two columns together.
within(df1, GenusSpecies <- ifelse(is.na(Species), NA, paste(Genus, Species)))
#> Genus Species GenusSpecies
#> 1 Escherichia coli Escherichia coli
#> 2 Campylobacter <NA> <NA>
#> 3 Shigella sonnei Shigella sonnei
Data from question in reproducible format
df1 <- data.frame(Genus = c("Escherichia", "Campylobacter", "Shigella"),
                  Species = c("coli", NA, "sonnei"))
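To see why the plain paste attempt from the question falls short, here is a minimal sketch on the same df1. paste converts NA to the literal text "NA", so no value in the result is actually missing. Using NA_character_ instead of NA inside ifelse is an optional variation (not part of the answer above) that keeps the result character-typed even if every row happens to be NA.

```r
df1 <- data.frame(Genus = c("Escherichia", "Campylobacter", "Shigella"),
                  Species = c("coli", NA, "sonnei"))

# paste() coerces NA to the string "NA", so nothing is actually missing
pasted <- paste(df1$Genus, df1$Species)
is.na(pasted)
#> [1] FALSE FALSE FALSE

# Guard the paste with is.na(); NA_character_ keeps the column type character
df1$GenusSpecies <- ifelse(is.na(df1$Species), NA_character_,
                           paste(df1$Genus, df1$Species))
is.na(df1$GenusSpecies)
#> [1] FALSE  TRUE FALSE
```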
CodePudding user response:
This is a perfect use case for str_c from the stringr package. Unlike paste, str_c returns NA if any of its inputs is NA:
library(dplyr)
library(stringr)
df1 %>%
  mutate(GenusSpecies = str_c(Genus, Species, sep = " "))
Genus Species GenusSpecies
1 Escherichia coli Escherichia coli
2 Campylobacter <NA> <NA>
3 Shigella sonnei Shigella sonnei
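The difference between the two functions can be checked on single values. The sketch below assumes stringr is installed. It also shows that the new column can be assigned directly, without dplyr, if only stringr is available. The column name GenusSpecies is chosen here for illustration.

```r
library(stringr)

# str_c() propagates missing values: any NA input gives an NA result
str_c("Campylobacter", NA_character_, sep = " ")
#> [1] NA

# paste() instead turns NA into the text "NA"
paste("Campylobacter", NA_character_)
#> [1] "Campylobacter NA"

# Same idea applied to the data frame, assigning the column directly
df1 <- data.frame(Genus = c("Escherichia", "Campylobacter", "Shigella"),
                  Species = c("coli", NA, "sonnei"))
df1$GenusSpecies <- str_c(df1$Genus, df1$Species, sep = " ")
```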