R: delete a value from one column if it does not contain the value from another column

Time:03-23

I have a dataframe in R that looks like this:

genus          species
Vulgatibacter  Vulgatibacter sp.
NA             Planctomyces
Holophaga      Geothrix sp.

I want to replace the value in column species with NA when it does not contain the value from genus in the same row. Rows where genus is NA should be left unchanged. I want to get this:

genus          species
Vulgatibacter  Vulgatibacter sp.
NA             Planctomyces
Holophaga      NA

I tried transform(., Species = ifelse(Genus %in% Species, Species, NA)), but it does not work.

CodePudding user response:

Not sure I understand the logic, but this works for your example data:

df <- structure(list(genus = c("Vulgatibacter", 
                               NA, "Holophaga"),
                     species = c("Vulgatibacter sp.", 
                                 "Planctomyces", 
                                 "Geothrix sp.")),
                class = c("tbl_df", "tbl", "data.frame"),
                row.names = c(NA, -3L))

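# Keep species when genus is NA, or when species without a trailing " sp." equals genus.
# The dot is escaped and anchored so that only a literal " sp." at the end is removed.
transform(df, species = ifelse(is.na(genus), species,
                               ifelse(sub(" sp\\.$", "", species) == genus, species, NA)))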
#>           genus           species
#> 1 Vulgatibacter Vulgatibacter sp.
#> 2          <NA>      Planctomyces
#> 3     Holophaga              <NA>

Created on 2022-03-23 by the reprex package (v2.0.1)

CodePudding user response:

library(dplyr)

# Keep species when it starts with genus (row by row) or when genus is NA
df |>
  mutate(species = case_when(mapply(grepl, sprintf("^%s", genus), species) ~ species,
                             is.na(genus) ~ species,
                             TRUE ~ NA_character_))

#> # A tibble: 3 × 2
#>   genus         species          
#>   <chr>         <chr>            
#> 1 Vulgatibacter Vulgatibacter sp.
#> 2 NA            Planctomyces     
#> 3 Holophaga     NA
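
For comparison, here is a minimal base R sketch of the same idea. It assumes that "contains" can be read as "starts with", as the `^` anchor in the answer above does. It also shows the likely reason the attempt in the question fails: `%in%` tests whether each genus is exactly equal to some element of the whole species column, not whether it occurs inside the species in the same row. The names `keep` and `result` are only illustrative.

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  genus   = c("Vulgatibacter", NA, "Holophaga"),
  species = c("Vulgatibacter sp.", "Planctomyces", "Geothrix sp."),
  stringsAsFactors = FALSE
)

# %in% checks exact membership in the whole species column,
# so it is FALSE for every row of this data
df$genus %in% df$species

# startsWith() compares element by element and treats genus as a literal
# string (no regex); rows with a missing genus are kept as they are
keep <- is.na(df$genus) | startsWith(df$species, df$genus)

result <- df
result$species[which(!keep)] <- NA  # which() drops any NA in the index
result
```

Because `startsWith()` does no regular-expression matching, a genus name that happens to contain regex metacharacters is still compared literally. If the genus may appear anywhere in the species string rather than at the start, `grepl(genus, species, fixed = TRUE)` applied row by row with `mapply()` would be one alternative.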
