I have a data frame in R that looks like this:
genus | species |
---|---|
Vulgatibacter | Vulgatibacter sp. |
NA | Planctomyces |
Holophaga | Geothrix sp. |
I want to set a value in the species column to NA when it does not contain the genus from the same row, while leaving rows with a missing genus unchanged. The result should look like this:
genus | species |
---|---|
Vulgatibacter | Vulgatibacter sp. |
NA | Planctomyces |
Holophaga | NA |
I tried the following, but it does not work:
transform(., Species = ifelse(Genus %in% Species, Species, NA))
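To see why the attempt fails, here is a small base-R check. It assumes the columns are the lowercase `genus` and `species` shown in the table; the attempt refers to `Genus` and `Species`, which `transform()` would not find unless the real data uses those names. The main problem is that `%in%` tests whole-value equality against the entire column, not whether each species string contains its own row's genus:

```r
genus   <- c("Vulgatibacter", NA, "Holophaga")
species <- c("Vulgatibacter sp.", "Planctomyces", "Geothrix sp.")

# %in% asks: is each genus EQUAL to some complete species string?
# None is, so every row is FALSE and ifelse(..., species, NA) would
# blank the whole column.
in_test <- genus %in% species
print(in_test)

# What the question needs is a per-row substring test: does species[i]
# contain genus[i]? fixed = TRUE treats the genus as literal text, and
# a missing genus is reported as FALSE instead of being passed to grepl().
contains <- mapply(function(g, s) !is.na(g) && grepl(g, s, fixed = TRUE),
                   genus, species, USE.NAMES = FALSE)
print(contains)
```

Combining `contains` with an `is.na(genus)` check gives the rule shown in the desired output.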
Answer:
Not sure I understand the logic, but this works for your example data:
df <- structure(list(genus = c("Vulgatibacter", NA, "Holophaga"),
                     species = c("Vulgatibacter sp.",
                                 "Planctomyces",
                                 "Geothrix sp.")),
                class = c("tbl_df", "tbl", "data.frame"),
                row.names = c(NA, -3L))
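# Keep species when genus is NA; otherwise keep it only if species with
# the literal " sp." removed equals genus (fixed = TRUE: "." is not a regex wildcard)
transform(df, species = ifelse(is.na(genus), species,
                               ifelse(sub(" sp.", "", species, fixed = TRUE) == genus,
                                      species, NA)))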
#> genus species
#> 1 Vulgatibacter Vulgatibacter sp.
#> 2 <NA> Planctomyces
#> 3 Holophaga <NA>
Created on 2022-03-23 by the reprex package (v2.0.1)
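The approach above only strips a literal ` sp.` suffix, so a fully named species such as `Geothrix fermentans` would never match its genus. A minimal base-R sketch, assuming the genus is always the first word of the species string, compares that first word instead. The fourth row is a hypothetical addition to show that case:

```r
df2 <- data.frame(
  genus   = c("Vulgatibacter", NA, "Holophaga", "Geothrix"),
  species = c("Vulgatibacter sp.", "Planctomyces", "Geothrix sp.",
              "Geothrix fermentans"),  # 4th row: hypothetical example
  stringsAsFactors = FALSE
)

# First word of each species string (everything before the first space);
# a one-word value such as "Planctomyces" is returned unchanged.
species_genus <- sub(" .*", "", df2$species)

# Keep species when genus is missing or equals that first word, else NA.
# For the NA-genus row, TRUE | NA evaluates to TRUE, so the value is kept.
df2$species <- ifelse(is.na(df2$genus) | species_genus == df2$genus,
                      df2$species, NA)
print(df2)
```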
Answer:
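library(dplyr)
# df as built in the previous answer; the native pipe |> needs R >= 4.1.
# mapply() runs grepl() row by row: does species start with that row's genus?
df |>
  mutate(species = case_when(mapply(grepl, sprintf("^%s", genus), species) ~ species,
                             is.na(genus) ~ species,
                             TRUE ~ NA_character_))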
#> # A tibble: 3 × 2
#> genus species
#> <chr> <chr>
#> 1 Vulgatibacter Vulgatibacter sp.
#> 2 NA Planctomyces
#> 3 Holophaga NA
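Without dplyr, the same prefix test can be written in base R with `startsWith()`, which is vectorised over both arguments and compares literal text, so a genus name is never interpreted as a regular expression and no `mapply()` loop is needed. A minimal sketch on the question's example data:

```r
df3 <- data.frame(
  genus   = c("Vulgatibacter", NA, "Holophaga"),
  species = c("Vulgatibacter sp.", "Planctomyces", "Geothrix sp."),
  stringsAsFactors = FALSE
)

# startsWith(x, prefix) compares element by element; it returns NA for
# the NA-genus row, which the is.na() branch then turns into TRUE.
keep <- is.na(df3$genus) | startsWith(df3$species, df3$genus)

# Blank out species wherever the genus is not its prefix.
df3$species[!keep] <- NA
print(df3)
```

Like the `^` pattern above, this is a prefix match: a genus would also match a longer name that merely begins with the same letters.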