Match rows and remove values from a cell if condition is met

Time:09-27

I have a data.frame such as

data = data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                  species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"))

What I'm trying to do: whenever the values in the family and species columns match within a row, keep the name in family and set the species cell of that row to NA. I tried a loop, but that doesn't seem like a sensible way to do this since my data is pretty big...

CodePudding user response:

Using base R, you can assign NA to the species column after filtering for your use case:

data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                   species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"), 
                   stringsAsFactors = FALSE)

data[data$family == data$species, ]$species <- NA
data
#>   plot family species
#> 1    1    Fab    <NA>
#> 2    1    Fab    <NA>
#> 3    1    Fab    sp 1
#> 4    2    Pip     sp2
#> 5    2    Fab    <NA>
#> 6    3    Mel     sp3
#> 7    3    Myr     sp4
#> 8    3    Myr     sp5
#> 9    3    Fab     sp1
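A slightly more defensive variant of the same idea, as a sketch: index the species column directly instead of going through a row subset. As far as I can tell, the `data[cond, ]$species <- NA` form can throw an error when no row matches or when the comparison itself yields NA (for example, a missing family), whereas the form below simply does nothing in those cases. The helper name `same` is only for illustration.

```r
# Same sample data as in the question
data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                   species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"),
                   stringsAsFactors = FALSE)

# Vectorised comparison: TRUE where family and species are identical in a row
same <- data$family == data$species

# which() keeps only the TRUE positions (NA comparisons are dropped),
# and assigning into the column vector is fine even with zero matches
data$species[which(same)] <- NA
```

This is a single vectorised comparison over the whole column, so no loop is needed even for a large data frame.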

CodePudding user response:

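library(tidyverse)

# `data` is the data frame from the question; as_tibble() only changes how it prints
data %>%
  as_tibble() %>%
  mutate(species = case_when(species == family ~ NA_character_,
                             TRUE ~ species))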

# A tibble: 9 × 3
   plot family species
  <dbl> <chr>  <chr>  
1     1 Fab    NA     
2     1 Fab    NA     
3     1 Fab    sp 1   
4     2 Pip    sp2    
5     2 Fab    NA     
6     3 Mel    sp3    
7     3 Myr    sp4    
8     3 Myr    sp5    
9     3 Fab    sp1    
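If you are already using dplyr, the same rule can probably be written more compactly with `na_if()`, which replaces the values of its first argument with NA wherever they equal the second argument, compared element by element. A minimal sketch, assuming the `data` frame from the question (the name `result` is just for illustration):

```r
library(dplyr)

# Same sample data as in the question
data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                   species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"),
                   stringsAsFactors = FALSE)

# na_if(x, y): set x to NA wherever x equals y in the same row
result <- data %>%
  mutate(species = na_if(species, family))
```

Here `result` should match the case_when() output above. Unlike case_when(), there is no need to spell out NA_character_ or a fallback branch.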