Home > Blockchain >  Filter a data frame if 3 columns contain the same data
Filter a data frame if 3 columns contain the same data

Time:04-17

I have some data which looks like:

     Municipios_Names     Distritos_Names              Zonas_Names
1   Barcelona Capital         Sant Andreu              Sant Andreu
2   Barcelona Capital         Sant Andreu            Baró de Viver
3   Barcelona Capital         Sant Andreu            El Bon Pastor
4   Barcelona Capital         Sant Andreu El Congrés i els Indians
5   Barcelona Capital         Sant Andreu               La Sagrera
6   Barcelona Capital         Sant Andreu                    Navas
7   Barcelona Capital         Sant Andreu  Sant Andreu del Palomar
8   Barcelona Capital         Sant Andreu           Trinitat Vella
9 Sant Adrià de Besòs Sant Adrià de Besòs      Sant Adrià de Besòs

I am trying to filter the data such that it would remove Sant Andreu.

I want to filter the data based on the following conditions:

  • If all 3 columns are equal, keep the row.
  • If all 3 columns are different, keep the row.
  • If the first column is different to the second and third, extract that row (or if any combination of the 2 columns are different).

Essentially, I just want to remove Barcelona Capital Sant Andreu Sant Andreu but keep Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs

Data:

df <- structure(list(Municipios_Names = c("Barcelona Capital", "Barcelona Capital", 
"Barcelona Capital", "Barcelona Capital", "Barcelona Capital", 
"Barcelona Capital", "Barcelona Capital", "Barcelona Capital", 
"Sant Adrià de Besòs"), Distritos_Names = c("Sant Andreu", 
"Sant Andreu", "Sant Andreu", "Sant Andreu", "Sant Andreu", "Sant Andreu", 
"Sant Andreu", "Sant Andreu", "Sant Adrià de Besòs"), Zonas_Names = c("Sant Andreu", 
"Baró de Viver", "El Bon Pastor", "El Congrés i els Indians", 
"La Sagrera", "Navas", "Sant Andreu del Palomar", "Trinitat Vella", 
"Sant Adrià de Besòs")), class = "data.frame", row.names = c(NA, 
-9L))

EDIT:

My main goal is to remove the first observation Barcelona Capital, Sant Andreu, Sant Andreu.

  • If I create something to remove the row when the following condition is met - Distritos_Names == Zonas_Names then I would correctly remove the first row - but I will also incorrectly remove the 9th row Sant Adrià de Besòs.
  • So I want to write something which would keep the 9th row since all 3 columns are the same but remove the 1st since only 2 columns are the same.
  • The other rows c(2:8)all have different columns so I want to keep them.

Expected output is just the data frame without the first row.

CodePudding user response:

We could use ifelse statement with helper columns x and y and then filter on them:

library(dplyr)
df %>% 
  mutate(x = ifelse(Municipios_Names==Distritos_Names &
                      Distritos_Names==Zonas_Names, 1, 0),
         y = ifelse(Municipios_Names!=Distritos_Names &
                      Distritos_Names!=Zonas_Names, 1, 0)) %>%
  filter(x==1 | y==1) %>% 
  select(-x, -y)
   Municipios_Names    Distritos_Names     Zonas_Names             
  <chr>               <chr>               <chr>                   
1 Barcelona Capital   Sant Andreu         Baró de Viver           
2 Barcelona Capital   Sant Andreu         El Bon Pastor           
3 Barcelona Capital   Sant Andreu         El Congrés i els Indians
4 Barcelona Capital   Sant Andreu         La Sagrera              
5 Barcelona Capital   Sant Andreu         Navas                   
6 Barcelona Capital   Sant Andreu         Sant Andreu del Palomar 
7 Barcelona Capital   Sant Andreu         Trinitat Vella          
8 Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs  

CodePudding user response:

An option is also using n_distinct

library(dplyr)
df %>%
   rowwise %>%
   filter(n_distinct(c_across(everything())) %in% c(ncol(.), 1)) %>%
   ungroup

-output

# A tibble: 8 × 3
  Municipios_Names    Distritos_Names     Zonas_Names             
  <chr>               <chr>               <chr>                   
1 Barcelona Capital   Sant Andreu         Baró de Viver           
2 Barcelona Capital   Sant Andreu         El Bon Pastor           
3 Barcelona Capital   Sant Andreu         El Congrés i els Indians
4 Barcelona Capital   Sant Andreu         La Sagrera              
5 Barcelona Capital   Sant Andreu         Navas                   
6 Barcelona Capital   Sant Andreu         Sant Andreu del Palomar 
7 Barcelona Capital   Sant Andreu         Trinitat Vella          
8 Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs     

Or with pmap

library(purrr)
 df %>%
   filter(pmap_lgl(across(everything()), 
     ~ n_distinct(c(...)) %in% c(ncol(df), 1)))
  • Related