I have some data which looks like:
Municipios_Names Distritos_Names Zonas_Names
1 Barcelona Capital Sant Andreu Sant Andreu
2 Barcelona Capital Sant Andreu Baró de Viver
3 Barcelona Capital Sant Andreu El Bon Pastor
4 Barcelona Capital Sant Andreu El Congrés i els Indians
5 Barcelona Capital Sant Andreu La Sagrera
6 Barcelona Capital Sant Andreu Navas
7 Barcelona Capital Sant Andreu Sant Andreu del Palomar
8 Barcelona Capital Sant Andreu Trinitat Vella
9 Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs
I am trying to filter the data such that it would remove Sant Andreu
.
I want to filter the data based on the following conditions:
- If all 3 columns are equal, keep the row.
- If all 3 columns are different, keep the row.
- If the first column is different to the second and third, extract that row (or if any combination of the 2 columns are different).
Essentially, I just want to remove Barcelona Capital Sant Andreu Sant Andreu
but keep Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs
Data:
df <- structure(list(Municipios_Names = c("Barcelona Capital", "Barcelona Capital",
"Barcelona Capital", "Barcelona Capital", "Barcelona Capital",
"Barcelona Capital", "Barcelona Capital", "Barcelona Capital",
"Sant Adrià de Besòs"), Distritos_Names = c("Sant Andreu",
"Sant Andreu", "Sant Andreu", "Sant Andreu", "Sant Andreu", "Sant Andreu",
"Sant Andreu", "Sant Andreu", "Sant Adrià de Besòs"), Zonas_Names = c("Sant Andreu",
"Baró de Viver", "El Bon Pastor", "El Congrés i els Indians",
"La Sagrera", "Navas", "Sant Andreu del Palomar", "Trinitat Vella",
"Sant Adrià de Besòs")), class = "data.frame", row.names = c(NA,
-9L))
EDIT:
My main goal is to remove the first observation Barcelona Capital
, Sant Andreu
, Sant Andreu
.
- If I create something to remove the row when the following condition is met -
Distritos_Names == Zonas_Names
then I would correctly remove the first row - but I will also incorrectly remove the 9th rowSant Adrià de Besòs
. - So I want to write something which would keep the 9th row since all 3 columns are the same but remove the 1st since only 2 columns are the same.
- The other rows
c(2:8)
all have different columns so I want to keep them.
Expected output is just the data frame without the first row.
CodePudding user response:
We could use ifelse statement with helper columns x
and y
and then filter on them:
library(dplyr)
df %>%
mutate(x = ifelse(Municipios_Names==Distritos_Names &
Distritos_Names==Zonas_Names, 1, 0),
y = ifelse(Municipios_Names!=Distritos_Names &
Distritos_Names!=Zonas_Names, 1, 0)) %>%
filter(x==1 | y==1) %>%
select(-x, -y)
Municipios_Names Distritos_Names Zonas_Names
<chr> <chr> <chr>
1 Barcelona Capital Sant Andreu Baró de Viver
2 Barcelona Capital Sant Andreu El Bon Pastor
3 Barcelona Capital Sant Andreu El Congrés i els Indians
4 Barcelona Capital Sant Andreu La Sagrera
5 Barcelona Capital Sant Andreu Navas
6 Barcelona Capital Sant Andreu Sant Andreu del Palomar
7 Barcelona Capital Sant Andreu Trinitat Vella
8 Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs
CodePudding user response:
An option is also using n_distinct
library(dplyr)
df %>%
rowwise %>%
filter(n_distinct(c_across(everything())) %in% c(ncol(.), 1)) %>%
ungroup
-output
# A tibble: 8 × 3
Municipios_Names Distritos_Names Zonas_Names
<chr> <chr> <chr>
1 Barcelona Capital Sant Andreu Baró de Viver
2 Barcelona Capital Sant Andreu El Bon Pastor
3 Barcelona Capital Sant Andreu El Congrés i els Indians
4 Barcelona Capital Sant Andreu La Sagrera
5 Barcelona Capital Sant Andreu Navas
6 Barcelona Capital Sant Andreu Sant Andreu del Palomar
7 Barcelona Capital Sant Andreu Trinitat Vella
8 Sant Adrià de Besòs Sant Adrià de Besòs Sant Adrià de Besòs
Or with pmap
library(purrr)
df %>%
filter(pmap_lgl(across(everything()),
~ n_distinct(c(...)) %in% c(ncol(df), 1)))