From this dataframe I want to subset rows which have the identical "model", "Country" and "year" values but different "factor"
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
model country year value factor
1 A Italy 2016 14 NEW
2 B Germany 2016 24 OLD
3 C Norway 2016 11 OLD
4 A Italy 2016 34 OLD
5 A France 2015 73 OLD
6 C Germany 2015 11 OLD
7 B Norway 2015 33 NEW
8 A Italy 2015 22 NEW
Expected output:
1 A Italy 2016 22 NEW
4 A Italy 2016 34 OLD
CodePudding user response:
What you could do is create a column which tells the number of duplicated based on the groups (model, country and year) and then filter
like this:
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
library(dplyr)
df %>%
group_by(model, country, year) %>%
mutate(number_dups = n()) %>%
filter(number_dups > 1) %>%
select(-number_dups) %>%
ungroup()
#> # A tibble: 2 × 5
#> model country year value factor
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 A Italy 2016 14 NEW
#> 2 A Italy 2016 34 OLD
Created on 2022-08-12 by the reprex package (v2.0.1)
Please note: I assume that the expected output should be 14 instead of 22.
CodePudding user response:
You can group by all columns, except for factor, then keep only the rows where there is more than 1 observation that is the same. Then, to ensure we only have those with NEW
and OLD
, we can add factor
to the group, and keep only the first row of each.
library(dplyr)
df %>%
group_by(across(c(model,country,year))) %>%
filter(n() > 1) %>%
group_by(factor, .add = T) %>%
slice(1)
Output
model country year value factor
<chr> <chr> <dbl> <dbl> <chr>
1 A Italy 2016 34 NEW
2 A Italy 2016 34 OLD