I have a df that looks like the following:
candidate; partyList; shareOfVotes; outcome
A list1 0.11 elected
B list1 0.10 elected
C list1 0.09 not-elected
D list2 0.22 elected
E list2 0.15 not-elected
F list2 0.02 not-elected
I want to create a new df that contains only the last elected and the first non-elected candidate for each candidate list. The only way to know if a candidate was elected is by checking the column "outcome". So, I believe the best way to do this would be to select the candidate with the lowest share of the votes among the elected ones, and the candidate with the highest share of the votes among the non-elected ones for each party list. The new df should look like this:
candidate; partyList; shareOfVotes; outcome
B list1 0.10 elected
C list1 0.09 not-elected
D list2 0.22 elected
E list2 0.15 not-elected
The df has several other columns with characteristics of the candidates that I want to keep. Thanks in advance to anyone who can help me.
CodePudding user response:
You can slice the first and last rows of each partyList group once the rows are ordered by outcome != "elected" and then by shareOfVotes.
library(dplyr)
df %>%
group_by(partyList) %>%
slice(order(outcome != "elected", shareOfVotes)[unique(c(1, n()))]) %>%
ungroup()
# A tibble: 4 × 4
candidate partyList shareOfVotes outcome
<chr> <chr> <dbl> <chr>
1 B list1 0.1 elected
2 C list1 0.09 not-elected
3 D list2 0.22 elected
4 E list2 0.15 not-elected
unique() ensures that a group containing only a single row is not returned twice.
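To illustrate the single-row case, here is a minimal sketch. The data frame dat_single and the party list "list3" with its candidate "G" are made up for this example and are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: list3 has a single candidate, so n() is 1 in that group
dat_single <- data.frame(
  candidate    = c("A", "B", "C", "G"),
  partyList    = c("list1", "list1", "list1", "list3"),
  shareOfVotes = c(0.11, 0.10, 0.09, 0.30),
  outcome      = c("elected", "elected", "not-elected", "elected")
)

# For a one-row group, c(1, n()) is c(1, 1); unique() collapses it to 1
unique(c(1, 1))

# Same ordering idea as above: elected rows first, then ascending shareOfVotes
res_single <- dat_single %>%
  group_by(partyList) %>%
  slice(order(outcome != "elected", shareOfVotes)[unique(c(1, n()))]) %>%
  ungroup()

# One row per position picked; list3 contributes a single row
res_single$candidate
```

Note that when a list has no non-elected candidates at all, the "last" row picked is simply the elected candidate with the highest share, so such groups may need separate handling.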
CodePudding user response:
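You can group by the partyList and outcome columns and select the last row when outcome is "elected", or the first row otherwise. This relies on the rows already being sorted by descending shareOfVotes within each list, as in the example data.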
library(dplyr)
df %>%
group_by(partyList, outcome) %>%
slice(if(all(outcome == "elected")) n() else 1L) %>%
ungroup()
# candidate partyList shareOfVotes outcome
# <chr> <chr> <dbl> <chr>
#1 B list1 0.1 elected
#2 C list1 0.09 not-elected
#3 D list2 0.22 elected
#4 E list2 0.15 not-elected
data
It is easier to help if you provide data in a reproducible format:
df <- structure(list(candidate = c("A", "B", "C", "D", "E", "F"),
partyList = c("list1", "list1", "list1", "list2", "list2", "list2"),
shareOfVotes = c(0.11, 0.1, 0.09, 0.22, 0.15, 0.02),
outcome = c("elected", "elected", "not-elected", "elected", "not-elected",
"not-elected")), class = "data.frame", row.names = c(NA, -6L))
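If the rows are not guaranteed to be sorted, the rule stated in the question (lowest share among the elected, highest share among the non-elected, per party list) can be written out directly. This is only a sketch, assuming dplyr 1.0.0 or later for slice_min() and slice_max(); the names df_shuffled, last_elected, first_not_elected and result are made up for illustration, and the rows are deliberately out of order.

```r
library(dplyr)

# Same candidates as in the question, but in a shuffled row order
df_shuffled <- data.frame(
  candidate    = c("F", "C", "A", "E", "B", "D"),
  partyList    = c("list2", "list1", "list1", "list2", "list1", "list2"),
  shareOfVotes = c(0.02, 0.09, 0.11, 0.15, 0.10, 0.22),
  outcome      = c("not-elected", "not-elected", "elected",
                   "not-elected", "elected", "elected")
)

# Last elected: the elected candidate with the lowest share in each list
last_elected <- df_shuffled %>%
  filter(outcome == "elected") %>%
  group_by(partyList) %>%
  slice_min(shareOfVotes, n = 1, with_ties = FALSE) %>%
  ungroup()

# First non-elected: the non-elected candidate with the highest share in each list
first_not_elected <- df_shuffled %>%
  filter(outcome != "elected") %>%
  group_by(partyList) %>%
  slice_max(shareOfVotes, n = 1, with_ties = FALSE) %>%
  ungroup()

# Combine and restore a readable order; all other columns are kept as they are
result <- bind_rows(last_elected, first_not_elected) %>%
  arrange(partyList, desc(shareOfVotes))

result
```

Setting with_ties = FALSE keeps exactly one row per group even when two candidates have the same share; whether that is the right tie rule depends on the election data.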