How do I create a function that selects the last elected candidate and the first not elected for eac

Time:06-01

I have a df that looks like the following:

candidate   partyList  shareOfVotes  outcome
A           list1      0.11          elected
B           list1      0.10          elected
C           list1      0.09          not-elected
D           list2      0.22          elected
E           list2      0.15          not-elected
F           list2      0.02          not-elected

I want to create a new df that contains only the last elected and the first non-elected candidate for each party list. The only way to know whether a candidate was elected is to check the column "outcome". So I believe the best way to do this is to select, for each party list, the candidate with the lowest share of the votes among the elected ones and the candidate with the highest share of the votes among the non-elected ones. The new df should look like this:

candidate   partyList  shareOfVotes  outcome
B           list1      0.10          elected
C           list1      0.09          not-elected
D           list2      0.22          elected
E           list2      0.15          not-elected

The df has several other columns with characteristics of the candidates that I want to keep. Thanks in advance to anyone who can help me.

CodePudding user response:

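You can slice the first and last rows of each party-list group once the rows are ordered by outcome != "elected" and then by shareOfVotes. Elected candidates sort first, in ascending order of vote share, so the first row is the lowest-share elected candidate and the last row is the highest-share non-elected candidate.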

library(dplyr)

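# df is the data frame from the question (see the data section at the end)
df %>%
  group_by(partyList) %>%
  # order(): elected rows first, then ascending shareOfVotes within each outcome
  slice(order(outcome != "elected", shareOfVotes)[unique(c(1, n()))]) %>%
  ungroup()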

# A tibble: 4 × 4
  candidate partyList shareOfVotes outcome    
  <chr>     <chr>            <dbl> <chr>      
1 B         list1             0.1  elected    
2 C         list1             0.09 not-elected
3 D         list2             0.22 elected    
4 E         list2             0.15 not-elected

unique() ensures that a group with only a single row is not returned twice, since in that case 1 and n() refer to the same row.
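
To see the effect of unique(), here is a minimal sketch on hypothetical data (the data frame toy and the single-candidate list3 are made up for illustration and are not part of the question):

```r
library(dplyr)

# Hypothetical data: list3 contains a single candidate.
toy <- data.frame(
  candidate    = c("A", "B", "C", "G"),
  partyList    = c("list1", "list1", "list1", "list3"),
  shareOfVotes = c(0.11, 0.10, 0.09, 0.30),
  outcome      = c("elected", "elected", "not-elected", "elected"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(partyList) %>%
  # For list3, c(1, n()) is c(1, 1); unique() reduces it to a single index.
  slice(order(outcome != "elected", shareOfVotes)[unique(c(1, n()))]) %>%
  ungroup()

res
```

With this input the result should contain B and C for list1 and G only once for list3.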

CodePudding user response:

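You can group by the partyList and outcome columns and select the last row where outcome == "elected", or the first row otherwise. Note that this assumes the rows within each party list are already sorted by shareOfVotes in descending order, as they are in the example.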

library(dplyr)

df %>%
  group_by(partyList, outcome) %>%
  # elected group: take the last row; not-elected group: take the first row
  slice(if (all(outcome == "elected")) n() else 1L) %>%
  ungroup()

# candidate partyList shareOfVotes outcome    
#  <chr>     <chr>            <dbl> <chr>      
#1 B         list1             0.1  elected    
#2 C         list1             0.09 not-elected
#3 D         list2             0.22 elected    
#4 E         list2             0.15 not-elected
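
If the rows are not guaranteed to be sorted, the approach described in the question (lowest share among the elected, highest share among the non-elected) can be sketched with slice_min() and slice_max(). This is only one possible variant, assuming dplyr 1.0.0 or later; the names shuffled, last_elected and first_not_elected are introduced here for illustration, and with_ties = FALSE is an assumption about how ties should be handled.

```r
library(dplyr)

df <- data.frame(
  candidate    = c("A", "B", "C", "D", "E", "F"),
  partyList    = c("list1", "list1", "list1", "list2", "list2", "list2"),
  shareOfVotes = c(0.11, 0.10, 0.09, 0.22, 0.15, 0.02),
  outcome      = c("elected", "elected", "not-elected",
                   "elected", "not-elected", "not-elected"),
  stringsAsFactors = FALSE
)

# Shuffle the rows to show that the result does not depend on row order.
shuffled <- df[c(6, 2, 4, 1, 5, 3), ]

# Lowest vote share among the elected candidates of each list.
last_elected <- shuffled %>%
  filter(outcome == "elected") %>%
  group_by(partyList) %>%
  slice_min(shareOfVotes, n = 1, with_ties = FALSE) %>%
  ungroup()

# Highest vote share among the non-elected candidates of each list.
first_not_elected <- shuffled %>%
  filter(outcome != "elected") %>%
  group_by(partyList) %>%
  slice_max(shareOfVotes, n = 1, with_ties = FALSE) %>%
  ungroup()

res <- bind_rows(last_elected, first_not_elected) %>%
  arrange(partyList, desc(shareOfVotes))

res
```

All other columns of the data frame are carried along, since the slice functions keep whole rows.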

Data

It is easier to help if you provide the data in a reproducible format:

df <- structure(list(
  candidate = c("A", "B", "C", "D", "E", "F"),
  partyList = c("list1", "list1", "list1", "list2", "list2", "list2"),
  shareOfVotes = c(0.11, 0.1, 0.09, 0.22, 0.15, 0.02),
  outcome = c("elected", "elected", "not-elected", "elected", "not-elected",
              "not-elected")
), class = "data.frame", row.names = c(NA, -6L))