Choose top two rows based on a column and show it in a summary table-CodePudding

I am working on an election result data and I want to show most and second most voted parties in provinces with their vote numbers.

I do not have a problem with showing the most voted party and its votes but for the second party, it shows NA. Here's a toy data for reproducing. I guess if it works for one province it will work for all.

structure(list(plaka = c("01", "01", "01", "01", "01", "01"), 
    il_adi = c("ADANA", "ADANA", "ADANA", "ADANA", "ADANA", "ADANA"
    ), parti = c("MİLLET", "VATAN PARTİSİ", "CHP", "HAK-PAR", 
    "SAADET", "DSP"), oy_2015 = c(482, 2841, 374000, 2581, 8001, 
    774)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))

I've tried nth() function both from dplyr and Rfast packages. Here's my code:

mv_26 %>% 
  group_by(il_adi) %>% 
  arrange(il_adi,desc(oy_2015)) %>% 
  slice(1:2) %>% 
  summarise(win_party_2015 = parti[which.max(oy_2015)],
            win_party_p_2015 = max(oy_2015)/sum(oy_2015),
            second_2015 = parti[nth(oy_2015,index.return = 2)])

Edit: Most answers provided solutions for creating a new data frame but what I want to achieve is to show them in summaries() so that I can still make computations from the original dataset such as percentage finding.

CodePudding user response：

Using subset from base R

subset(df1, ave(-oy_2015, plaka, il_adi, FUN = rank) %in% 1:2)
# A tibble: 2 × 4
  plaka il_adi parti  oy_2015
  <chr> <chr>  <chr>    <dbl>
1 01    ADANA  CHP     374000
2 01    ADANA  SAADET    8001

CodePudding user response：

Update:

Based on OP's comment, we need to get the summary first and then select the top two rows.

Here's a dplyr approach for that:

df1 %>% 
  group_by(il_adi) %>% 
  mutate(oy_2015_perc = 100*proportions(oy_2015)) %>% 
  arrange(desc(oy_2015)) %>% 
  slice(1:2)

Using data.table:

library(data.table)

setorder(setDT(df1), -oy_2015)[,.SD[1:2], il_adi]

#    il_adi plaka  parti oy_2015
# 1:  ADANA    01    CHP  374000
# 2:  ADANA    01 SAADET    8001

Another approach in base:

do.call(rbind, (by(df1[order(-df1$oy_2015),],
                    df1[order(-df1$oy_2015),"il_adi"], 
                   function(x) head(x, 2))))

CodePudding user response：

Use slice_max:

The former top_n() has been superseded in favour of slice_min()/slice_max(). https://dplyr.tidyverse.org/reference/top_n.html

It is quite straight to use. No arrange needed.

library(dplyr)
df %>% 
  group_by(plaka, il_adi) %>% 
  slice_max(oy_2015, n=2)

 plaka il_adi parti  oy_2015
  <chr> <chr>  <chr>    <dbl>
1 01    ADANA  CHP     374000
2 01    ADANA  SAADET    8001

CodePudding user response：

df = structure(list(plaka = c("01", "01", "01", "01", "01", "01"), 
               il_adi = c("ADANA", "ADANA", "ADANA", "ADANA", "ADANA", "ADANA"
               ), parti = c("MİLLET", "VATAN PARTİSİ", "CHP", "HAK-PAR", 
                            "SAADET", "DSP"), oy_2015 = c(482, 2841, 374000, 2581, 8001, 
                                                          774)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
                                                                                                   "data.frame"))

df %>%  slice_max(n = 2, order_by = oy_2015) %>% select(parti, oy_2015)

Output:

  parti  oy_2015
  <chr>    <dbl>
1 CHP     374000
2 SAADET    8001

Edit: simplified it a bit

CodePudding user response：

you can group by providence then arrange by votes and select the first and second row

 df2 <- df %>% 
 group_by(il_adi) %>%
            arrange(desc(oy_2015)) %>% 
            filter(row_number()<=2)