I am working on an election result data and I want to show most and second most voted parties in provinces with their vote numbers.
I do not have a problem with showing the most voted party and its votes but for the second party, it shows NA. Here's a toy data for reproducing. I guess if it works for one province it will work for all.
structure(list(plaka = c("01", "01", "01", "01", "01", "01"),
il_adi = c("ADANA", "ADANA", "ADANA", "ADANA", "ADANA", "ADANA"
), parti = c("MİLLET", "VATAN PARTİSİ", "CHP", "HAK-PAR",
"SAADET", "DSP"), oy_2015 = c(482, 2841, 374000, 2581, 8001,
774)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
I've tried nth()
function both from dplyr
and Rfast
packages. Here's my code:
mv_26 %>%
group_by(il_adi) %>%
arrange(il_adi,desc(oy_2015)) %>%
slice(1:2) %>%
summarise(win_party_2015 = parti[which.max(oy_2015)],
win_party_p_2015 = max(oy_2015)/sum(oy_2015),
second_2015 = parti[nth(oy_2015,index.return = 2)])
Edit: Most answers provided solutions for creating a new data frame but what I want to achieve is to show them in summaries()
so that I can still make computations from the original dataset such as percentage finding.
CodePudding user response:
Using subset
from base R
subset(df1, ave(-oy_2015, plaka, il_adi, FUN = rank) %in% 1:2)
# A tibble: 2 × 4
plaka il_adi parti oy_2015
<chr> <chr> <chr> <dbl>
1 01 ADANA CHP 374000
2 01 ADANA SAADET 8001
CodePudding user response:
Update:
Based on OP's comment, we need to get the summary first and then select the top two rows.
Here's a dplyr
approach for that:
df1 %>%
group_by(il_adi) %>%
mutate(oy_2015_perc = 100*proportions(oy_2015)) %>%
arrange(desc(oy_2015)) %>%
slice(1:2)
Using data.table
:
library(data.table)
setorder(setDT(df1), -oy_2015)[,.SD[1:2], il_adi]
# il_adi plaka parti oy_2015
# 1: ADANA 01 CHP 374000
# 2: ADANA 01 SAADET 8001
Another approach in base
:
do.call(rbind, (by(df1[order(-df1$oy_2015),],
df1[order(-df1$oy_2015),"il_adi"],
function(x) head(x, 2))))
CodePudding user response:
Use slice_max
:
The former top_n()
has been superseded in favour of slice_min()/slice_max(). https://dplyr.tidyverse.org/reference/top_n.html
It is quite straight to use. No arrange needed.
library(dplyr)
df %>%
group_by(plaka, il_adi) %>%
slice_max(oy_2015, n=2)
plaka il_adi parti oy_2015
<chr> <chr> <chr> <dbl>
1 01 ADANA CHP 374000
2 01 ADANA SAADET 8001
CodePudding user response:
df = structure(list(plaka = c("01", "01", "01", "01", "01", "01"),
il_adi = c("ADANA", "ADANA", "ADANA", "ADANA", "ADANA", "ADANA"
), parti = c("MİLLET", "VATAN PARTİSİ", "CHP", "HAK-PAR",
"SAADET", "DSP"), oy_2015 = c(482, 2841, 374000, 2581, 8001,
774)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
df %>% slice_max(n = 2, order_by = oy_2015) %>% select(parti, oy_2015)
Output:
parti oy_2015
<chr> <dbl>
1 CHP 374000
2 SAADET 8001
Edit: simplified it a bit
CodePudding user response:
you can group by providence then arrange by votes and select the first and second row
df2 <- df %>%
group_by(il_adi) %>%
arrange(desc(oy_2015)) %>%
filter(row_number()<=2)