In R group_by and loop within dplyr

Time:06-17

I have the following data frame:

my_df <- data.frame(Municipality=c('a', 'a', 'a', 'a', 'b', 'b', 'c','c','c','d','d'),
                    state=c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am','am','am','pi','pi'),
                    votes=c(541, 463, 246, 49, 2443, 2287, 1035,3530,9999,666,3809))

I would like to calculate the vote share of each row within its "Municipality" and state, and the difference ("margin of victory") between each share and the highest vote share in that group. I tried the following code:

actual_df<-my_df %>%
  group_by(Municipality,state) %>% 
  mutate(
    share_vote = votes / sum(votes), # calculate vote shares
    margin_victory = (max(share_vote)-(max( share_vote[share_vote!=max(share_vote)]))),
  ) %>% 
  ungroup()

This code calculates share_vote as expected. However, margin_victory is correct only when a Municipality has two entries. Below is what I would like to have:

desired_df <- data.frame(Municipality=c('a', 'a', 'a', 'a', 'b', 'b', 'c','c','c','d','d'),
                    state=c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am','am','am','pi','pi'),
                    votes=c(541, 463, 246, 49, 2443, 2287, 1035,3530,9999,666,3809),
                    margin_victory= c(0.06004619,-0.06004619,0.2270978, 0.3787529,
                                      0.03298097,-0.03298097,
                                      -0.6154902,-0.44417742,0.44417742,
                                      -0.70234637,0.70234637))

I tried replacing margin_victory in the actual_df code with margin_victory = for (i in share_vote) {max(share_vote)-share_vote}, but without success.
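
A likely reason the loop version produces nothing, shown as a minimal base R sketch: a for loop in R always evaluates to NULL, and mutate() treats a NULL value as "no column". The subtraction itself is already vectorised, so it returns one value per row without any loop. The names loop_value and gap_to_leader are only illustrative, and the votes are those of Municipality 'a' above.

```r
# Votes for Municipality 'a' from my_df, turned into shares.
votes <- c(541, 463, 246, 49)
share_vote <- votes / sum(votes)

# A for loop is run for its side effects; its value is always NULL,
# so assigning it to a column gives mutate() nothing to store.
loop_value <- for (i in share_vote) {
  max(share_vote) - share_vote
}
is.null(loop_value)

# The same expression without the loop is vectorised:
# one gap per row between the top share and that row's share.
gap_to_leader <- max(share_vote) - share_vote
length(gap_to_leader)
```

This only gives the unsigned gap to the leader; it does not yet compare the leader with the runner-up.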

User response:

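Are you sure about the signs in your desired result? Rows 3 and 4 of desired_df are positive even though those rows trail the leader. If the signs are not intentional, I would suggest the following: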

library(tidyverse)

my_df %>% group_by(Municipality, state) %>%
  mutate(
    share_vote = votes / sum(votes),
    mar = ifelse(votes == max(votes),
                 votes - max(votes[votes != max(votes)]),
                 (votes - max(votes))) / sum(votes)) %>%
  ungroup()
#> # A tibble: 11 × 5
#>    Municipality state votes share_vote     mar
#>    <chr>        <chr> <dbl>      <dbl>   <dbl>
#>  1 a            ac      541     0.416   0.0600
#>  2 a            ac      463     0.356  -0.0600
#>  3 a            ac      246     0.189  -0.227 
#>  4 a            ac       49     0.0377 -0.379 
#>  5 b            pb     2443     0.516   0.0330
#>  6 b            pb     2287     0.484  -0.0330
#>  7 c            am     1035     0.0711 -0.615 
#>  8 c            am     3530     0.242  -0.444 
#>  9 c            am     9999     0.687   0.444 
#> 10 d            pi      666     0.149  -0.702 
#> 11 d            pi     3809     0.851   0.702

Created on 2022-06-17 by the reprex package (v2.0.1)
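
The same rule can also be sketched as a small base R helper, which may be easier to test on a single group. This is an illustration, not part of the answer above: margin_vs_rival is a hypothetical name, grouping is done with ave() instead of group_by(), and returning NA when every row ties for the top is an assumption (the ifelse() version above would call max() on an empty vector in that case).

```r
# Hypothetical helper: signed margin for the rows of one group.
margin_vs_rival <- function(votes) {
  top <- max(votes)
  # Best result among rows below the top count; NA if all rows tie,
  # which avoids calling max() on an empty vector (an assumption).
  below_top <- votes[votes != top]
  runner_up <- if (length(below_top) > 0) max(below_top) else NA_real_
  # The leader is compared with the runner-up, everyone else with the leader.
  ifelse(votes == top, votes - runner_up, votes - top) / sum(votes)
}

my_df <- data.frame(
  Municipality = c('a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'),
  state = c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am', 'am', 'am', 'pi', 'pi'),
  votes = c(541, 463, 246, 49, 2443, 2287, 1035, 3530, 9999, 666, 3809)
)

# ave() applies the helper within each Municipality/state pair
# and returns the results in the original row order.
group_key <- paste(my_df$Municipality, my_df$state)
my_df$mar <- ave(my_df$votes, group_key, FUN = margin_vs_rival)
round(my_df$mar, 4)
```

Keeping the rule in one function makes it straightforward to flip the sign convention later if the positive values in desired_df turn out to be intended.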
