so I have been trying to figure out which party won in a specific region in my country using the number of votes per location, the problem is that when I use group by region (DEPARTAMENTO), I cannot keep the name of the party, only the votes. When I group by region and party (DEPARTAMENTO, AGRUPACION), instead of 25 values I got 68 values because of the different denominations for political parties.
I hope this is not that confusing. And thanks.
ERM2002ganador <-
ERMfinalt2002 %>%
group_by(DEPARTAMENTO)%>%
summarize(max(VTOTAL,na.rm = FALSE))
I am trying to get something like the following
DEPARTAMENTO VOTES(VTOTAL) AGRUPACION TYPE
LAMBAYEQUE 250000 PERU POSIBLE PP
What I got now is only
DEPARTAMENTO VOTES
Lambayeque 250000
And If I use the group by for TYPE too I got the following
DEPARTAMENTO VOTES TYPE
LAMBAYEQUE 25000 PP
LAMBAYEQUE 20000 MR
CodePudding user response:
To get the results you want, you can use the 'which.max' function to identify the row with the maximum votes for each DEPARTAMENTO, and then use that row number to extract the AGRUPACION and TYPE values.
Here's an example of how you can do this:
ERM2002ganador <- ERMfinalt2002 %>%
group_by(DEPARTAMENTO) %>%
summarize(max_votes = max(VTOTAL, na.rm = TRUE)) %>%
mutate(row_num = row_number())
ERMfinalt2002 %>%
left_join(ERM2002ganador, by = "DEPARTAMENTO") %>%
filter(VTOTAL == max_votes) %>%
select(DEPARTAMENTO, VTOTAL, AGRUPACION, TYPE)
This will give you a data frame with one row for each DEPARTAMENTO, with the columns DEPARTAMENTO, VTOTAL, AGRUPACION, and TYPE. The VTOTAL column will contain the maximum votes for that DEPARTAMENTO, and the AGRUPACION and TYPE columns will contain the values for the party that received the maximum votes.
CodePudding user response:
You can use dplyr::slice_max()
instead of summarize()
. This will keep the row with the highest VTOTAL
for each group.
library(dplyr)
ERMfinalt2002 %>%
group_by(DEPARTAMENTO) %>%
slice_max(VTOTAL) %>%
ungroup()
# A tibble: 3 × 4
DEPARTAMENTO AGRUPACION TYPE VTOTAL
<fct> <fct> <fct> <dbl>
1 AREQUIPA PERU POSIBLE PP 227581
2 LAMBAYEQUE PARTIDO APRISTA PERUANO PP 290516
3 LIMA PERU POSIBLE PP 147409
Example data:
set.seed(13)
ERMfinalt2002 <- expand.grid(
DEPARTAMENTO = c("AREQUIPA", "LAMBAYEQUE", "LIMA"),
AGRUPACION = c("PERU POSIBLE", "PARTIDO APRISTA PERUANO"),
TYPE = "PP"
)
ERMfinalt2002$VTOTAL = round(runif(6, 50000, 300000))