Home > Blockchain >  I want to use group by and summarize in R, but keep some values from the observation
I want to use group by and summarize in R, but keep some values from the observation

Time:01-03

so I have been trying to figure out which party won in a specific region in my country using the number of votes per location, the problem is that when I use group by region (DEPARTAMENTO), I cannot keep the name of the party, only the votes. When I group by region and party (DEPARTAMENTO, AGRUPACION), instead of 25 values I got 68 values because of the different denominations for political parties.

I hope this is not that confusing. And thanks.

ERM2002ganador <-
ERMfinalt2002 %>%
group_by(DEPARTAMENTO)%>%
summarize(max(VTOTAL,na.rm = FALSE))

I am trying to get something like the following

DEPARTAMENTO VOTES(VTOTAL) AGRUPACION   TYPE
LAMBAYEQUE   250000        PERU POSIBLE PP

What I got now is only

DEPARTAMENTO VOTES
Lambayeque   250000

And If I use the group by for TYPE too I got the following

DEPARTAMENTO VOTES TYPE
LAMBAYEQUE   25000 PP
LAMBAYEQUE   20000 MR

CodePudding user response:

To get the results you want, you can use the 'which.max' function to identify the row with the maximum votes for each DEPARTAMENTO, and then use that row number to extract the AGRUPACION and TYPE values.

Here's an example of how you can do this:

ERM2002ganador <- ERMfinalt2002 %>%
  group_by(DEPARTAMENTO) %>%
  summarize(max_votes = max(VTOTAL, na.rm = TRUE)) %>%
  mutate(row_num = row_number())

ERMfinalt2002 %>%
  left_join(ERM2002ganador, by = "DEPARTAMENTO") %>%
  filter(VTOTAL == max_votes) %>%
  select(DEPARTAMENTO, VTOTAL, AGRUPACION, TYPE)

This will give you a data frame with one row for each DEPARTAMENTO, with the columns DEPARTAMENTO, VTOTAL, AGRUPACION, and TYPE. The VTOTAL column will contain the maximum votes for that DEPARTAMENTO, and the AGRUPACION and TYPE columns will contain the values for the party that received the maximum votes.

CodePudding user response:

You can use dplyr::slice_max() instead of summarize(). This will keep the row with the highest VTOTAL for each group.

library(dplyr)

ERMfinalt2002 %>%
  group_by(DEPARTAMENTO) %>%
  slice_max(VTOTAL) %>%
  ungroup()
# A tibble: 3 × 4
  DEPARTAMENTO AGRUPACION              TYPE  VTOTAL
  <fct>        <fct>                   <fct>  <dbl>
1 AREQUIPA     PERU POSIBLE            PP    227581
2 LAMBAYEQUE   PARTIDO APRISTA PERUANO PP    290516
3 LIMA         PERU POSIBLE            PP    147409

Example data:

set.seed(13)

ERMfinalt2002 <- expand.grid(
    DEPARTAMENTO = c("AREQUIPA", "LAMBAYEQUE", "LIMA"),
    AGRUPACION = c("PERU POSIBLE", "PARTIDO APRISTA PERUANO"),
    TYPE = "PP"
  )

ERMfinalt2002$VTOTAL = round(runif(6, 50000, 300000))
  •  Tags:  
  • r
  • Related