Using group_by() to collapse a dataset in R based on conditions-CodePudding

I am trying to collapse a dataset based on conditions and groupings from another dataset. My current dataframe looks like this

For every 'RollNo' in every 'congress' I want a new variable indicating if the two senators in the same state voted together (1,0) and are in the same party (1,0)

congress	sen	RollNo	state	Vote	Party
106	Jay	1	Ark	1	Rep
106	Mary	1	Ark	1	Dem
106	Bill	2	Ten	2	Dem
106	Kevin	2	Ten	1	Dem
108	Sue	1	Ore	2	Rep
108	Sally	1	Ore	2	Rep
108	Lisa	3	SDak	1	Rep
108	Penny	3	SDak	2	Rep
109	Jay	1	Mich	1	Dem
109	Mary	1	Mich	9	Rep
109	Rudy	5	Cal	1	Dem
109	Niles	5	Cal	1	Dem

The new dataframe should look like this:

congress	RollNo	state	Pair_Vote	Pair_Party
106	1	Ark	1	0
106	2	Ten	0	1
108	1	Ore	1	1
108	3	SDak	0	1
109	1	Mich	0	0
109	5	Cal	1	1

I have tried the code below, tweaked it several times. My my dataset returns with the same observation and two new columns empty vectors for my new variables.

library(dplyr)
dataframe['Pair_Vote'] <- NA
dataframe['Pair_Party'] <- NA
newdata <- dataframe %>% group_by(congress, RollNo, state) %>% 
  mutate(Pair_Vote - case_when(any(Vote == Vote) ~ 1, FALSE ~ 0))

I'm at a loss.

CodePudding user response：

Use mutate when you want to add columns to a data frame as-is, use summarize when you want the result to have one row per group. Your output has one row per group, so we will use summarize.

And vote == vote won't do much useful, let's use n_distinct to count distinct values.

dataframe %>% 
  group_by(congress, RollNo, state) %>% 
  summarize(
    Pair_Vote = ifelse(n_distinct(Vote) == 1, 1, 0),
    Pair_Party = ifelse(n_distinct(Party) == 1, 1, 0)
  )

CodePudding user response：

We may use across to modify multiple columns at once

library(dplyr)
dataframe %>%
    group_by(congress, RollNo, state) %>%
    summarise(across(c(Vote, Party),
       ~  (n_distinct(.x) == 1), .names = "Pair_{.col}"), .groups = 'drop')

-output

# A tibble: 6 × 5
  congress RollNo state Pair_Vote Pair_Party
     <int>  <int> <chr>     <int>      <int>
1      106      1 Ark           1          0
2      106      2 Ten           0          1
3      108      1 Ore           1          1
4      108      3 SDak          0          1
5      109      1 Mich          0          0
6      109      5 Cal           1          1

data

dataframe <- structure(list(congress = c(106L, 106L, 106L, 106L, 108L, 108L, 
108L, 108L, 109L, 109L, 109L, 109L), sen = c("Jay", "Mary", "Bill", 
"Kevin", "Sue", "Sally", "Lisa", "Penny", "Jay", "Mary", "Rudy", 
"Niles"), RollNo = c(1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 1L, 
5L, 5L), state = c("Ark", "Ark", "Ten", "Ten", "Ore", "Ore", 
"SDak", "SDak", "Mich", "Mich", "Cal", "Cal"), Vote = c(1L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 9L, 1L, 1L), Party = c("Rep", "Dem", 
"Dem", "Dem", "Rep", "Rep", "Rep", "Rep", "Dem", "Rep", "Dem", 
"Dem")), class = "data.frame", row.names = c(NA, -12L))