I am trying to collapse a dataset based on conditions and groupings from another dataset. My current dataframe looks like this
For every 'RollNo' in every 'congress' I want a new variable indicating if the two senators in the same state voted together (1,0) and are in the same party (1,0)
congress | sen | RollNo | state | Vote | Party |
---|---|---|---|---|---|
106 | Jay | 1 | Ark | 1 | Rep |
106 | Mary | 1 | Ark | 1 | Dem |
106 | Bill | 2 | Ten | 2 | Dem |
106 | Kevin | 2 | Ten | 1 | Dem |
108 | Sue | 1 | Ore | 2 | Rep |
108 | Sally | 1 | Ore | 2 | Rep |
108 | Lisa | 3 | SDak | 1 | Rep |
108 | Penny | 3 | SDak | 2 | Rep |
109 | Jay | 1 | Mich | 1 | Dem |
109 | Mary | 1 | Mich | 9 | Rep |
109 | Rudy | 5 | Cal | 1 | Dem |
109 | Niles | 5 | Cal | 1 | Dem |
The new dataframe should look like this:
congress | RollNo | state | Pair_Vote | Pair_Party |
---|---|---|---|---|
106 | 1 | Ark | 1 | 0 |
106 | 2 | Ten | 0 | 1 |
108 | 1 | Ore | 1 | 1 |
108 | 3 | SDak | 0 | 1 |
109 | 1 | Mich | 0 | 0 |
109 | 5 | Cal | 1 | 1 |
I have tried the code below, tweaked it several times. My my dataset returns with the same observation and two new columns empty vectors for my new variables.
library(dplyr)
dataframe['Pair_Vote'] <- NA
dataframe['Pair_Party'] <- NA
newdata <- dataframe %>% group_by(congress, RollNo, state) %>%
mutate(Pair_Vote - case_when(any(Vote == Vote) ~ 1, FALSE ~ 0))
I'm at a loss.
CodePudding user response:
Use mutate
when you want to add columns to a data frame as-is, use summarize
when you want the result to have one row per group. Your output has one row per group, so we will use summarize
.
And vote == vote
won't do much useful, let's use n_distinct
to count distinct values.
dataframe %>%
group_by(congress, RollNo, state) %>%
summarize(
Pair_Vote = ifelse(n_distinct(Vote) == 1, 1, 0),
Pair_Party = ifelse(n_distinct(Party) == 1, 1, 0)
)
CodePudding user response:
We may use across
to modify multiple columns at once
library(dplyr)
dataframe %>%
group_by(congress, RollNo, state) %>%
summarise(across(c(Vote, Party),
~ (n_distinct(.x) == 1), .names = "Pair_{.col}"), .groups = 'drop')
-output
# A tibble: 6 × 5
congress RollNo state Pair_Vote Pair_Party
<int> <int> <chr> <int> <int>
1 106 1 Ark 1 0
2 106 2 Ten 0 1
3 108 1 Ore 1 1
4 108 3 SDak 0 1
5 109 1 Mich 0 0
6 109 5 Cal 1 1
data
dataframe <- structure(list(congress = c(106L, 106L, 106L, 106L, 108L, 108L,
108L, 108L, 109L, 109L, 109L, 109L), sen = c("Jay", "Mary", "Bill",
"Kevin", "Sue", "Sally", "Lisa", "Penny", "Jay", "Mary", "Rudy",
"Niles"), RollNo = c(1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 1L,
5L, 5L), state = c("Ark", "Ark", "Ten", "Ten", "Ore", "Ore",
"SDak", "SDak", "Mich", "Mich", "Cal", "Cal"), Vote = c(1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 9L, 1L, 1L), Party = c("Rep", "Dem",
"Dem", "Dem", "Rep", "Rep", "Rep", "Rep", "Dem", "Rep", "Dem",
"Dem")), class = "data.frame", row.names = c(NA, -12L))