I want to generate a group id, based on combinations of two variables (country and party). This is my data:
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
year = c(2010, 2010, 2010, 2010, 2010, 2010),
party = c(NA, NA, NA, "A", "B", "B"))
Which gives:
country year party
1 BE 2010 <NA>
2 BE 2010 <NA>
3 BE 2010 <NA>
4 NL 2010 A
5 NL 2010 B
6 NL 2010 B
What I want is:
country year party group
<chr> <dbl> <chr> <int>
1 BE 2010 NA NA
2 BE 2010 NA NA
3 BE 2010 NA NA
4 NL 2010 A 1
5 NL 2010 B 2
6 NL 2010 B 2
I tried:
df <- df %>%
group_by(country, party) %>%
mutate(group = cur_group_id())
But this gives me:
country year party group
<chr> <dbl> <chr> <int>
1 BE 2010 NA 1
2 BE 2010 NA 1
3 BE 2010 NA 1
4 NL 2010 A 2
5 NL 2010 B 3
6 NL 2010 B 3
However, I don't want separate groups for any data that has missing values. At the same time, I want to keep the data.
If I try:
df <- df %>%
group_by(country, party) %>%
filter(!is.na(party)) %>%
mutate(group = cur_group_id())
I get:
country year party group
<chr> <dbl> <chr> <int>
1 NL 2010 A 1
2 NL 2010 B 2
3 NL 2010 B 2
How can I get this new variable only for complete data, whilst keeping the incomplete data in the dataset?
Thanks
CodePudding user response:
Use interaction
df %>% mutate(group = as.integer(interaction(country, party, drop = TRUE)))
giving:
country year party group
1 BE 2010 <NA> NA
2 BE 2010 <NA> NA
3 BE 2010 <NA> NA
4 NL 2010 A 1
5 NL 2010 B 2
6 NL 2010 B 2
CodePudding user response:
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
year = c(2010, 2010, 2010, 2010, 2010, 2010),
party = c(NA, NA, NA, "A", "B", "B"))
library(data.table)
setDT(df)[!is.na(party), grp := .GRP, by = party][]
#> country year party grp
#> 1: BE 2010 <NA> NA
#> 2: BE 2010 <NA> NA
#> 3: BE 2010 <NA> NA
#> 4: NL 2010 A 1
#> 5: NL 2010 B 2
#> 6: NL 2010 B 2
Created on 2021-12-21 by the reprex package (v2.0.1)
CodePudding user response:
Something like the following?
library(tidyverse)
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
year = c(2010, 2010, 2010, 2010, 2010, 2010),
party = c(NA, NA, NA, "A", "B", "B"))
df %>%
group_by(country, party) %>%
mutate(group = if_else(is.na(party), NA_integer_, cur_group_id()))
#> # A tibble: 6 × 4
#> # Groups: country, party [3]
#> country year party group
#> <chr> <dbl> <chr> <int>
#> 1 BE 2010 <NA> NA
#> 2 BE 2010 <NA> NA
#> 3 BE 2010 <NA> NA
#> 4 NL 2010 A 2
#> 5 NL 2010 B 3
#> 6 NL 2010 B 3
If you want the groups to begin with 1 (instead of 2):
library(tidyverse)
df %>%
filter(!is.na(party)) %>%
group_by(country, party) %>%
mutate(group = cur_group_id()) %>%
ungroup %>% add_row(filter(df,is.na(party))) %>%
mutate(group = if_else(is.na(party), NA_integer_, group))
#> # A tibble: 6 × 4
#> country year party group
#> <chr> <dbl> <chr> <int>
#> 1 NL 2010 A 1
#> 2 NL 2010 B 2
#> 3 NL 2010 B 2
#> 4 BE 2010 <NA> NA
#> 5 BE 2010 <NA> NA
#> 6 BE 2010 <NA> NA