I have a dataset (df_1) in which I have a large number of individual observations, from different areas, in different years.
I want to create a variable which indicates if an individual is in the modal category (say has the most common gender) for their area/year, 0 otherwise.
I can create a second dataset (df_2) of the modal gender for each area/year, How would I then create the variable in df_1 indicating if an individual is in the modal category?
Example:
df1 <- data.frame(
ID = c(1:10),
Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))
# Create mode function
mode <- function(codes){which.max(tabulate(codes))}
groups <- df1 %>% group_by(Area, Year) %>% summarise(mode = mode(Gender)))
I then want to add a column to df1 indicating if the individual is in the modal category for their survevy year.
CodePudding user response:
One way is to count by the desired groupings and then check if the value for the group equals the max value. e.g. for Gender by Area and Year:
library(dplyr)
df1 %>%
group_by(Area, Year) %>%
add_count(Gender) %>%
mutate(modal = n == max(n)) %>%
ungroup()
# A tibble: 10 × 6
ID Area Year Gender n modal
<int> <dbl> <dbl> <dbl> <int> <lgl>
1 1 1 2020 5 2 TRUE
2 2 1 2020 5 2 TRUE
3 3 1 2021 5 2 TRUE
4 4 1 2021 10 1 FALSE
5 5 1 2021 5 2 TRUE
6 6 2 2020 10 2 TRUE
7 7 2 2020 10 2 TRUE
8 8 2 2020 5 1 FALSE
9 9 2 2021 10 2 TRUE
10 10 2 2021 10 2 TRUE
CodePudding user response:
df1 <- data.frame(
ID = c(1:10),
Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))
mode <- function(codes){which.max(tabulate(codes))}
library(dplyr)
df1 %>%
group_by(Area, Year) %>%
mutate(is_modal = if_else(Gender == mode(Gender), TRUE, FALSE))