R - Creating a variable which indicates if individual is in modal category, by two groups-CodePudding

I have a dataset (df_1) in which I have a large number of individual observations, from different areas, in different years.

I want to create a variable which indicates if an individual is in the modal category (say has the most common gender) for their area/year, 0 otherwise.

I can create a second dataset (df_2) of the modal gender for each area/year, How would I then create the variable in df_1 indicating if an individual is in the modal category?

Example:

df1 <- data.frame(
  ID = c(1:10),
  Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
  Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))


# Create mode function
mode <- function(codes){which.max(tabulate(codes))}

groups <- df1 %>% group_by(Area, Year) %>%  summarise(mode = mode(Gender)))

I then want to add a column to df1 indicating if the individual is in the modal category for their survevy year.

CodePudding user response：

One way is to count by the desired groupings and then check if the value for the group equals the max value. e.g. for Gender by Area and Year:

library(dplyr)

df1 %>% 
  group_by(Area, Year) %>%
  add_count(Gender) %>% 
  mutate(modal = n == max(n)) %>%
  ungroup()

# A tibble: 10 × 6
      ID  Area  Year Gender     n modal
   <int> <dbl> <dbl>  <dbl> <int> <lgl>
 1     1     1  2020      5     2 TRUE 
 2     2     1  2020      5     2 TRUE 
 3     3     1  2021      5     2 TRUE 
 4     4     1  2021     10     1 FALSE
 5     5     1  2021      5     2 TRUE 
 6     6     2  2020     10     2 TRUE 
 7     7     2  2020     10     2 TRUE 
 8     8     2  2020      5     1 FALSE
 9     9     2  2021     10     2 TRUE 
10    10     2  2021     10     2 TRUE

CodePudding user response：

df1 <- data.frame(
  ID = c(1:10),
  Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
  Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))

mode <- function(codes){which.max(tabulate(codes))}

library(dplyr)

df1 %>%
  group_by(Area, Year) %>% 
  mutate(is_modal = if_else(Gender == mode(Gender), TRUE, FALSE))