Home > Back-end >  R - Creating a variable which indicates if individual is in modal category, by two groups
R - Creating a variable which indicates if individual is in modal category, by two groups

Time:12-15

I have a dataset (df_1) in which I have a large number of individual observations, from different areas, in different years.

I want to create a variable which indicates if an individual is in the modal category (say has the most common gender) for their area/year, 0 otherwise.

I can create a second dataset (df_2) of the modal gender for each area/year, How would I then create the variable in df_1 indicating if an individual is in the modal category?

Example:

df1 <- data.frame(
  ID = c(1:10),
  Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
  Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))


# Create mode function
mode <- function(codes){which.max(tabulate(codes))}

groups <- df1 %>% group_by(Area, Year) %>%  summarise(mode = mode(Gender)))

I then want to add a column to df1 indicating if the individual is in the modal category for their survevy year.

CodePudding user response:

One way is to count by the desired groupings and then check if the value for the group equals the max value. e.g. for Gender by Area and Year:

library(dplyr)

df1 %>% 
  group_by(Area, Year) %>%
  add_count(Gender) %>% 
  mutate(modal = n == max(n)) %>%
  ungroup()

# A tibble: 10 × 6
      ID  Area  Year Gender     n modal
   <int> <dbl> <dbl>  <dbl> <int> <lgl>
 1     1     1  2020      5     2 TRUE 
 2     2     1  2020      5     2 TRUE 
 3     3     1  2021      5     2 TRUE 
 4     4     1  2021     10     1 FALSE
 5     5     1  2021      5     2 TRUE 
 6     6     2  2020     10     2 TRUE 
 7     7     2  2020     10     2 TRUE 
 8     8     2  2020      5     1 FALSE
 9     9     2  2021     10     2 TRUE 
10    10     2  2021     10     2 TRUE 

CodePudding user response:

df1 <- data.frame(
  ID = c(1:10),
  Area = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Year = c(2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021),
  Gender = c(5, 5, 5, 10, 5, 10, 10, 5, 10, 10))

mode <- function(codes){which.max(tabulate(codes))}

library(dplyr)

df1 %>%
  group_by(Area, Year) %>% 
  mutate(is_modal = if_else(Gender == mode(Gender), TRUE, FALSE))
  • Related