How to change column values based on duplication in another column in R

My data looks like this:

data <- data.frame(grupoaih = c("09081997", "13122006", "09081997", "22031969"),
                       NMM_PROC_BR = c(1, 1, 0, 1),
                       NMM_CID = c(0, 1, 1, 0),
                       CPAV_PROC_BR = c(0, 0, 0, 1),
                       CPAV_CID = c(1, 1, 0, 1))

  grupoaih NMM_PROC_BR NMM_CID CPAV_PROC_BR CPAV_CID
1 09081997           1       0            0        1
2 13122006           1       1            0        1
3 09081997           0       1            0        0
4 22031969           1       0            1        1

When a "grupoaih" value appears more than once, how can I set the other 4 variables to 1 across all rows of that group, so that the duplicated rows end up with identical values, like this:

data2 <- data.frame(grupoaih = c("09081997", "13122006", "09081997", "22031969"),
                       NMM_PROC_BR = c(1, 1, 1, 1),
                       NMM_CID = c(1, 1, 1, 0),
                       CPAV_PROC_BR = c(0, 0, 0, 1),
                       CPAV_CID = c(1, 1, 1, 1))

  grupoaih NMM_PROC_BR NMM_CID CPAV_PROC_BR CPAV_CID
1 09081997           1       1            0        1
2 13122006           1       1            0        1
3 09081997           1       1            0        1
4 22031969           1       0            1        1

This should only apply when grupoaih is duplicated and at least one of the duplicated rows has a 1 in that variable. If a variable is 0 in all of the duplicated rows, it stays 0.

CodePudding user response:

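You can use group_by() and then n() to check whether a grupoaih value is duplicated. In the formula (introduced by ~), . stands for the column's current values. Note that this sets every listed column to 1 for any duplicated group, regardless of the values it holds, which is why CPAV_PROC_BR is left out of the column list here.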

library(dplyr)

data %>%
  group_by(grupoaih) %>%
  mutate(across(c("NMM_PROC_BR", "NMM_CID", "CPAV_CID"), ~ifelse(n() > 1, 1, .))) %>%
  ungroup()

# # A tibble: 4 × 5
#   grupoaih NMM_PROC_BR NMM_CID CPAV_PROC_BR CPAV_CID
#   <chr>          <dbl>   <dbl>        <dbl>    <dbl>
# 1 09081997           1       1            0        1
# 2 13122006           1       1            0        1
# 3 09081997           1       1            0        1
# 4 22031969           1       0            1        1
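
The answer above depends on knowing in advance which columns to list. As a minimal sketch of a variant that applies the question's condition to every column (set 1 only when the group is duplicated and at least one row has a 1), one could write the following. The name result and the use of na.rm = TRUE are assumptions added for illustration, not part of the original answer.

```r
library(dplyr)

data <- data.frame(
  grupoaih     = c("09081997", "13122006", "09081997", "22031969"),
  NMM_PROC_BR  = c(1, 1, 0, 1),
  NMM_CID      = c(0, 1, 1, 0),
  CPAV_PROC_BR = c(0, 0, 0, 1),
  CPAV_CID     = c(1, 1, 0, 1)
)

result <- data %>%
  group_by(grupoaih) %>%
  # across() skips the grouping column, so everything() covers the 4 variables.
  # A column is set to 1 only if the group has more than one row AND
  # at least one of those rows already holds a 1; otherwise it is left alone.
  mutate(across(
    everything(),
    ~ if (n() > 1 && any(.x == 1, na.rm = TRUE)) 1 else .x
  )) %>%
  ungroup()
```

Using if rather than ifelse here is deliberate: the condition is a single TRUE/FALSE per group, and if returns either the scalar 1 (recycled by mutate) or the whole original column.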

CodePudding user response:

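Because the columns only hold 0 or 1, taking the max within each grupoaih group also works: the result is 1 if any row in the group has a 1, and groups with a single row are unchanged.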

library(dplyr)

data %>%
  group_by(grupoaih) %>%
  mutate(across(everything(), max)) %>%
  ungroup()

Output:

# A tibble: 4 × 5
  grupoaih NMM_PROC_BR NMM_CID CPAV_PROC_BR CPAV_CID
  <chr>          <dbl>   <dbl>        <dbl>    <dbl>
1 09081997           1       1            0        1
2 13122006           1       1            0        1
3 09081997           1       1            0        1
4 22031969           1       0            1        1

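Or use fmax from the collapse package. Here TRA = 1 (the "replace_fill" transformation) writes the group maximum back into every row of the group instead of returning one row per group. Note that this overwrites data in place.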

library(collapse)
data[-1] <- fmax(data[-1], data$grupoaih, TRA = 1)
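
For readers without dplyr or collapse installed, the same group-maximum idea can be sketched in base R with ave(). This is only an illustration under the assumption that the columns hold 0/1 values with no missing data; the names value_cols and filled are hypothetical.

```r
data <- data.frame(
  grupoaih     = c("09081997", "13122006", "09081997", "22031969"),
  NMM_PROC_BR  = c(1, 1, 0, 1),
  NMM_CID      = c(0, 1, 1, 0),
  CPAV_PROC_BR = c(0, 0, 0, 1),
  CPAV_CID     = c(1, 1, 0, 1)
)

# Every column except the grouping key
value_cols <- setdiff(names(data), "grupoaih")

# Work on a copy so the original data frame is kept
filled <- data

# ave() applies max within each grupoaih group and returns a vector
# of the same length as the input, so it can be assigned back directly
filled[value_cols] <- lapply(
  data[value_cols],
  function(col) ave(col, data$grupoaih, FUN = max)
)
```

Printing filled should reproduce the desired data2 from the question, since for 0/1 columns the group maximum is 1 exactly when at least one row in the group holds a 1.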