Replace duplicate values with 0 within each ID

Time:03-23

I have a dataframe like this one:

df
ID  job_code
1   8
1   8
1   8
2   7
2   7
2   4
3   1
3   2

If an individual has the same job code several times, I would like to keep only the first occurrence and replace the others with 0, to obtain a dataframe like this one:

df
ID  job_code    job_code_2
1   8           8
1   8           0
1   8           0
2   7           7
2   7           0
2   4           4
3   1           1
3   2           2

I thought of using something like:

df %>% 
  group_by(ID) # ...and then replace the repeated job codes with 0

but I am not sure how.

Thank you in advance for your help.

Answer:

Use duplicated() within each ID group:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  # within each ID, every repeat of an earlier job code becomes 0
  mutate(job_code2 = ifelse(duplicated(job_code), 0, job_code)) %>%
  ungroup()
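To see why this works: duplicated() returns TRUE for every element that already appeared earlier in the vector, so only the first occurrence of each code keeps its value. A small illustration on the job codes of subject 2:

```r
# Job codes of subject 2 from the example data
x <- c(7, 7, 4)

duplicated(x)
#> [1] FALSE  TRUE FALSE

ifelse(duplicated(x), 0, x)
#> [1] 7 0 4
```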

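In base R you can combine tapply() and duplicated(). This assumes df is sorted by ID, because tapply() returns its results grouped in ID order:

df$job_code2 <- unlist(tapply(df$job_code, df$ID, function(x) ifelse(duplicated(x), 0, x)))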
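A possible order-independent base R variant (not from the original answer) uses ave(), which applies the function within each ID group and writes the results back to the original row positions, so the rows do not need to be sorted first:

```r
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L)
)

# ave() splits job_code by ID, applies FUN to each piece and puts the
# results back in place; repeats of an earlier code within an ID become 0
df$job_code2 <- ave(df$job_code, df$ID,
                    FUN = function(x) ifelse(duplicated(x), 0, x))

df$job_code2
#> [1] 8 0 0 7 0 4 1 2
```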

Follow-up comment:

The first solution works, but for some subjects it does not, and I don't know why. It fails for subjects whose job code already appeared for a previous subject: for example, for subject 4 I get 0 when I should get 8.

I have this:

ID  job_code    job_code_2
1   8           8
1   8           0
1   8           0
2   7           7
2   7           0
2   4           4
3   1           1
3   2           2
4   8           0

Instead of this:

ID  job_code    job_code_2
1   8           8
1   8           0
1   8           0
2   7           7
2   7           0
2   4           4
3   1           1
3   2           2
4   8           8
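One possible cause, which is an assumption the thread does not confirm: the symptom is exactly what happens when the grouping is not applied, so duplicated() scans the whole job_code column and treats subject 4's 8 as a repeat of subject 1's. This can happen, for example, when plyr is loaded after dplyr and plyr's mutate() masks dplyr's. The sketch below reproduces the symptom without grouping, then shows that a grouped, explicitly namespaced dplyr::mutate() gives 8 for subject 4:

```r
library(dplyr)

# Extended example data: subject 4 reuses job code 8
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L, 8L)
)

# Without grouping, duplicated() looks at the whole column: subject 4 gets 0
ifelse(duplicated(df$job_code), 0, df$job_code)
#> [1] 8 0 0 7 0 4 1 2 0

res <- df %>%
  group_by(ID) %>%
  # namespaced so a masking mutate() (e.g. from plyr) cannot ignore the groups
  dplyr::mutate(job_code_2 = ifelse(duplicated(job_code), 0, job_code)) %>%
  ungroup()

res$job_code_2
#> [1] 8 0 0 7 0 4 1 2 8
```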

Answer:

library(tidyverse)
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L)
)

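df %>%
  group_by(ID, job_code) %>%
  # row_number() == 1 is TRUE only for the first row of each ID/job_code pair,
  # so multiplying by it keeps that code and turns later repeats into 0
  mutate(job_code2 = job_code * (row_number() == 1)) %>%
  ungroup()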
#> # A tibble: 8 x 3
#>      ID job_code job_code2
#>   <int>    <int>     <int>
#> 1     1        8         8
#> 2     1        8         0
#> 3     1        8         0
#> 4     2        7         7
#> 5     2        7         0
#> 6     2        4         4
#> 7     3        1         1
#> 8     3        2         2

Created on 2022-03-23 by the reprex package (v2.0.1)
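The same multiply-by-a-logical idea can be sketched in base R without any grouping, treating each (ID, job_code) pair as the key. This is an illustrative variant, not part of the original answer:

```r
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L)
)

# duplicated() on a data frame compares whole rows, so it flags every
# (ID, job_code) pair that already appeared earlier; !duplicated() is TRUE
# only for the first occurrence, and multiplying by it zeroes the repeats
df$job_code2 <- df$job_code * !duplicated(df[c("ID", "job_code")])

df$job_code2
#> [1] 8 0 0 7 0 4 1 2
```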

Answer:

Another possible solution:

library(tidyverse)

df <- read_table("ID  job_code
1   8
1   8
1   8
2   7
2   7
2   4
3   1
3   2")

df %>% 
  group_by(ID, job_code) %>% 
  mutate(job_code = if_else(row_number() > 1, 0, job_code)) %>% 
  ungroup()

#> # A tibble: 8 x 2
#>      ID job_code
#>   <dbl>    <dbl>
#> 1     1        8
#> 2     1        0
#> 3     1        0
#> 4     2        7
#> 5     2        0
#> 6     2        4
#> 7     3        1
#> 8     3        2
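Note that if_else() is stricter about types than ifelse(): in dplyr versions before 1.1.0 it errors when the literal 0 (a double) is combined with an integer column. Here read_table() reads the columns as doubles, so the code above works. With integer columns, such as those built with data.frame(... 8L ...), writing 0L is one way to avoid the issue. The sketch below also writes to a new column job_code_2 so the original codes are kept:

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L)
)

res <- df %>%
  group_by(ID, job_code) %>%
  # 0L keeps both branches integer, which older if_else() versions require
  mutate(job_code_2 = if_else(row_number() > 1, 0L, job_code)) %>%
  ungroup()

res$job_code_2
#> [1] 8 0 0 7 0 4 1 2
```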