I have a dataframe like this one:
df
ID job_code
1 8
1 8
1 8
2 7
2 7
2 4
3 1
3 2
If an individual has the same job code several times, I would like to keep only the first occurrence and replace the others with 0, to obtain a dataframe like this one:
df
ID job_code job_code_2
1 8 8
1 8 0
1 8 0
2 7 7
2 7 0
2 4 4
3 1 1
3 2 2
I thought of starting with:
dataframe %>%
group_by(ID)
and then using replace, but I am not sure how.
Thank you in advance for your help.
CodePudding user response:
Use duplicated:
df %>%
  group_by(ID) %>%
  mutate(job_code2 = ifelse(duplicated(job_code), 0, job_code)) %>%
  ungroup()
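In base R, you can use tapply with duplicated (this assumes the rows are already sorted by ID, because tapply returns its results in group order):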
df$job_code2 <- unlist(tapply(df$job_code, df$ID, function(x) ifelse(duplicated(x), 0, x)))
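If the rows are not sorted by ID, a variant with base R's ave may be safer, since ave returns its result in the original row order. This is only a sketch: the shuffled data frame below is made up for illustration and is not the asker's data.

```r
# Example data based on the question, with the rows deliberately shuffled
# (the shuffling is an assumption made for illustration)
df <- data.frame(
  ID       = c(2L, 1L, 3L, 1L, 2L, 1L, 3L, 2L),
  job_code = c(7L, 8L, 1L, 8L, 7L, 8L, 2L, 4L)
)

# ave() applies FUN within each ID group and puts the results back
# in the original row order, so no prior sorting is needed
df$job_code2 <- ave(
  df$job_code, df$ID,
  FUN = function(x) ifelse(duplicated(x), 0L, x)
)

df
```

Within each ID, the first occurrence of a job code is kept and later repeats become 0, whatever the row order.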
CodePudding user response:
The first solution works in general, but for some subjects it fails. It seems to fail when a subject's job code has already appeared for a previous subject. For example, for subject 4 I get a 0 when I should get an 8.
I have this:
ID job_code job_code_2
1 8 8
1 8 0
1 8 0
2 7 7
2 7 0
2 4 4
3 1 1
3 2 2
4 8 0
Instead of this:
ID job_code job_code_2
1 8 8
1 8 0
1 8 0
2 7 7
2 7 0
2 4 4
3 1 1
3 2 2
4 8 8
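The unwanted output above matches what duplicated returns when it is applied to the whole column instead of within each ID. One possible cause, which is only an assumption since the session is not shown, is that the group_by(ID) step was dropped, or that another package such as plyr was loaded after dplyr and masks mutate with a version that ignores grouping. A minimal sketch to compare the two behaviours (the names ungrouped and grouped are just for illustration):

```r
library(dplyr)

# Data from the question plus the extra row for subject 4
df <- data.frame(
  ID       = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L, 8L)
)

# Ungrouped: duplicated() sees the whole column, so the 8 of
# subject 4 counts as a repeat of the 8 of subject 1
ungrouped <- ifelse(duplicated(df$job_code), 0L, df$job_code)

# Grouped by ID, calling dplyr::mutate explicitly in case
# mutate is masked by another package
grouped <- df %>%
  group_by(ID) %>%
  dplyr::mutate(job_code2 = ifelse(duplicated(job_code), 0L, job_code)) %>%
  ungroup()

ungrouped[9]          # 0, the unwanted result
grouped$job_code2[9]  # 8, the expected result
```

If the grouped version gives 8 for subject 4, checking the package load order or prefixing the call with dplyr:: may be worth a try.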
CodePudding user response:
library(tidyverse)
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  job_code = c(8L, 8L, 8L, 7L, 7L, 4L, 1L, 2L)
)
df %>%
  group_by(ID, job_code) %>%
  mutate(job_code2 = job_code * (row_number() == 1)) %>%
  ungroup()
#> # A tibble: 8 x 3
#> ID job_code job_code2
#> <int> <int> <int>
#> 1 1 8 8
#> 2 1 8 0
#> 3 1 8 0
#> 4 2 7 7
#> 5 2 7 0
#> 6 2 4 4
#> 7 3 1 1
#> 8 3 2 2
Created on 2022-03-23 by the reprex package (v2.0.1)
CodePudding user response:
Another possible solution:
library(tidyverse)
df <- read_table("ID job_code
1 8
1 8
1 8
2 7
2 7
2 4
3 1
3 2")
df %>%
  group_by(ID, job_code) %>%
  mutate(job_code = if_else(row_number() > 1, 0, job_code)) %>%
  ungroup()
#> # A tibble: 8 x 2
#> ID job_code
#> <dbl> <dbl>
#> 1 1 8
#> 2 1 0
#> 3 1 0
#> 4 2 7
#> 5 2 0
#> 6 2 4
#> 7 3 1
#> 8 3 2