removing duplicates by subject ID

Time:07-27

I have a data frame like so:

subject <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5)
day <- c(20, 20, 20, 20, 20, 40, 40, 40, 40, 50, 50, 50, 40, 40, 40, 40, 20, 20)

ex <- data.frame(subject, day)

I want to change duplicated values in the day column to NA, but only within each subject.

In my desired output, each subject keeps its first day value and the repeats become NA.

Any help would be much appreciated! This must be done in R.

CodePudding user response:

library(dplyr)
ex %>%
  group_by(subject) %>%
  mutate(day = ifelse(duplicated(day), NA, day)) %>%
  ungroup()
# # A tibble: 18 × 2
#    subject   day
#      <dbl> <dbl>
#  1       1    20
#  2       1    NA
#  3       1    NA
#  4       1    NA
#  5       1    NA
#  6       2    40
#  7       2    NA
#  8       2    NA
#  9       2    NA
# 10       3    50
# 11       3    NA
# 12       3    NA
# 13       4    40
# 14       4    NA
# 15       4    NA
# 16       4    NA
# 17       5    20
# 18       5    NA

CodePudding user response:

library(dplyr)

ex %>% 
  group_by(subject) %>% 
  mutate(day = ifelse(row_number()==1, day, NA_real_)) %>% 
  ungroup()

# #    subject   day
# #      <dbl> <dbl>
# #  1       1    20
# #  2       1    NA
# #  3       1    NA
# #  4       1    NA
# #  5       1    NA
# #  6       2    40
# #  7       2    NA
# #  8       2    NA
# #  9       2    NA
# # 10       3    50
# # 11       3    NA
# # 12       3    NA
# # 13       4    40
# # 14       4    NA
# # 15       4    NA
# # 16       4    NA
# # 17       5    20
# # 18       5    NA
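
Note that `row_number() == 1` and `duplicated(day)` only agree when every subject has a single day value, as in the question's data. The sketch below illustrates the difference in base R with `ave()`, so it needs no packages. The data frame `mixed` is hypothetical and not from the question; it gives subject 1 two distinct day values.

```r
# Hypothetical data (not from the question): subject 1 has two distinct days.
mixed <- data.frame(subject = c(1, 1, 1, 2, 2),
                    day     = c(20, 20, 30, 40, 40))

# Rule A, like duplicated(day) within each subject:
# blank only values that already appeared for that subject.
by_dup <- ave(mixed$day, mixed$subject,
              FUN = function(x) replace(x, duplicated(x), NA))

# Rule B, like row_number() == 1 within each subject:
# keep only the first row of each subject, whatever its value.
by_first <- ave(mixed$day, mixed$subject,
                FUN = function(x) replace(x, seq_along(x) > 1, NA))

by_dup    # 20 NA 30 40 NA
by_first  # 20 NA NA 40 NA
```

If a subject can have several distinct days and each should be kept once, the `duplicated()` rule is probably the one you want.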

CodePudding user response:

We may use

ex$day <- NA^duplicated(ex) * ex$day

Output:

> ex
   subject day
1        1  20
2        1  NA
3        1  NA
4        1  NA
5        1  NA
6        2  40
7        2  NA
8        2  NA
9        2  NA
10       3  50
11       3  NA
12       3  NA
13       4  40
14       4  NA
15       4  NA
16       4  NA
17       5  20
18       5  NA
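
To unpack why the one-liner works, here is a step-by-step sketch on the question's data. It relies on two base R facts: `duplicated()` on a data frame compares whole rows, and `NA^0` evaluates to 1 while `NA^1` stays NA. The names `dup`, `mask`, and `ex_alt` are only illustrative.

```r
# Same example data as in the question.
subject <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5)
day <- c(20, 20, 20, 20, 20, 40, 40, 40, 40, 50, 50, 50, 40, 40, 40, 40, 20, 20)
ex <- data.frame(subject, day)

# duplicated() on a data frame flags rows whose (subject, day) pair has
# already appeared, so the check is effectively done per subject.
dup <- duplicated(ex)

# FALSE/TRUE are coerced to 0/1: NA^0 is 1, NA^1 is NA.
mask <- NA^dup

# First occurrences are multiplied by 1 (unchanged), repeats by NA.
ex$day <- mask * ex$day

# A more explicit spelling of the same idea, using logical indexing.
ex_alt <- data.frame(subject, day)
ex_alt$day[duplicated(ex_alt)] <- NA
```

Both versions should give the same day column; the indexing form may be easier to read for anyone unfamiliar with the `NA^` trick.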