I have a dataset that contains 3 variables, like this:
id gender phase
a1 m 1
a1 m 2
a1 m 3
b2 m 1
b2 f 2
b2 m 3
c3 f 1
c3 f 2
c3 f 3
...
Notice that for id == b2, phase == 2, the gender is accidentally marked as "f". It should be "m", consistent with the other phases, because gender cannot change during the study phases. If I want to run R code to detect which ids have this issue, how should I do that? Thanks a lot!
CodePudding user response:
With dplyr, you could detect which ids have more than one gender using n_distinct().
library(dplyr)
df %>%
group_by(id) %>%
filter(n_distinct(gender) > 1) %>%
ungroup()
# # A tibble: 3 × 3
# id gender phase
# <chr> <chr> <int>
# 1 b2 m 1
# 2 b2 f 2
# 3 b2 m 3
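If only the offending ids are needed rather than their rows, the same "count distinct values per id" idea can be sketched in base R. This is a minimal sketch and not part of the answer above: it assumes the data frame is called df and has the layout shown in the question, and n_genders and bad_ids are names introduced here purely for illustration.

```r
# Example data copied from the question.
df <- read.table(text = "id gender phase
a1 m 1
a1 m 2
a1 m 3
b2 m 1
b2 f 2
b2 m 3
c3 f 1
c3 f 2
c3 f 3", header = TRUE)

# Count the distinct gender values recorded for each id.
n_genders <- tapply(df$gender, df$id, function(g) length(unique(g)))

# Keep only the ids with more than one distinct gender value.
bad_ids <- names(n_genders)[n_genders > 1]
bad_ids
# [1] "b2"
```

Note that unique() counts NA as a value of its own, so an id with one missing gender would also be flagged; whether that is desirable depends on the data.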
CodePudding user response:
You can use lag() to check whether the value changed from one row to the next within each id, and filter() to keep the ids that have a change, like this:
df <- read.table(text="id gender phase
a1 m 1
a1 m 2
a1 m 3
b2 m 1
b2 f 2
b2 m 3
c3 f 1
c3 f 2
c3 f 3", header = TRUE)
library(dplyr)
df %>%
group_by(id) %>%
filter(any(gender != lag(gender)))
#> # A tibble: 3 × 3
#> # Groups: id [1]
#> id gender phase
#> <chr> <chr> <int>
#> 1 b2 m 1
#> 2 b2 f 2
#> 3 b2 m 3
Created on 2022-07-13 by the reprex package (v2.0.1)
CodePudding user response:
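# Rebuild the example data from the question.
id <- c("a1", "a1", "a1", "b2", "b2", "b2", "c3", "c3", "c3")
gender <- c("m", "m", "m", "m", "f", "m", "f", "f", "f")
phase <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
mydata <- data.frame(id, gender, phase)
# Overwrite gender by hand for each id; this corrects the data but does not detect which ids are inconsistent.
mydata[mydata$id %in% c("a1", "b2"), "gender"] <- "m"
mydata[mydata$id %in% c("c3"), "gender"] <- "f"
mydata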
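Hardcoding the ids does not scale beyond a handful of cases. One possible way to automate the correction is sketched below, under a strong assumption that is not stated in the question: the most frequent gender recorded for an id is taken to be the correct one. The helper most_common is hypothetical and introduced only for this example.

```r
# Example data from the question, before any manual correction.
mydata <- data.frame(
  id = rep(c("a1", "b2", "c3"), each = 3),
  gender = c("m", "m", "m", "m", "f", "m", "f", "f", "f"),
  phase = rep(1:3, times = 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most frequent value in x.
# Ties resolve to the alphabetically first value, which may not be what you want.
most_common <- function(x) names(which.max(table(x)))

# Assumption: the majority value within each id is the true gender.
# ave() applies the helper per id and recycles the result over that id's rows.
mydata$gender <- ave(mydata$gender, mydata$id, FUN = most_common)
mydata$gender
# [1] "m" "m" "m" "m" "m" "m" "f" "f" "f"
```

If an id has an even split between values, a majority rule cannot decide, so those ids are better reviewed by hand.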