I have a data frame like
mydata <- data.frame(Id=c(01,01,01,01,01,01,02,02,02,02),
VISIT=c("Screeing","Baseline","Baseline","Baseline","Week 9","Week 9","Baseline","Week 2",
"Week 2","Week 2"),
Score=c(1,2,4,5,78,9,5,NA,3,4))
> mydata
Id VISIT Score
1 1 Screeing 1
2 1 Baseline 2
3 1 Baseline 4
4 1 Baseline 5
5 1 Week 9 78
6 1 Week 9 9
7 2 Baseline 5
8 2 Week 2 NA
9 2 Week 2 3
10 2 Week 2 4
What I am trying to do is to group by Id and VISIT and choose the first non NA value of each group as
> mydata
Id VISIT Score
<dbl> <fct> <dbl>
1 1 Screeing 1
2 1 Baseline 2
5 1 Week 9 78
7 2 Baseline 5
9 2 Week 2 3
This came to my mind
mydata<-mydata %>%
group_by(Id,VISIT) %>%
mutate(first = dplyr::first(na.omit(Score)))
But it does not remove other rows, and it just create a new column with repeated values of first non NA of each group.
CodePudding user response:
If base R is ok try this. NA
s are omitted by default.
aggregate( Score ~ Id VISIT, mydata, function(x) x[1] )
Id VISIT Score
1 1 Baseline 2
2 2 Baseline 5
3 1 Screeing 1
4 2 Week 2 3
5 1 Week 9 78
CodePudding user response:
A dplyr
alternative. Assuming that by "first" you simply mean the first row, in the order given, by group.
Note that (Id, VISIT) in your example data gives 2 groups for Baseline
.
library(dplyr)
mydata %>%
group_by(Id, VISIT) %>%
filter(!is.na(Score)) %>%
slice(1) %>%
ungroup()
Result:
# A tibble: 5 x 3
Id VISIT Score
<dbl> <chr> <dbl>
1 1 Baseline 2
2 1 Screeing 1
3 1 Week 9 78
4 2 Baseline 5
5 2 Week 2 3