I have a data frame I want to summarise. For some of the groups, some variables should return NA, but instead the whole row is removed. Toy example df:
df <- data.frame(button = c(1, 2, 3, 3, 3, 2), group = c(1, 1, 1, 2, 2, 2), RT = c(100, 110, 120, 130, 140, 150))
When I summarise without using last, I get the expected result:
df %>% dplyr::group_by(group) %>% dplyr::summarize(RT = mean(RT), RT.button1 = mean(RT[button == 1]))
# A tibble: 2 x 3
  group    RT RT.button1
* <dbl> <dbl>      <dbl>
1     1   110        110
2     2   140        NaN
But when I use last, the row for group 2 is removed instead:
df %>% dplyr::group_by(group) %>% dplyr::summarize(RT = mean(RT), RT.button1 = mean(RT[button == 1]), RT.last.button1 = last(RT[button == 1]))
# A tibble: 1 x 4
# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110
Is there any way to get "last" to return NA instead of removing the row? I'd be very grateful for any pointers!
CodePudding user response:
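This is certainly because you are using data.table::last instead of dplyr::last. For an empty vector, data.table::last returns the empty vector unchanged, so summarize produces zero rows for that group, whereas dplyr::last returns NA.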
With data.table::last:
df %>%
  dplyr::group_by(group) %>%
  dplyr::summarize(RT = mean(RT),
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = data.table::last(RT[button == 1]))
# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110
With dplyr::last:
df %>%
  dplyr::group_by(group) %>%
  dplyr::summarize(RT = mean(RT),
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = dplyr::last(RT[button == 1]))
# A tibble: 2 x 4
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110
2     2   140        NaN              NA
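If you are not sure which last your session is picking up, you can check it directly and compare what the two functions return for an empty vector. This is only a minimal sketch, assuming both dplyr and data.table are installed. last_or_na is a hypothetical helper written for illustration and is not part of either package.

```r
library(dplyr)

# Which package does a bare `last` resolve to in this session?
# If data.table (or another package exporting `last`) was attached
# after dplyr, its name is reported here instead of "dplyr".
environmentName(environment(last))

# This is what RT[button == 1] evaluates to for group 2
empty <- numeric(0)

# dplyr::last() fills in a missing value, so the group keeps one row
length(dplyr::last(empty))

# data.table::last() hands back the empty vector, so the group
# contributes zero rows to the summary
length(data.table::last(empty))

# Hypothetical helper: last element, or a typed NA if `x` is empty.
# Indexing with NA_integer_ yields an NA of the same type as `x`.
last_or_na <- function(x) {
  if (length(x) > 0) x[[length(x)]] else x[NA_integer_]
}
```

Writing dplyr::last explicitly inside summarize, as above, avoids depending on the order in which the packages were attached. The helper could be used in the same place if you would rather not depend on either package's last.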