dplyr::last - rows dropped if one variable can't be computed in summarise

Time:02-22

I have a data frame I want to summarise. For some of the groups, some variables should return NA, but instead the whole row is removed. Toy example df:

df=data.frame(button=c(1,2,3,3,3,2),group=c(1,1,1,2,2,2),RT=c(100,110,120,130,140,150))

When I summarise without using "last", I get the expected result:

df%>%dplyr::group_by(group) %>%dplyr::summarize(RT=mean(RT), RT.button1=mean(RT[button==1]))
# A tibble: 2 x 3
  group    RT RT.button1
* <dbl> <dbl>      <dbl>
1     1   110        110
2     2   140        NaN

But when I add "last", the row for group 2 is removed instead:

df%>%dplyr::group_by(group) %>%dplyr::summarize(RT=mean(RT), RT.button1=mean(RT[button==1]),RT.last.button1=last(RT[button==1]))
# A tibble: 1 x 4
# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110

Is there any way to get "last" to return NA instead of removing the row? I'd be very grateful for any pointers!

Answer:

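This is almost certainly because your unqualified last call resolves to data.table::last instead of dplyr::last. data.table exports its own last(), and if data.table is loaded after dplyr, its version masks dplyr's.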
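Why the row disappears: group 2 has no button-1 trials, so RT[button == 1] is a length-0 vector. As the dropped row suggests, data.table::last hands that empty vector straight back, so summarise() has no value to put in the row, whereas dplyr::last returns NA by default. The base-R sketch below illustrates the difference using a hypothetical helper, last_or_na(), which mimics dplyr::last's default; it is not part of either package.

```r
# Hypothetical helper (not in dplyr or data.table): return the last element,
# or a typed NA when the input is empty, similar to dplyr::last's default.
last_or_na <- function(x) {
  if (length(x) == 0L) return(x[NA_integer_])  # NA of the same type as x
  x[[length(x)]]
}

# Group 2 from the question: no trial has button == 1
button <- c(3, 3, 2)
RT     <- c(130, 140, 150)

sel <- RT[button == 1]  # numeric(0): nothing left to take the last of
length(sel)             # 0, so a length-0 summary leaves no value for the row
last_or_na(sel)         # NA, a length-1 result, so the row can be kept
```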

With data.table::last:

df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(RT = mean(RT), 
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = data.table::last(RT[button == 1]))
# A tibble: 1 x 4
# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110

With dplyr::last:

df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(RT = mean(RT), 
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = dplyr::last(RT[button == 1]))
# A tibble: 2 x 4
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110
2     2   140        NaN              NA
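If you prefer to make the empty case explicit rather than rely on the default, dplyr::last also takes a default argument. One more point worth checking: summarise() evaluates its expressions in order, so RT = mean(RT) replaces RT for every later expression in the same call. That is why RT.button1 shows 110 (the group mean) for group 1 even though its only button-1 trial has RT 100. The sketch below, assuming dplyr is installed, gives the mean a different name (RT.mean is just an illustrative choice) so the per-button summaries see the raw trial times.

```r
suppressPackageStartupMessages(library(dplyr))

df <- data.frame(button = c(1, 2, 3, 3, 3, 2),
                 group  = c(1, 1, 1, 2, 2, 2),
                 RT     = c(100, 110, 120, 130, 140, 150))

res <- df %>%
  group_by(group) %>%
  summarise(
    # Per-button summaries use the raw trial times in RT
    RT.button1      = mean(RT[button == 1]),
    # Explicit default: groups without button-1 trials get NA instead of vanishing
    RT.last.button1 = dplyr::last(RT[button == 1], default = NA_real_),
    # A new name, so this summary does not mask the RT column
    RT.mean         = mean(RT)
  )
res
```

Here group 1 gets RT.button1 = 100 and RT.last.button1 = 100, while group 2 keeps its row with NaN and NA.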