Using lag function to find the last value for a specific individual


I'm trying to create a column in my data that, for each event, takes the most recent previously recorded value of IC for that individual (ordered by the Datetime column) and populates it into a new column, LIC, for the current event.

A sub-sample of my data looks like this (actual dataset has 4949 rows and 37 individuals):

> head(ACdatas.scale)
    Date         Datetime     ID.2      IC       LIC

1 2019-05-25 2019-05-25 11:57  139     High      NA
2 2019-06-09 2019-06-09 19:42  139     Low       NA
3 2019-07-05 2019-07-05 20:12  139     Medium    NA
4 2019-07-27 2019-07-27 17:27  152     Low       NA
5 2019-08-04  2019-08-04 9:13  152     Medium    NA
6 2019-08-04 2019-08-04 16:18  139     Medium    NA

I would like to populate the LIC column for the current event with that individual's previous IC value (see below):

    > head(ACdatas.scale)
   Date         Datetime     ID.2      IC       LIC

1 2019-05-25 2019-05-25 11:57  139     High      NA
2 2019-06-09 2019-06-09 19:42  139     Low       High
3 2019-07-05 2019-07-05 20:12  139     Medium    Low
4 2019-07-27 2019-07-27 17:27  152     Low       NA
5 2019-08-04  2019-08-04 9:13  152     Medium    Low
6 2019-08-04 2019-08-04 16:18  139     Medium    Medium

I've tried the following code:

ACdatas.scale <- ACdatas.scale %>%
  arrange(ID.2, Datetime) %>% 
  group_by(ID.2) %>%
  mutate(LIC= lag(IC))

This worked some of the time, but when I checked back through the data it seemed to have a problem when the date switched: it accurately populated the field when the previous event was on the same day, but not always when the previous event was on an earlier day. To make it even more confusing, it only had issues with some of the day switches, not all of them. Help please!
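
One possible cause (an assumption, since the column classes aren't shown): if Datetime is stored as character rather than as POSIXct, arrange() sorts it alphabetically, so a single-digit hour such as 9:13 sorts after a double-digit hour such as 16:18 on the same day, and the "previous" row is then not the chronologically previous event. A minimal illustration of that sorting behaviour:

sort(c("2019-08-04 9:13", "2019-08-04 16:18"))
# [1] "2019-08-04 16:18" "2019-08-04 9:13"   <- alphabetical, not chronological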

CodePudding user response:

Sample data:

dat <- data.frame(id=c(rep("A",5),rep("B",5)), IC=c(1:5,11:15))

dplyr

library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(LIC = lag(IC)) %>%
  ungroup()
# # A tibble: 10 x 3
#    id       IC   LIC
#    <chr> <int> <int>
#  1 A         1    NA
#  2 A         2     1
#  3 A         3     2
#  4 A         4     3
#  5 A         5     4
#  6 B        11    NA
#  7 B        12    11
#  8 B        13    12
#  9 B        14    13
# 10 B        15    14

data.table

library(data.table)
as.data.table(dat)[, LIC := shift(IC, type = "lag"), by = .(id)][]
#         id    IC   LIC
#     <char> <int> <int>
#  1:      A     1    NA
#  2:      A     2     1
#  3:      A     3     2
#  4:      A     4     3
#  5:      A     5     4
#  6:      B    11    NA
#  7:      B    12    11
#  8:      B    13    12
#  9:      B    14    13
# 10:      B    15    14

base R

# base-R equivalent of lag(): within each id, shift IC down one row, padding with NA
dat$LIC <- ave(dat$IC, dat$id, FUN = function(z) c(NA, z[-length(z)]))
dat
#    id IC LIC
# 1   A  1  NA
# 2   A  2   1
# 3   A  3   2
# 4   A  4   3
# 5   A  5   4
# 6   B 11  NA
# 7   B 12  11
# 8   B 13  12
# 9   B 14  13
# 10  B 15  14

CodePudding user response:

Using your data:

mydat <- structure(list(Date = structure(c(18041, 18056, 18082,
                                           18104, 18112, 18112),
                                           class = "Date"), 
                        Datetime = structure(c(1558760220,1560084120,
                                   1562332320, 1564223220, 
                                   1564884780, 1564910280), 
                                   class = c("POSIXct","POSIXt"), 
                                   tzone = ""), 
                        ID.2 = c(139, 139, 139, 152, 152, 139),
                        IC = c("High", "Low", "Medium", "Low",  "Medium", "Medium"),
                        LIC = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L), 
                               class = "data.frame")

mydat %>% arrange(Datetime) %>% group_by(ID.2) %>% mutate(LIC = lag(IC))
# A tibble: 6 x 5
# Groups:   ID.2 [2]
  Date       Datetime             ID.2 IC     LIC   
  <date>     <dttm>              <dbl> <chr>  <chr> 
1 2019-05-25 2019-05-25 11:57:00   139 High   NA    
2 2019-06-09 2019-06-09 19:42:00   139 Low    High  
3 2019-07-05 2019-07-05 20:12:00   139 Medium Low   
4 2019-07-27 2019-07-27 17:27:00   152 Low    NA    
5 2019-08-04 2019-08-04 09:13:00   152 Medium Low   
6 2019-08-04 2019-08-04 16:18:00   139 Medium Medium
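
The Datetime column in this structure() dump is already POSIXct, so arrange(Datetime) puts the events in true chronological order before the grouped lag() is taken. If Datetime in the full ACdatas.scale is stored as text instead, one option is to convert it first; a sketch, assuming the "%Y-%m-%d %H:%M" format shown in the question:

library(dplyr)  # if not already loaded

ACdatas.scale %>%
  mutate(Datetime = as.POSIXct(Datetime, format = "%Y-%m-%d %H:%M")) %>%  # parse text to POSIXct
  arrange(Datetime) %>%
  group_by(ID.2) %>%
  mutate(LIC = lag(IC)) %>%
  ungroup()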
