I'm trying to create a column in my spreadsheet that takes the last recorded value (IC) for a specific individual (by the Datetime column) and populates it into a column (LIC) for the current event.
A sub-sample of my data looks like this (actual dataset has 4949 rows and 37 individuals):
> head(ACdatas.scale)
Date Datetime ID.2 IC LIC
1 2019-05-25 2019-05-25 11:57 139 High NA
2 2019-06-09 2019-06-09 19:42 139 Low NA
3 2019-07-05 2019-07-05 20:12 139 Medium NA
4 2019-07-27 2019-07-27 17:27 152 Low NA
5 2019-08-04 2019-08-04 9:13 152 Medium NA
6 2019-08-04 2019-08-04 16:18 139 Medium NA
I would like to be able to populate the last value from the IC column into the current LIC column for the current event (see below)
> head(ACdatas.scale)
Date Datetime ID.2 IC LIC
1 2019-05-25 2019-05-25 11:57 139 High NA
2 2019-06-09 2019-06-09 19:42 139 Low High
3 2019-07-05 2019-07-05 20:12 139 Medium Low
4 2019-07-27 2019-07-27 17:27 152 Low NA
5 2019-08-04 2019-08-04 9:13 152 Medium Low
6 2019-08-04 2019-08-04 16:18 139 Medium Medium
I've tried the following code:
ACdatas.scale <- ACdatas.scale %>%
arrange(ID.2, Datetime) %>%
group_by(ID.2) %>%
mutate(LIC= lag(IC))
This worked some of the time, but when I checked back through the data, it seemed to have a problem when the date switched, so it could accurately populate the field within the same day, but not when the previous event was on the previous day. Just to make it super confusing, it only had issues with some of the day switches, and not all! Help please!!
CodePudding user response:
Sample data,
dat <- data.frame(id=c(rep("A",5),rep("B",5)), IC=c(1:5,11:15))
dplyr
library(dplyr)
dat %>%
group_by(id) %>%
mutate(LIC = lag(IC)) %>%
ungroup()
# # A tibble: 10 x 3
# id IC LIC
# <chr> <int> <int>
# 1 A 1 NA
# 2 A 2 1
# 3 A 3 2
# 4 A 4 3
# 5 A 5 4
# 6 B 11 NA
# 7 B 12 11
# 8 B 13 12
# 9 B 14 13
# 10 B 15 14
data.table
library(data.table)
as.data.table(dat)[, LIC := shift(IC, type = "lag"), by = .(id)][]
# id IC LIC
# <char> <int> <int>
# 1: A 1 NA
# 2: A 2 1
# 3: A 3 2
# 4: A 4 3
# 5: A 5 4
# 6: B 11 NA
# 7: B 12 11
# 8: B 13 12
# 9: B 14 13
# 10: B 15 14
base R
dat$LIC <- ave(dat$IC, dat$id, FUN = function(z) c(NA, z[-length(z)]))
dat
# id IC LIC
# 1 A 1 NA
# 2 A 2 1
# 3 A 3 2
# 4 A 4 3
# 5 A 5 4
# 6 B 11 NA
# 7 B 12 11
# 8 B 13 12
# 9 B 14 13
# 10 B 15 14
CodePudding user response:
By using your data:
mydat <- structure(list(Date = structure(c(18041, 18056, 18082,
18104, 18112, 18112),
class = "Date"),
Datetime = structure(c(1558760220,1560084120,
1562332320, 1564223220,
1564884780, 1564910280),
class = c("POSIXct","POSIXt"),
tzone = ""),
ID.2 = c(139, 139, 139, 152, 152, 139),
IC = c("High", "Low", "Medium", "Low", "Medium", "Medium"),
LIC = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L),
class = "data.frame")
mydat %>% arrange(Datetime) %>% group_by(ID.2) %>% mutate(LIC = lag(IC))
# A tibble: 6 x 5
# Groups: ID.2 [2]
Date Datetime ID.2 IC LIC
<date> <dttm> <dbl> <chr> <chr>
1 2019-05-25 2019-05-25 11:57:00 139 High NA
2 2019-06-09 2019-06-09 19:42:00 139 Low High
3 2019-07-05 2019-07-05 20:12:00 139 Medium Low
4 2019-07-27 2019-07-27 17:27:00 152 Low NA
5 2019-08-04 2019-08-04 09:13:00 152 Medium Low
6 2019-08-04 2019-08-04 16:18:00 139 Medium Medium