I have temporal data on different bears ('ID'), at different positions ('Position'; land or ice). Here is a simplified version with two individuals (A and B):
ID <- rep.int(c("A", "B"), times = c(10, 10))
Dates <- c(seq(as.Date("2011-06-11"), as.Date("2011-06-20"), by = "days"),
seq(as.Date("2011-05-27"), as.Date("2011-06-05"), by="days"))
Position <- c("Land", "Ice", "Land", "Land", "Ice", "Ice", "Land", "Land", "Land", "Land",
"Land", "Land", "Land", "Ice", "Ice", "Land", "Land", "Land", "Ice", "Ice")
data <- data.frame(ID, Dates, Position)
ID Dates Position
1 A 2011-06-11 Land
2 A 2011-06-12 Ice
3 A 2011-06-13 Land
4 A 2011-06-14 Land
5 A 2011-06-15 Ice
6 A 2011-06-16 Ice
7 A 2011-06-17 Land
8 A 2011-06-18 Land
9 A 2011-06-19 Land
10 A 2011-06-20 Land
11 B 2011-05-27 Land
12 B 2011-05-28 Land
13 B 2011-05-29 Land
14 B 2011-05-30 Ice
15 B 2011-05-31 Ice
16 B 2011-06-01 Land
17 B 2011-06-02 Land
18 B 2011-06-03 Land
19 B 2011-06-04 Ice
20 B 2011-06-05 Ice
I want to create a variable Arrival
, which indicates on-land arrival date for each bear. I defined on-land arrival as the date of the first row in runs of three consecutive Position
on "Land". This row should be set to "Arrival", and the other rows to NA
. This date must also occur after May 31st.
For this dataset, the arrival dates would look like this:
ID Dates Position Arrival
1 A 2011-06-11 Land NA
2 A 2011-06-12 Ice NA
3 A 2011-06-13 Land NA
4 A 2011-06-14 Land NA
5 A 2011-06-15 Ice NA
6 A 2011-06-16 Ice NA
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land NA
9 A 2011-06-19 Land NA
10 A 2011-06-20 Land NA
11 B 2011-05-27 Land NA
12 B 2011-05-28 Land NA
13 B 2011-05-29 Land NA
14 B 2011-05-30 Ice NA
15 B 2011-05-31 Ice NA
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land NA
18 B 2011-06-03 Land NA
19 B 2011-06-04 Ice NA
20 B 2011-06-05 Ice NA
Is there a way for me to do this in R, preferably using dplyr?
CodePudding user response:
We can use zoo::rollapply
for this.
dplyr
library(dplyr)
data %>%
group_by(ID) %>%
mutate(
Arrival = Dates > "2011-05-31" &
lag(Position != "Land", default = FALSE) &
zoo::rollapply(Position == "Land", 3, align = "left", FUN = all, partial = TRUE)
) %>%
ungroup()
# # A tibble: 20 × 4
# ID Dates Position Arrival
# <chr> <date> <chr> <lgl>
# 1 A 2011-06-11 Land FALSE
# 2 A 2011-06-12 Ice FALSE
# 3 A 2011-06-13 Land FALSE
# 4 A 2011-06-14 Land FALSE
# 5 A 2011-06-15 Ice FALSE
# 6 A 2011-06-16 Ice FALSE
# 7 A 2011-06-17 Land TRUE
# 8 A 2011-06-18 Land FALSE
# 9 A 2011-06-19 Land FALSE
# 10 A 2011-06-20 Land FALSE
# 11 B 2011-05-27 Land FALSE
# 12 B 2011-05-28 Land FALSE
# 13 B 2011-05-29 Land FALSE
# 14 B 2011-05-30 Ice FALSE
# 15 B 2011-05-31 Ice FALSE
# 16 B 2011-06-01 Land TRUE
# 17 B 2011-06-02 Land FALSE
# 18 B 2011-06-03 Land FALSE
# 19 B 2011-06-04 Ice FALSE
# 20 B 2011-06-05 Ice FALSE
base R
data$prevnotland <- ave(
data$Position != "Land", data$ID,
FUN = function(z) c(FALSE, z[-length(z)]))
data$Arrival <- data$prevnotland & ave(
data$Dates > "2011-05-31" & data$Position == "Land", data$ID,
FUN = function(z) zoo::rollapply(z, 3, FUN=all, align="left", partial=TRUE))
data
# ID Dates Position prevnotland Arrival
# 1 A 2011-06-11 Land FALSE FALSE
# 2 A 2011-06-12 Ice FALSE FALSE
# 3 A 2011-06-13 Land TRUE FALSE
# 4 A 2011-06-14 Land FALSE FALSE
# 5 A 2011-06-15 Ice FALSE FALSE
# 6 A 2011-06-16 Ice TRUE FALSE
# 7 A 2011-06-17 Land TRUE TRUE
# 8 A 2011-06-18 Land FALSE FALSE
# 9 A 2011-06-19 Land FALSE FALSE
# 10 A 2011-06-20 Land FALSE FALSE
# 11 B 2011-05-27 Land FALSE FALSE
# 12 B 2011-05-28 Land FALSE FALSE
# 13 B 2011-05-29 Land FALSE FALSE
# 14 B 2011-05-30 Ice FALSE FALSE
# 15 B 2011-05-31 Ice TRUE FALSE
# 16 B 2011-06-01 Land TRUE TRUE
# 17 B 2011-06-02 Land FALSE FALSE
# 18 B 2011-06-03 Land FALSE FALSE
# 19 B 2011-06-04 Ice FALSE FALSE
# 20 B 2011-06-05 Ice TRUE FALSE
CodePudding user response:
library(dplyr)
left_join(data,
data %>%
arrange(ID, Dates) %>% # if not in OP order already
group_by(ID, loc_grp = cumsum(Position != lag(Position, 1, ""))) %>%
filter(Dates >= as.Date("2011-05-31"), Position == "Land",
n() >= 3, row_number() == 1) %>%
ungroup() %>%
transmute(ID, Dates, Position, Arrival = "Arrival"))
Result
Joining with `by = join_by(ID, Dates, Position)`
ID Dates Position Arrival
1 A 2011-06-11 Land <NA>
2 A 2011-06-12 Ice <NA>
3 A 2011-06-13 Land <NA>
4 A 2011-06-14 Land <NA>
5 A 2011-06-15 Ice <NA>
6 A 2011-06-16 Ice <NA>
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land <NA>
9 A 2011-06-19 Land <NA>
10 A 2011-06-20 Land <NA>
11 B 2011-05-27 Land <NA>
12 B 2011-05-28 Land <NA>
13 B 2011-05-29 Land <NA>
14 B 2011-05-30 Ice <NA>
15 B 2011-05-31 Ice <NA>
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land <NA>
18 B 2011-06-03 Land <NA>
19 B 2011-06-04 Ice <NA>
20 B 2011-06-05 Ice <NA>
CodePudding user response:
Not as succinct as other solutions, but step-by-step with some temporary variables.
library(tidyverse)
ddf <- data |>
arrange(ID, Dates) |>
group_by(ID) |>
mutate(n = lead(Position, n = 1)) |>
mutate(nn = lead(Position, n = 2)) |>
filter(Position == n & Position == nn & Dates > "2011-05-30") |>
slice_head(n = 1) |>
select(-(n:nn)) |>
mutate(Arrival = "Arrival")
ddf |> right_join(data) |> arrange(ID, Dates)
#> Joining, by = c("ID", "Dates", "Position")
#> # A tibble: 20 × 4
#> # Groups: ID [2]
#> ID Dates Position Arrival
#> <chr> <date> <chr> <chr>
#> 1 A 2011-06-11 Land <NA>
#> 2 A 2011-06-12 Ice <NA>
#> 3 A 2011-06-13 Land <NA>
#> 4 A 2011-06-14 Land <NA>
#> 5 A 2011-06-15 Ice <NA>
#> 6 A 2011-06-16 Ice <NA>
#> 7 A 2011-06-17 Land Arrival
#> 8 A 2011-06-18 Land <NA>
#> 9 A 2011-06-19 Land <NA>
#> 10 A 2011-06-20 Land <NA>
#> 11 B 2011-05-27 Land <NA>
#> 12 B 2011-05-28 Land <NA>
#> 13 B 2011-05-29 Land <NA>
#> 14 B 2011-05-30 Ice <NA>
#> 15 B 2011-05-31 Ice <NA>
#> 16 B 2011-06-01 Land Arrival
#> 17 B 2011-06-02 Land <NA>
#> 18 B 2011-06-03 Land <NA>
#> 19 B 2011-06-04 Ice <NA>
#> 20 B 2011-06-05 Ice <NA>
CodePudding user response:
I hope that your preferably using dplyr means that you are still open for other possibilities :) If so, here's a data.table
alternative.
library(data.table)
setDT(data)
data[Dates > "2011-05-31",
Arrival := if(.N > 2 & Position[1] == "Land") c("Arrival", rep(NA, .N - 1)),
by = .(ID, rleid(Position))]
ID Dates Position Arrival
1: A 2011-06-11 Land <NA>
2: A 2011-06-12 Ice <NA>
3: A 2011-06-13 Land <NA>
4: A 2011-06-14 Land <NA>
5: A 2011-06-15 Ice <NA>
6: A 2011-06-16 Ice <NA>
7: A 2011-06-17 Land Arrival
8: A 2011-06-18 Land <NA>
9: A 2011-06-19 Land <NA>
10: A 2011-06-20 Land <NA>
11: B 2011-05-27 Land <NA>
12: B 2011-05-28 Land <NA>
13: B 2011-05-29 Land <NA>
14: B 2011-05-30 Ice <NA>
15: B 2011-05-31 Ice <NA>
16: B 2011-06-01 Land Arrival
17: B 2011-06-02 Land <NA>
18: B 2011-06-03 Land <NA>
19: B 2011-06-04 Ice <NA>
20: B 2011-06-05 Ice <NA>
Explanation:
Select relevant rows (Dates > "2011-05-31"
). Create groups by 'ID' and consecutive runs of 'Position' (by = .(ID, rleid(Position))
). Within each group, if
number of rows are more than 2 (.N > 2
) &
values in the run of Positions are "Land"
(Position[1] == "Land"
), create the result where the first value is "Arrival" and the rest (.N-1
) are NA
. Add the new column by reference (:=
).
CodePudding user response:
This dplyr
approach uses a relative (non-hardcoded) year for the date condition. Needs library(data.table)
for rleid
. Can be replaced but is very handy.
library(dplyr)
data %>%
group_by(ID) %>%
mutate(grp = data.table::rleid(Position)) %>%
group_by(ID, grp) %>%
mutate(Arrival = if_else(n() >= 3 & Position == "Land" & row_number() == 1 &
Dates > paste0(format(Dates, "%Y"), "-05-31"),
"Arrival", NA_character_)) %>%
ungroup() %>%
select(-grp)
# A tibble: 20 × 4
ID Dates Position Arrival
<chr> <date> <chr> <chr>
1 A 2011-06-11 Land NA
2 A 2011-06-12 Ice NA
3 A 2011-06-13 Land NA
4 A 2011-06-14 Land NA
5 A 2011-06-15 Ice NA
6 A 2011-06-16 Ice NA
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land NA
9 A 2011-06-19 Land NA
10 A 2011-06-20 Land NA
11 B 2011-05-27 Land NA
12 B 2011-05-28 Land NA
13 B 2011-05-29 Land NA
14 B 2011-05-30 Ice NA
15 B 2011-05-31 Ice NA
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land NA
18 B 2011-06-03 Land NA
19 B 2011-06-04 Ice NA
20 B 2011-06-05 Ice NA