I have the following example dataset showing whether a bear is on land or on the sea ice:
Bear.ID Region
1 A Land
2 A Land
3 A Land
4 A Ice
5 A Ice
6 A Ice
7 A Ice
8 B Land
9 B Ice
10 B Land
11 B Land
12 B Ice
13 B Ice
14 B Ice
15 B Ice
My goal is to create another column showing each bear's departure onto the sea ice. This departure date is defined as the first row where the bear is on the ice that is not followed by a row where the bear is on land.
So in my example dataset, the column would look like this:
Bear.ID Region Departure?
1 A Land Not Departure
2 A Land Not Departure
3 A Land Not Departure
4 A Ice Departure
5 A Ice Not Departure
6 A Ice Not Departure
7 A Ice Not Departure
8 B Land Not Departure
9 B Ice Not Departure
10 B Land Not Departure
11 B Land Not Departure
12 B Ice Departure
13 B Ice Not Departure
14 B Ice Not Departure
15 B Ice Not Departure
How can I do this in R? "Departure" and "Not Departure" can also be replaced by TRUE and FALSE if this makes it easier.
CodePudding user response:
I have generated a sample dataframe.
library(data.table)
library(tidyverse)
id <-c(1, 1, 1, 2, 2)
date <- c("2022-10-01 22:22:01","2022-11-05 22:22:01","2022-08-18 12:48:16","2022-11-19 20:57:44","2022-12-19 20:57:44")
date_1 <- c("2022-11-01 22:22:01","2022-11-02 22:22:01","2022-11-03 12:48:16","2022-11-04 20:57:44","2022-11-05 20:57:44")
date_2 <- c("2022-12-01 22:22:01","2022-12-02 22:22:01","2022-12-03 12:48:16","2022-12-04 20:57:44","2022-12-05 20:57:44")
df <- data.table(id,date,date_1, date_2)
In this step i format all the date columns.
#### format date
df$date <- as.POSIXct(df$date)
df$date_1 <- as.POSIXct(df$date_1)
df$date_2 <- as.POSIXct(df$date_2)
In this step i check if the date column is in the time range.
#### check if date is in time range
df$Departure_Date <- ifelse(df$date <= df$date_2 & df$date >= df$date_1, "Departure", "Not Departure")
The result looks like this:
id date date_1 date_2 Departure.Date
1: 1 2022-10-01 22:22:01 2022-11-01 22:22:01 2022-12-01 22:22:01 Not Departure
2: 1 2022-11-05 22:22:01 2022-11-02 22:22:01 2022-12-02 22:22:01 Departure
3: 1 2022-08-18 12:48:16 2022-11-03 12:48:16 2022-12-03 12:48:16 Not Departure
4: 2 2022-11-19 20:57:44 2022-11-04 20:57:44 2022-12-04 20:57:44 Departure
5: 2 2022-12-19 20:57:44 2022-11-05 20:57:44 2022-12-05 20:57:44 Not Departure
CodePudding user response:
So I think this might help you, I am checking if my current land is 'ice', the one before is 'land' and the one after is also 'ice'
library(dplyr)
data.frame(
bear = c("A","A","A","B","B","B","B","B"),
region = c("Land","Ice","Ice","Land","Ice","Land","Ice","Ice")
) %>%
group_by(bear) %>%
mutate(
departure = case_when(
region == 'Ice' & lag(region) == "Land" & lead(region) == "Ice" ~ "Departure",
TRUE ~ "Not departure"
)
)
# A tibble: 8 x 3
# Groups: bear [2]
bear region departure
<chr> <chr> <chr>
1 A Land Not departure
2 A Ice Departure
3 A Ice Not departure
4 B Land Not departure
5 B Ice Not departure
6 B Land Not departure
7 B Ice Departure
8 B Ice Not departure