Creating new column using first value not followed by other values in other column-CodePudding

I have the following example dataset showing whether a bear is on land or on the sea ice:

     Bear.ID Region
1        A   Land
2        A   Land
3        A   Land
4        A    Ice
5        A    Ice
6        A    Ice
7        A    Ice
8        B   Land
9        B    Ice
10       B   Land
11       B   Land
12       B    Ice
13       B    Ice
14       B    Ice
15       B    Ice

My goal is to create another column showing each bear's departure onto the sea ice. This departure date is defined as the first row where the bear is on the ice that is not followed by a row where the bear is on land.

So in my example dataset, the column would look like this:

       Bear.ID Region Departure?
1        A   Land     Not Departure
2        A   Land     Not Departure
3        A   Land     Not Departure
4        A    Ice     Departure
5        A    Ice     Not Departure
6        A    Ice     Not Departure
7        A    Ice     Not Departure
8        B   Land     Not Departure
9        B    Ice     Not Departure
10       B   Land     Not Departure
11       B   Land     Not Departure
12       B    Ice     Departure
13       B    Ice     Not Departure
14       B    Ice     Not Departure
15       B    Ice     Not Departure

How can I do this in R? "Departure" and "Not Departure" can also be replaced by TRUE and FALSE if this makes it easier.

CodePudding user response：

I have generated a sample dataframe.

library(data.table)
library(tidyverse)

id <-c(1, 1, 1, 2, 2)
date <- c("2022-10-01 22:22:01","2022-11-05 22:22:01","2022-08-18 12:48:16","2022-11-19 20:57:44","2022-12-19 20:57:44")
date_1 <- c("2022-11-01 22:22:01","2022-11-02 22:22:01","2022-11-03 12:48:16","2022-11-04 20:57:44","2022-11-05 20:57:44")
date_2 <- c("2022-12-01 22:22:01","2022-12-02 22:22:01","2022-12-03 12:48:16","2022-12-04 20:57:44","2022-12-05 20:57:44")

df <- data.table(id,date,date_1, date_2)

In this step i format all the date columns.

#### format date
df$date <- as.POSIXct(df$date)
df$date_1 <- as.POSIXct(df$date_1)
df$date_2 <- as.POSIXct(df$date_2)

In this step i check if the date column is in the time range.

#### check if date is in time range
df$Departure_Date <- ifelse(df$date <= df$date_2 & df$date >= df$date_1, "Departure", "Not Departure")

The result looks like this:

   id                date              date_1              date_2 Departure.Date
1:  1 2022-10-01 22:22:01 2022-11-01 22:22:01 2022-12-01 22:22:01  Not Departure
2:  1 2022-11-05 22:22:01 2022-11-02 22:22:01 2022-12-02 22:22:01      Departure
3:  1 2022-08-18 12:48:16 2022-11-03 12:48:16 2022-12-03 12:48:16  Not Departure
4:  2 2022-11-19 20:57:44 2022-11-04 20:57:44 2022-12-04 20:57:44      Departure
5:  2 2022-12-19 20:57:44 2022-11-05 20:57:44 2022-12-05 20:57:44  Not Departure

CodePudding user response：

So I think this might help you, I am checking if my current land is 'ice', the one before is 'land' and the one after is also 'ice'

library(dplyr)

data.frame(
  bear = c("A","A","A","B","B","B","B","B"),
  region = c("Land","Ice","Ice","Land","Ice","Land","Ice","Ice")
) %>% 
  group_by(bear) %>% 
  mutate(
    departure = case_when(
      region == 'Ice' & lag(region) == "Land" & lead(region) == "Ice" ~ "Departure",
      TRUE ~ "Not departure"
    )
  )

# A tibble: 8 x 3
# Groups:   bear [2]
  bear  region departure    
  <chr> <chr>  <chr>        
1 A     Land   Not departure
2 A     Ice    Departure    
3 A     Ice    Not departure
4 B     Land   Not departure
5 B     Ice    Not departure
6 B     Land   Not departure
7 B     Ice    Departure    
8 B     Ice    Not departure