Home > Net >  Create new column based on other columns and date conditioning
Create new column based on other columns and date conditioning

Time:08-14

I wanna create a new column based on the data in column 2 - 4, conditioning if the date column is before or after the previous row. My data looks as follows:

date        city1  city2   city3    
2022-01-25  Paris  London  Berlin  
2022-01-28  Paris  London  Berlin  
2022-02-04  Paris  London  Berlin  
2022-01-26  Paris  London  Berlin  
2022-02-08  Paris  London  Berlin  
2022-02-02  Paris  London  Berlin  
2022-02-04  Paris  London  Berlin  
2022-02-06  Paris  London  Berlin  

The expected output looks as follow:

date        city1  city2   city3   NewColumn 
2022-01-25  Paris  London  Berlin  Paris
2022-01-28  Paris  London  Berlin  Paris
2022-02-04  Paris  London  Berlin  Paris
2022-01-26  Paris  London  Berlin  London
2022-02-08  Paris  London  Berlin  London
2022-02-02  Paris  London  Berlin  Berlin
2022-02-04  Paris  London  Berlin  Berlin
2022-02-06  Paris  London  Berlin  Berlin

This is what I have tried:

for (i in 2:4){
new_data <- data %>% mutate(NewColumn = ifelse(
    as.Date(date) > lag(as.Date(date)), data[,i], data[,i 1]))}

But this only generates:

date        city1  city2   city3   NewColumn 
2022-01-25  Paris  London  Berlin  <NA>
2022-01-28  Paris  London  Berlin  Berlin
2022-02-04  Paris  London  Berlin  Berlin
2022-01-26  Paris  London  Berlin  London
2022-02-08  Paris  London  Berlin  London
2022-02-02  Paris  London  Berlin  Berlin
2022-02-04  Paris  London  Berlin  Berlin
2022-02-06  Paris  London  Berlin  Berlin

How can I solve this? Any advice?

CodePudding user response:

Using dplyr. Creating a custom group, we can use it to mutate the new values into the column

library(dplyr)

df |>
      group_by(grp = cumsum(c(1, diff(lubridate::ymd(date))) < 0)   1) |>
      rowwise() |> 
      mutate(NewColumn = c_across(city1:city3)[grp]) |> 
      ungroup() |> select(-grp)
  date       city1 city2  city3  NewColumn
  <chr>      <chr> <chr>  <chr>  <chr>    
1 2022-01-25 Paris London Berlin Paris    
2 2022-01-28 Paris London Berlin Paris    
3 2022-02-04 Paris London Berlin Paris    
4 2022-01-26 Paris London Berlin London   
5 2022-02-08 Paris London Berlin London   
6 2022-02-02 Paris London Berlin Berlin   
7 2022-02-04 Paris London Berlin Berlin   
8 2022-02-06 Paris London Berlin Berlin

CodePudding user response:

  • You can try this
j <- 2
lag_d <- c(0 , df$date)
for(i in 1:(length(lag_d)-1)){
    if(lag_d[i 1] > lag_d[i]) df$NewColumn[i] <- df[i,j]
    else {
        j <- j   1
        df$NewColumn[i] <- df[i,j]
    }
}
  • Output
        date city1  city2  city3 NewColumn
1 2022-01-25 Paris London Berlin     Paris
2 2022-01-28 Paris London Berlin     Paris
3 2022-02-04 Paris London Berlin     Paris
4 2022-01-26 Paris London Berlin    London
5 2022-02-08 Paris London Berlin    London
6 2022-02-02 Paris London Berlin    Berlin
7 2022-02-04 Paris London Berlin    Berlin
8 2022-02-06 Paris London Berlin    Berlin
  •  Tags:  
  • r
  • Related