I wanna create a new column based on the data in column 2 - 4, conditioning if the date column is before or after the previous row. My data looks as follows:
date city1 city2 city3
2022-01-25 Paris London Berlin
2022-01-28 Paris London Berlin
2022-02-04 Paris London Berlin
2022-01-26 Paris London Berlin
2022-02-08 Paris London Berlin
2022-02-02 Paris London Berlin
2022-02-04 Paris London Berlin
2022-02-06 Paris London Berlin
The expected output looks as follow:
date city1 city2 city3 NewColumn
2022-01-25 Paris London Berlin Paris
2022-01-28 Paris London Berlin Paris
2022-02-04 Paris London Berlin Paris
2022-01-26 Paris London Berlin London
2022-02-08 Paris London Berlin London
2022-02-02 Paris London Berlin Berlin
2022-02-04 Paris London Berlin Berlin
2022-02-06 Paris London Berlin Berlin
This is what I have tried:
for (i in 2:4){
new_data <- data %>% mutate(NewColumn = ifelse(
as.Date(date) > lag(as.Date(date)), data[,i], data[,i 1]))}
But this only generates:
date city1 city2 city3 NewColumn
2022-01-25 Paris London Berlin <NA>
2022-01-28 Paris London Berlin Berlin
2022-02-04 Paris London Berlin Berlin
2022-01-26 Paris London Berlin London
2022-02-08 Paris London Berlin London
2022-02-02 Paris London Berlin Berlin
2022-02-04 Paris London Berlin Berlin
2022-02-06 Paris London Berlin Berlin
How can I solve this? Any advice?
CodePudding user response:
Using dplyr. Creating a custom group, we can use it to mutate the new values into the column
library(dplyr)
df |>
group_by(grp = cumsum(c(1, diff(lubridate::ymd(date))) < 0) 1) |>
rowwise() |>
mutate(NewColumn = c_across(city1:city3)[grp]) |>
ungroup() |> select(-grp)
date city1 city2 city3 NewColumn
<chr> <chr> <chr> <chr> <chr>
1 2022-01-25 Paris London Berlin Paris
2 2022-01-28 Paris London Berlin Paris
3 2022-02-04 Paris London Berlin Paris
4 2022-01-26 Paris London Berlin London
5 2022-02-08 Paris London Berlin London
6 2022-02-02 Paris London Berlin Berlin
7 2022-02-04 Paris London Berlin Berlin
8 2022-02-06 Paris London Berlin Berlin
CodePudding user response:
- You can try this
j <- 2
lag_d <- c(0 , df$date)
for(i in 1:(length(lag_d)-1)){
if(lag_d[i 1] > lag_d[i]) df$NewColumn[i] <- df[i,j]
else {
j <- j 1
df$NewColumn[i] <- df[i,j]
}
}
- Output
date city1 city2 city3 NewColumn
1 2022-01-25 Paris London Berlin Paris
2 2022-01-28 Paris London Berlin Paris
3 2022-02-04 Paris London Berlin Paris
4 2022-01-26 Paris London Berlin London
5 2022-02-08 Paris London Berlin London
6 2022-02-02 Paris London Berlin Berlin
7 2022-02-04 Paris London Berlin Berlin
8 2022-02-06 Paris London Berlin Berlin