Home > front end >  Creating a column based on other columns in R
Creating a column based on other columns in R

Time:03-25

I`m very new to R but I would like to create a new column in my data frame based on 2 other columns (date columns). I have a grouping variable coded (1, 2, 3) and 2 date columns. Below are the conditions for my new column:

  • If grouping column == 1, then it should bring back the corresponding date (row-wise) from the date1 column
  • If grouping column == 2, then it should bring back the date from the date2 column
  • If grouping column == 3, then it should bring back the earliest/first date between the 2 date columns

I have tried case_when and if_else but have not had any success. Any assistance will be greatly appreciated.

I tried case-when and if_else but got errors

CodePudding user response:

Here is a way with case_when.

set.seed(2022)

d <- seq(as.Date("2022-01-01"), Sys.Date(), by = "day")
df1 <- data.frame(
  group = sample(3, 20, TRUE),
  date1 = sample(d, 20),
  date2 = sample(d, 20)
)

suppressPackageStartupMessages(library(dplyr))

df1 %>%
  mutate(tmpdate = pmin(date1, date2)) %>%
  mutate(newdate = case_when(
    group == 1 ~ date1,
    group == 2 ~ date2,
    TRUE ~ tmpdate
  )) %>%
  select(-tmpdate) %>% head
#>   group      date1      date2    newdate
#> 1     3 2022-01-29 2022-02-25 2022-01-29
#> 2     2 2022-01-12 2022-01-01 2022-01-01
#> 3     3 2022-03-20 2022-01-19 2022-01-19
#> 4     3 2022-03-01 2022-03-01 2022-03-01
#> 5     2 2022-02-13 2022-02-19 2022-02-19
#> 6     3 2022-03-11 2022-02-10 2022-02-10

Created on 2022-03-24 by the reprex package (v2.0.1)

CodePudding user response:

I'm not 100% sure if this is what you wanted in the case where the group variable is 3, but it adds a day to the earlier date.


df <- data.frame(group = c(1:3),
                 date1 = c(01-02-99, 01-02-99,1-02-99),
                 date2 = c(04-02-99, 04-02-99,04-02-99)
                 )

df$date1 <- as.Date(df$date1)
df$date2 <- as.Date(df$date2)


df <- df %>% mutate(col3 = case_when(
    df$group == 1 ~ df$date1,
    df$group == 2 ~ df$date2,
    df$group == 3 & df$date1 > df$date2 ~ df$date2 1,
    df$group == 3 & df$date2 > df$date1 ~ df$date1 1
))


  • Related