Home > Software design >  How can I create a new column, based on date condition?
How can I create a new column, based on date condition?

Time:11-12

I have two dataframes. The first one has followed structure:

                   time transactID itemID order salesPrice        day
 1: 2018-01-01 00:01:56    2278968    450     1      17.42 2018-01-01
 2: 2018-01-01 00:01:56    2278968     83     1       5.19 2018-01-01
 3: 2018-01-01 00:07:11    2255797   7851     2      20.47 2018-01-01
 4: 2018-01-01 00:09:24    2278968    450     1      17.42 2018-01-01
 5: 2018-01-01 00:09:24    2278968     83     1       5.19 2018-01-01
 6: 2018-01-01 00:39:26    2257125   9375     1      31.02 2018-01-01
 7: 2018-01-01 00:51:59    2278968    450     1      17.42 2018-01-01
 8: 2018-01-01 00:51:59    2278968     83     1       5.19 2018-01-01
 9: 2018-01-01 00:51:59    2278968     19     1      77.64 2018-01-01
10: 2018-01-01 00:51:59    2278968    297     1      43.53 2018-01-01
11: 2018-01-20 00:51:59    2278968    297     1      43.53 2018-01-20 HERE
...

and the second one has that structure:

   day       
   <date>    
 1 NA        
 2 2018-07-13
 3 2018-07-01
 4 2018-07-02
 5 2018-07-05
 6 2018-07-11
 7 2018-07-04
 8 2018-07-08
 9 2018-07-09
10 2018-06-30
11 2018-07-06
12 2018-07-10
13 2018-07-03
14 2018-07-12
15 2018-07-07
16 2018-01-20 HERE

Output I want to achieve:

                   time transactID itemID order salesPrice        day prom
 1: 2018-01-01 00:01:56    2278968    450     1      17.42 2018-01-01 ""
 2: 2018-01-01 00:01:56    2278968     83     1       5.19 2018-01-01 ""
 3: 2018-01-01 00:07:11    2255797   7851     2      20.47 2018-01-01 ""
 4: 2018-01-01 00:09:24    2278968    450     1      17.42 2018-01-01 ""
 5: 2018-01-01 00:09:24    2278968     83     1       5.19 2018-01-01 ""
 6: 2018-01-01 00:39:26    2257125   9375     1      31.02 2018-01-01 ""
 7: 2018-01-01 00:51:59    2278968    450     1      17.42 2018-01-01 ""
 8: 2018-01-01 00:51:59    2278968     83     1       5.19 2018-01-01 ""
 9: 2018-01-01 00:51:59    2278968     19     1      77.64 2018-01-01 ""
10: 2018-01-01 00:51:59    2278968    297     1      43.53 2018-01-01 ""
11: 2018-01-20 00:51:59    2278968    297     1      43.53 2018-01-20 "*"
...

My goal is to create a new column, called prom, that marks the dates that match. So for example, I have a match between the first and second column day 2018-01-20 (see Example), then it should mark in the new column that day for example with a star (*). The rows that dont match should be filled with nothing "".

What I tried was something like this (didnt worked, just for the idea what I want to accomplish):

df1$prom <- ifelse(df1$day %in% df2$day, "*","")

Code for reproduction (no matching dates included):

First df

structure(list(time = structure(c(1514764916, 1514764916, 1514765231, 
                                         1514765364, 1514765364, 1514767166, 1514767919, 1514767919, 1514767919, 
                                         1514767919, 1514767919, 1514767919, 1514768104, 1514768214, 1514768214, 
                                         1514768214, 1514768214, 1514768214, 1514768214, 1514770106, 1516406400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), transactID = c(2278968, 
                                                                  2278968, 2255797, 2278968, 2278968, 2257125, 2278968, 2278968, 
                                                                  2278968, 2278968, 2278968, 2278968, 2255111, 2278968, 2278968, 
                                                                  2278968, 2278968, 2278968, 2278968, 2255111, 2255111), itemID = c(450, 
                                                                                                                                    83, 7851, 450, 83, 9375, 450, 83, 19, 297, 295, 109, 2049, 19, 
                                                                                                                                    83, 295, 297, 450, 109, 4322, 4322), order = c(1, 1, 2, 1, 1, 
                                                                                                                                                                                   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), salesPrice = c(17.42, 
                                                                                                                                                                                                                                                   5.19, 20.47, 17.42, 5.19, 31.02, 17.42, 5.19, 77.64, 43.53, 37.79, 
                                                                                                                                                                                                                                                   5.8, 35.75, 77.64, 5.19, 37.79, 43.53, 17.42, 5.8, 22.86, 22.86
                                                                                                                                                                                   ), day = structure(c(17532, 17532, 17532, 17532, 17532, 17532, 
                                                                                                                                                                                                        17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 
                                                                                                                                                                                                        17532, 17532, 17532, 17532, 17532, 17551), class = "Date")), row.names = c(NA, 
                                                                                                                                                                                                                                                                                   -21L), class = c("data.table", "data.frame"))

Second df

structure(list(day = structure(c(NA, 1531440000, 1530403200, 
1530489600, 1530748800, 1531267200, 1530662400, 1531008000, 1531094400, 
1530316800, 1530835200, 1531180800, 1530576000, 1531353600, 1530921600, 
1516402800), tzone = "", class = c("POSIXct", "POSIXt"))), row.names = c(NA, 
-16L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

Not sure if this is what you're looking for. Could you show us how is the result you expect?

df2$prom <- "*"

library(tidyverse)
left_join(df1, df2) %>%
  mutate(prom = replace_na(prom, ""))
  •  Tags:  
  • r
  • Related