Add columns with respect to the date

Time:09-07

I have around 8-10 data frames, each with two columns: Date and Value.

Data Frame 1 - df1

 Date       Value_df1
01.01.2020   1200
02.01.2020   1300
03.01.2020   1240
06.01.2020   3900
10.02.2020   1500 
.
.
.
31.12.2020   2000

Second data frame - df2 (starts on a different date and ends in the last month of 2020)

  Date       Value_df2
03.01.2020   120
04.01.2020   130
06.01.2020   140
06.01.2020   150
08.01.2020   1657 
.
.
.      
30.12.2020   6000

I used this code to fill in the missing dates for df1, so that it is a continuous daily series, and to set the resulting NAs in the value column to 0:

library(dplyr)
df1 %>%  # assumes Date is already of class Date
  tidyr::complete(Date = seq.Date(min(Date), max(Date), by = "day"),
                  fill = list(Value_df1 = 0))
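If you would rather not depend on tidyr, a base R sketch of the same step is to merge df1 onto a complete daily sequence and replace the resulting NAs with 0. The three-row df1 below is a hypothetical excerpt of the question's data, and the sketch assumes Date has been parsed to class Date.

```r
# Hypothetical excerpt of df1; Date is parsed from dd.mm.yyyy to class Date
df1 <- data.frame(
  Date = as.Date(c("01.01.2020", "02.01.2020", "06.01.2020"), format = "%d.%m.%Y"),
  Value_df1 = c(1200, 1300, 3900)
)

# One row for every day between the first and last date in df1
all_days <- data.frame(Date = seq(min(df1$Date), max(df1$Date), by = "day"))

# Left join: keep every day, NA where df1 has no row for that day
filled <- merge(all_days, df1, by = "Date", all.x = TRUE)

# Days that were missing get 0 instead of NA
filled$Value_df1[is.na(filled$Value_df1)] <- 0

print(filled)
```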

I want to add the value column of each of the other data frames to this first data frame, matched by date. The final data frame should look like this:

Date        Value_df1  Value_df2  Value_df3  Value_df4 ... Value_df8
01.01.2020  1200       0
02.01.2020  1300       0
03.01.2020  1240       120
04.01.2020  0          130
05.01.2020  0          140
06.01.2020  3900       150
07.01.2020  0          0
08.01.2020  0          1657
09.01.2020  0          0
10.02.2020  1500       0
.
.
.
30.12.2020  0          6000
31.12.2020  2000       0

I hope my question is clear. Can anyone help me with this? How can I add the columns with respect to the dates of the first data frame?

Answer:

You could use the sqldf package and write it as a SQL statement:

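library(sqldf)

df1 <- some_data        # placeholder for your first data frame
df2 <- your_other_data  # placeholder for your second data frame

# Here's the part with sqldf.
# SQL has no NA keyword; NULL is used instead and comes back to R as NA.
query <- 'select Date, Value_df1, NULL as Value_df2 from df1
          union all
          select Date, NULL, Value_df2 from df2'

final_data <- sqldf(query)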

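This stacks the rows of both tables rather than merging them, so a date can appear more than once, with NA in the column of the table it did not come from. After this code runs, it's up to you whether to collapse the rows by date, set the NAs to zero, etc.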
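If you want one row per date instead of stacked rows, one option (a sketch, not part of the original answer) is to wrap the UNION ALL in a GROUP BY and use SQLite's coalesce() to turn empty groups into 0. Note the assumption that sum() should add up repeated dates within the same table. The toy data frames are hypothetical and store Date as ISO yyyy-mm-dd text, so that SQLite's text ordering matches date ordering.

```r
library(sqldf)

# Hypothetical toy data; ISO date strings sort chronologically as text
df1 <- data.frame(Date = c("2020-01-01", "2020-01-03", "2020-01-06"),
                  Value_df1 = c(1200, 1240, 3900))
df2 <- data.frame(Date = c("2020-01-03", "2020-01-04"),
                  Value_df2 = c(120, 130))

# Stack both tables, then collapse to one row per date.
# sum() skips NULLs; coalesce(..., 0.0) turns an all-NULL group into 0.
query <- "
  select Date,
         coalesce(sum(Value_df1), 0.0) as Value_df1,
         coalesce(sum(Value_df2), 0.0) as Value_df2
  from (select Date, Value_df1, NULL as Value_df2 from df1
        union all
        select Date, NULL, Value_df2 from df2)
  group by Date
  order by Date"

wide <- sqldf(query)
print(wide)
```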

Answer:

First, put your tables into a list and full-join all of them with Reduce(). Then replace the NAs with 0s (here using coalesce() inside mutate(across())).

library(dplyr)
#library(lubridate)

# The `_` placeholder needs R >= 4.2
list(df1, df2, df3) |>
  Reduce(full_join, x = _) |>
  mutate(across(starts_with("Value"), ~ coalesce(.x, 0))) |>
  arrange(lubridate::dmy(Date))

If Date is already of class Date, you can use arrange(Date) and drop the lubridate dependency.
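Reduce() folds a two-argument function over a list, so Reduce(f, list(a, b, c)) evaluates f(f(a, b), c). For comparison, here is a base R sketch of the same pipeline with neither dplyr nor lubridate: merge(all = TRUE) plays the role of full_join(), and as.Date() with an explicit "%d-%m-%Y" format replaces lubridate::dmy(). The small tables are hypothetical excerpts in the same dd-mm-yyyy format as the data below.

```r
# Hypothetical excerpts of the tables, with dd-mm-yyyy date strings
df1 <- data.frame(Date = c("01-01-2020", "03-01-2020"), Value_df1 = c(1200, 1240))
df2 <- data.frame(Date = c("03-01-2020", "04-01-2020"), Value_df2 = c(120, 130))
df3 <- data.frame(Date = c("03-01-2020", "04-02-2020"), Value_df3 = c(120, 130))

# Full outer join on Date, folded left to right:
# merge(merge(df1, df2), df3)
full_join_base <- function(x, y) merge(x, y, by = "Date", all = TRUE)
joined <- Reduce(full_join_base, list(df1, df2, df3))

# Replace NA with 0 in every Value_* column
value_cols <- grep("^Value", names(joined))
joined[value_cols][is.na(joined[value_cols])] <- 0

# Sort chronologically by parsing the dd-mm-yyyy strings
joined <- joined[order(as.Date(joined$Date, format = "%d-%m-%Y")), ]
print(joined)
```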

Output:

# A tibble: 13 × 4
   Date       Value_df1 Value_df2 Value_df3
   <chr>          <dbl>     <dbl>     <dbl>
 1 01-01-2020      1200         0         0
 2 02-01-2020      1300         0         0
 3 03-01-2020      1240       120       120
 4 04-01-2020         0       130         0
 5 06-01-2020      3900       140         0
 6 06-01-2020      3900       150         0
 7 08-01-2020         0      1657         0
 8 04-02-2020         0         0       130
 9 06-02-2020         0         0       140
10 06-02-2020         0         0       150
11 08-03-2020         0         0      1657
12 30-12-2020         0      6000      6000
13 31-12-2020      2000         0         0
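A full join only keeps dates that appear in at least one table (05-01-2020 and 07-01-2020 are missing above), and a date that appears twice in one table, like 06-01-2020 in df2, duplicates the matching df1 row. If you want every calendar day in df1's range, as in the question's expected output, one base R sketch is to start from a complete date sequence and left-join each table onto it. Summing repeated dates within a table first is an assumption about how you want duplicates handled; the helper names and toy tables are hypothetical.

```r
# Hypothetical toy tables with Date already parsed to class Date
df1 <- data.frame(Date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-06")),
                  Value_df1 = c(1200, 1240, 3900))
df2 <- data.frame(Date = as.Date(c("2020-01-03", "2020-01-06", "2020-01-06")),
                  Value_df2 = c(120, 140, 150))

# Assumption: repeated dates within one table are summed into a single row
collapse_by_date <- function(df) aggregate(df[-1], by = df["Date"], FUN = sum)

# Calendar spanning df1's first to last date, one row per day
calendar <- data.frame(Date = seq(min(df1$Date), max(df1$Date), by = "day"))

# Left-join each collapsed table onto the calendar, keeping every day
left_join_base <- function(x, y) merge(x, collapse_by_date(y), by = "Date", all.x = TRUE)
result <- Reduce(left_join_base, list(df1, df2), init = calendar)

# Missing values in the value columns become 0
value_cols <- names(result) != "Date"
result[value_cols][is.na(result[value_cols])] <- 0
print(result)
```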

Data:

library(readr)

df1 <- read_table("Date Value_df1
           01-01-2020 1200
           02-01-2020 1300
           03-01-2020 1240
           06-01-2020 3900
           31-12-2020 2000")

df2 <- read_table("Date Value_df2
            03-01-2020 120
            04-01-2020 130
            06-01-2020 140
            06-01-2020 150
            08-01-2020 1657
            30-12-2020 6000")

df3 <- read_table("Date Value_df3
            03-01-2020 120
            04-02-2020 130
            06-02-2020 140
            06-02-2020 150
            08-03-2020 1657
            30-12-2020 6000")