I have around 8-10 data frames, each with two columns: Date and Value.
Data Frame 1 - df1
Date Value_df1
01.01.2020 1200
02.01.2020 1300
03.01.2020 1240
06.01.2020 3900
10.02.2020 1500
.
.
.
31.12.2020 2000
Second dataframe - df2 (Starts from a different date and ends in the last month of 2020)
Date Value_df2
03.01.2020 120
04.01.2020 130
06.01.2020 140
06.01.2020 150
08.01.2020 1657
.
.
.
30.12.2020 6000
I used this code to fill in the missing dates for data frame 1, to make df1 continuous and set all the NAs in the value column to 0:
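library(dplyr)  # provides %>%; assumes df1$Date is already of class Date
df1 %>%
  tidyr::complete(Date = seq.Date(df1$Date[1], df1$Date[nrow(df1)], by = "day"),
                  fill = list(Value_df1 = 0))  # new rows get 0 instead of NA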
I want to add the second column of every other data frame to this first data frame, matched by date. The final data frame should look like this:
Date Value_df1 value_df2 value_df3 value_df4 .....value_df8
01.01.2020 1200 0
02.01.2020 1300 0
03.01.2020 1240 120
04.01.2020 0 130
05.01.2020 0 140
06.01.2020 3900 150
07.01.2020 0 0
08.01.2020 0 1657
09.01.2020 0 0
10.02.2020 1500 0
.
.
.
30.12.2020 0 6000
31.12.2020 2000 0
I hope my question is clear. Can anyone help me out with this? How can I add the columns so that they line up with the dates of the first data frame?
CodePudding user response:
You could use the sqldf package and write it as a SQL statement:
library(sqldf)
df1 <- some_data
df2 <- your_other_data
# Here's the part with sqldf (SQL has no NA literal; NULL comes back to R as NA)
query <- 'select date, value_df1, NULL as value_df2 from df1
union all
select date, NULL, value_df2 from df2'
final_data <- sqldf(query)
After this code runs, it's up to you if you want to set NAs to zero, etc.
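Note that UNION ALL stacks the rows of the two tables, so a date that appears in both ends up on two separate rows. To get one row per date, a full outer join on Date is one option. Here is a minimal sketch in base R; the two small data frames are made-up stand-ins for your data, and Date is left as text here:

```r
# Hypothetical toy data standing in for the real df1 and df2
df1 <- data.frame(Date = c("01.01.2020", "03.01.2020"),
                  Value_df1 = c(1200, 1240))
df2 <- data.frame(Date = c("03.01.2020", "04.01.2020"),
                  Value_df2 = c(120, 130))

# all = TRUE keeps dates found in either table (full outer join)
merged <- merge(df1, df2, by = "Date", all = TRUE)

# Values missing on one side come back as NA; set them to 0
merged[is.na(merged)] <- 0
merged
```

With more than two data frames the same merge call can be repeated, or applied over a list with Reduce.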
CodePudding user response:
First, put your tables into a list and do full joins on all of them with Reduce. Then replace the NAs with 0s (here using coalesce with mutate-across).
library(dplyr)
#library(lubridate)
list(df1, df2, df3) |>
  Reduce(full_join, x = _) |>                                   # full join on the shared Date column
  mutate(across(starts_with("Value"), \(x) coalesce(x, 0))) |>  # replace NAs with 0
  arrange(lubridate::dmy(Date))                                 # sort chronologically
If Date is already a date type, you can use arrange(Date) and avoid lubridate.
Output:
# A tibble: 13 × 4
Date Value_df1 Value_df2 Value_df3
<chr> <dbl> <dbl> <dbl>
1 01-01-2020 1200 0 0
2 02-01-2020 1300 0 0
3 03-01-2020 1240 120 120
4 04-01-2020 0 130 0
5 06-01-2020 3900 140 0
6 06-01-2020 3900 150 0
7 08-01-2020 0 1657 0
8 04-02-2020 0 0 130
9 06-02-2020 0 0 140
10 06-02-2020 0 0 150
11 08-03-2020 0 0 1657
12 30-12-2020 0 6000 6000
13 31-12-2020 2000 0 0
Data:
library(readr)
df1 <- read_table("Date Value_df1
01-01-2020 1200
02-01-2020 1300
03-01-2020 1240
06-01-2020 3900
31-12-2020 2000")
df2 <- read_table("Date Value_df2
03-01-2020 120
04-01-2020 130
06-01-2020 140
06-01-2020 150
08-01-2020 1657
30-12-2020 6000")
df3 <- read_table("Date Value_df3
03-01-2020 120
04-02-2020 130
06-02-2020 140
06-02-2020 150
08-03-2020 1657
30-12-2020 6000")
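The question also asks for a continuous run of daily dates, while a full join only keeps dates that occur in at least one table. Here is a rough sketch of one way to fill the gaps with tidyr::complete; the two small tibbles are hypothetical, and it assumes Date has already been converted to class Date:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy inputs; Date is already of class Date
df1 <- tibble(Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-06")),
              Value_df1 = c(1200, 1300, 3900))
df2 <- tibble(Date = as.Date(c("2020-01-03", "2020-01-04")),
              Value_df2 = c(120, 130))

joined <- Reduce(\(x, y) full_join(x, y, by = "Date"), list(df1, df2)) |>
  # add a row for every day between the first and last date seen
  complete(Date = seq(min(Date), max(Date), by = "day")) |>
  # rows added by the join or by complete() hold NA; turn those into 0
  mutate(across(starts_with("Value"), \(x) coalesce(x, 0))) |>
  arrange(Date)

joined
```

If Date is still text in day-month-year form, it could be parsed first, for example with lubridate::dmy(Date), before calling complete().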