Home > database >  Remove rows that do not match common dates from a separate data frame
Remove rows that do not match common dates from a separate data frame

Time:06-09

I have two data frames, each containing a column with dates. I would like to modify the first data frame such that its rows (dates) match the second data frame according to common dates. The first data frame has more dates (includes weekend data) than the second (does not include weekend data), so it would require filtering out those dates in the first data frame to match only those dates in the second.

Example of first data frame (df1):

Date Value
2014-09-19 1
2014-09-20 3
2014-09-21 3
2014-09-22 2
2014-09-23 1

Example of second data frame (df2):

Date Value
2014-09-19 1
2014-09-22 3
2014-09-23 2

Example of desired output of df1, removing the dates that are not common in df2:

Date Value
2014-09-19 1
2014-09-22 2
2014-09-23 1

CodePudding user response:

df1 <- read.table(header = T, text = "Date  Value
2014-09-19  1
2014-09-20  3
2014-09-21  3
2014-09-22  2
2014-09-23  1")


df2 <- read.table(header = T, text = "Date  Value
2014-09-19  1
2014-09-22  3
2014-09-23  2")

library(dplyr)
df1 %>% filter(Date %in% df2$Date)

output :

        Date Value
1 2014-09-19     1
2 2014-09-22     2
3 2014-09-23     1

CodePudding user response:

Assuming that df1 and df2 each have unique dates and that what is wanted is rows of df1 for which there is a matching date in df2 use merge like this:

merge(df1, df2[1])
##         Date Value
## 1 2014-09-19     1
## 2 2014-09-22     2
## 3 2014-09-23     1

These also work:

library(dplyr)
inner_join(df1, df2[1])
## Joining, by = "Date"
##         Date Value
## 1 2014-09-19     1
## 2 2014-09-22     2
## 3 2014-09-23     1

library(sqldf)
sqldf("select a.* from df1 a inner join df2 b using(Date)")
##         Date Value
## 1 2014-09-19     1
## 2 2014-09-22     2
## 3 2014-09-23     1

library(data.table)
as.data.table(df1)[df2[[1]],, on = "Date"]
##          Date Value
## 1: 2014-09-19     1
## 2: 2014-09-22     2
## 3: 2014-09-23     1

# convert to zoo and output a zoo series
library(zoo)
z1 <- read.zoo(df1)
z2 <- read.zoo(df2)
z1[time(z2)]
## 2014-09-19 2014-09-22 2014-09-23 
##          1          2          1 

Note

The input in reproducible form is:

df1 <- structure(list(Date = c("2014-09-19", "2014-09-20", "2014-09-21",
"2014-09-22", "2014-09-23"), Value = c(1L, 3L, 3L, 2L, 1L)), 
 class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(Date = c("2014-09-19", "2014-09-22", "2014-09-23"
), Value = c(1L, 3L, 2L)), 
class = "data.frame", row.names = c(NA, -3L))

CodePudding user response:

We can use a dplyr::semi_join:

library(dplyr)

df1 %>% 
  semi_join(df2, by  = "date")
#> # A tibble: 3 x 2
#>   date       value
#>   <date>     <dbl>
#> 1 2014-09-19     1
#> 2 2014-09-22     2
#> 3 2014-09-23     1


# the data
df1 <- tribble(~ date, ~ value,
  "2014-09-19", 1,
  "2014-09-20",  3,
  "2014-09-21",  3,
  "2014-09-22",  2,
  "2014-09-23",  1
) %>% 
  mutate(date = as.Date(date))

df2 <- tribble(~ date, ~ value,
               "2014-09-19", 1,
               "2014-09-22",  3,
               "2014-09-23",  2
) %>% 
  mutate(date = as.Date(date))

Created on 2022-06-08 by the reprex package (v2.0.1)

CodePudding user response:

With the base subset option:

df1[df1$Date %in% df2$Date,]

Output:

        Date Value
1 2014-09-19     1
4 2014-09-22     2
5 2014-09-23     1

CodePudding user response:

If not using more specific libraries, can be done as following:

df1 <- df1[, df1$Date %in% df2$Date]

  • Related