I have following two dataframes,
ID | first_date | last_date |
---|---|---|
1 | 2022-09-01 | 2022-09-03 |
2 | 2022-09-25 | 2022-09-26 |
ID | dates |
---|---|
1 | 2022-09-01 |
1 | 2022-09-02 |
2 | 2022-09-25 |
2 | 2022-09-26 |
If an ID of a first dataframe doesn't have date values within a range from first_date to last_date in second dataframe, I have to add 1 to the column named missing
or 0 otherwise as below.
ID | first_date | last_date | missing |
---|---|---|---|
1 | 2022-09-01 | 2022-09-03 | 1 |
2 | 2022-09-25 | 2022-09-26 | 0 |
How can I do this in R? Can anyone help please!
CodePudding user response:
A dplyr
/tidyr
way: First calculate the amount of days between the two days in df1
and compare with the amount of days present in df2
.
library(dplyr)
library(tidyr)
df1 |>
group_by(ID) |>
mutate(days = length(seq(from = first_date, to = last_date, by = 1))) |>
ungroup() |>
left_join(count(df2, ID)) |>
mutate(missing = !(days == n)) |>
select(-days, -n) # You might be interested in keeping these!
Output:
# A tibble: 2 × 4
ID first_date last_date missing
<dbl> <date> <date> <int>
1 1 2022-09-01 2022-09-03 1
2 2 2022-09-25 2022-09-26 0
Data:
library(readr)
df1 <- read_table("ID first_date last_date
1 2022-09-01 2022-09-03
2 2022-09-25 2022-09-26")
df2 <- read_table("ID dates
1 2022-09-01
1 2022-09-02
2 2022-09-25
2 2022-09-26")
CodePudding user response:
Here I am using outer
and &
to check if any date from df2 is contained in any of the ranges specified in df1.
mutate(df1, missing = (outer(first_date, df2$dates, `<`) &
outer(last_date, df2$dates, `>`)) |>
apply(1, any) |>
as.integer())
##> ID first_date last_date missing
##> 1 1 2022-09-01 2022-09-03 1
##> 2 2 2022-09-25 2022-09-26 0
For larger problems, it would be useful to know if the ranges are disjointed or have overlaps.
You may also want to look into ?iranges::findOverlap