I want to create a presence/absence matrix that has the date on the y-axes and individuals on the x-axes. When an individual was present on the particular date, the output should be 1, while if it was absent the output should be 0. I have a dataframe with the names of individuals and the dates that they are present in a group:
ID | Start | End |
---|---|---|
Afr | 2015-06-29 | 2016-02-16 |
Ahe | 2016-12-18 | 2018-02-24 |
Art | 2015-07-01 | 2020-04-30 |
... |
In total I have over a thousand individuals and their dates.
I also have a list/dataframe that contains all the dates from 2015-01-01 to 2021-31-12.
My output data needs to look like this:
Date | Afr | Ahe | Art | ... |
---|---|---|---|---|
2015-07-01 | 1 | 0 | 0 | ... |
2015-07-02 | 1 | 0 | 1 | ... |
2015-07-03 | 1 | 0 | 1 | ... |
... |
Where the output is 1 when an individual was present in the group at that time and a 0 when it was not.
I feel like there should be an easy solution for creating this but so far I have not managed. One of the problems I am encountering is that the list of dates is for example longer than the dataframe with the individuals, making a dcast
function for example impossible.
Any help or suggestions would be greatly appreciated! Please also let me know if I should provide more code/background.
Thank you very much!
CodePudding user response:
We can try the code below
library(data.table)
setDT(df1)
setDT(df2)
na.omit(
dcast(
df1[df2, .(Date, ID), on = .(Start < Date, End > Date)][df1, on = .(ID)],
Date ~ ID,
fun.aggregate = length
)
)
which gives
Date Afr Ahe Art
1: 2015-07-01 1 0 0
2: 2015-07-02 1 0 1
3: 2015-07-03 1 0 1
Data
> dput(df1)
structure(list(ID = c("Afr", "Ahe", "Art"), Start = structure(c(16615,
17153, 16617), class = "Date"), End = structure(c(16847, 17586,
18382), class = "Date")), class = "data.frame", row.names = c(NA,
-3L))
> dput(df2)
structure(list(Date = structure(c(16617, 16618, 16619), class = "Date")), class = "data.frame", row.names = c(NA,
-3L))