I have created two data frames that I then turn into lists (e.g., list1
and list2
). I removed one element from list2
to better represent my example data set.
library(dplyr)
intervals <- rep_len(c("01-01-2022", "01-11-2022", "01-31-2022"), 100)
ID <- rep(c("A","B", "C"), 100)
df <- data.frame(ID = as.factor(ID),
intervals = as.factor(intervals))
list1 <- df %>%
group_by(ID, intervals) %>%
group_split()
intervals <- rep_len(c("01-01-2022", "01-11-2022", "01-31-2022"), 25)
ID <- rep(c("A","B"), 25)
df2 <- data.frame(ID = as.factor(ID),
intervals = as.factor(intervals))
list2 <- df2 %>%
group_by(ID, intervals) %>%
group_split()
list2 <- list2[-6]
For each of these list I have added an attribute, and I have included a function to check the added attribute more readily (check
).
# Convenience function to grab the attributes for you
check <- function(list, attribute_name) {
return(attr(list, attribute_name))
}
# Add an attribute to hold the attributes of each list element
attr(list1, "match") <- data.frame(id = sapply(list1, function(x) paste(x$ID[1])),
interval_start_date = sapply(list1, function(x) paste(x$intervals[1]))
)
# Check the attributes
check(list1, "match")
# Add an attribute "tab" to hold the attributes of each list element
attr(list2, "match") <- data.frame(id = sapply(list2, function(x) paste(x$ID[1])),
interval_start_date = sapply(list2, function(x) paste(x$intervals[1]))
)
# Check the attributes
check(list2, "match")
I have created an index for the two list, and the objective here is to remove any list components that don't have the same ID
and the same intervals
. The goal is to have only the matching ID
s with the same intervals
.
# Creates an index for the two list based on the attributes,
dat2 <- check(list1, "match")
dat1 <- check(list2, "match")
# Removes rows where the id isn't present in both data frames, and creates a
# index where both the interval and id are the same.
if (!length(unique(dat2$id)) == length(unique(dat1$id))){
dat3 <- dat2[dat2$id %in% dat1$id, ]
dat4 <- dat1[dat1$id %in% dat2$id, ]
i1 <- paste(dat3[["id"]], format(as.Date(dat3[["interval_"]]),
"%Y-%d")) %in%
paste(dat4[["id"]], format(as.Date(dat4[["interval_"]]),
"%Y-%d"))
}
Now here is where I begin to get an error:
# Error occurs because the lengths of `i1` is not the same as `list2`
out <- list1[i1]
I know that this is occuring because list1
does not have the same length as i1
. I'm wondering if there is a way to appending logical values to i1
to get it the same length as list1
, but in a way that it doesn't remove values from list1
that we actually do want to keep. Any thoughts?
Here is my expected output for list1
, where I hope it ends up with only the same ID
s and intervals
as list2
.
# Expected output
expected_list1 <- list(list1[1], list1[2],list1[3], list1[4], list1[5])
This answer is close to what I would like, but it has an additional element. I think ultimately the attribute table should be similiar to that of dat4
.
test <- list1[dat2$id %in% dat1$id][i1]
# Add an attribute "tab" to hold the attributes of each list element
attr(test, "match") <- data.frame(id = sapply(test, function(x) paste(x$ID[1])),
interval_start_date = sapply(test, function(x) paste(x$intervals[1]))
)
# Check the attributes
check(test, "match")
CodePudding user response:
There was a mismatch in the column name i.e. it is not interval_
, but interval_start_date
in dat1
and dat2
. [[
will look for exact match whereas $
can match partial names as well
if (!length(unique(dat2$id)) == length(unique(dat1$id))){
ids_common <- intersect(dat2$id, dat1$id)
inds1 <- dat2$id %in% ids_common
inds2 <- dat1$id %in% ids_common
i1 <- paste(dat2[["id"]], format(as.Date(dat2[["interval_start_date"]]),
"%Y-%d")) %in%
paste(dat1[["id"]], format(as.Date(dat1[["interval_start_date"]]),
"%Y-%d"))
out <- list1[i1 & inds1]
}
-checking
> length(out)
[1] 5
> i1
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE