Home > Software engineering >  Match list elements based on attribute component
Match list elements based on attribute component

Time:05-29

I have a data set that i split into two list int1 and int2.

library(lubridate)
library(tidyverse)
library(purrr)

date <- rep_len(seq(dmy("01-01-2011"), dmy("01-01-2013"), by = "days"), 300)
ID <- rep(c("A","B", "C"), 300)
df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df$month <- month(df$date)
df$year <- year(df$date)

# Create first list
int1 <- df %>%
  mutate(new = floor_date(date, '10 day')) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(ID, new) %>%
  filter(month == "1") %>% 
  group_split()

# Create second list
int2 <- df %>%
  mutate(new = floor_date(date, '10 day')) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(ID, new) %>%
  filter(month == "2") %>% 
  group_split()


names(int1) <- sapply(int1, function(x) paste(x$ID[1],
                                                   x$year[1], sep = '_'))

names(int2) <- sapply(int2, function(x) paste(x$ID[1],
                                                    x$year[1], sep = '_'))

I then assign a attribute to each list (match). I created a function check to grab this attribute more easily. I removed some elements from one list for this exmaple.


int1 <- int1[-c(3,6)]

# Convenience function to grab the attributes for you
check <- function(x) {
  return(attr(x, "match"))
}

# Add an attribute to hold the attributes of each list element
attr(int1, "match") <- data.frame(id = sapply(int1, function(x) paste(x$ID[1])),
                                     interval_start_date = sapply(int1, function(x) paste(x$new[1]))
)

# Check the attributes
check(int1)

# Add an attribute "tab" to hold the attributes of each list element
attr(int2, "match") <- data.frame(id = sapply(int2, function(x) paste(x$ID[1])),
                                     interval_start_date = sapply(int2, function(x) paste(x$new[1]))
) 

# Check the attributes
check(int2)

I would like to remove elements that are not in another based on the attribute that I had added. Specifically I would like to remove any that don't have the same interval_start_date and ID associated with it. For the interval_start_date, only the year and the day have to match, as the month will most likely differ between the two list. In this case, I would like int2 to match int1. Any thoughts on how I could do this? A base r method is preferred, if possible.

# Expected results
expected_int2 <- list(int2[[1]], int2[[2]], int2[[3]], int2[[4]], int2[[5]], 
                      int2[[6]], int2[[7]])

names(expected_int2) <- sapply(int1, function(x) paste(x$ID[1],
                                                       x$year[1], sep = "_"))

CodePudding user response:

We may create an index with %in% after pasteing the 'id' and the formatted 'interval_start_date' i.e. after removing the 'month' part

i1 <-  with(check(int2), paste(id, format(as.Date(interval_start_date), 
     "%Y-%d"))) %in%  with(check(int1), paste(id, 
      format(as.Date(interval_start_date), "%Y-%d")))
> which(i1)
[1] 1 2 4 5 7 8 9
out <- int2[i1]
  • Related