Removing components based on the different names between two lists

Time:10-28

I have two lists of data frames, but one has an extra ID in it. I would like to remove the extra ID using the names assigned to each component of the list. My actual data set has a whole slew of IDs, so I would essentially like to create a function that removes any ID whose name does not appear in another list, using that list's names as the index.

How could I go about doing this? In this case, I would be removing ID D from july, not because of the data frames in D, but because the names in july2 differ from the names in july.

I have tried using setdiff, but it just returns the list I pass as the first argument:

> setdiff(july, july2)
<list_of<
  tbl_df<
    date : date
    x    : double
    y    : double
    ID   : character
    jDate: double
    Month: double
    new  : date
  >
>[12]>
$A
# A tibble: 16 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-04 60161. 800440. A     14794     7 2010-07-01
 2 2010-07-08 61139. 825947. A     14798     7 2010-07-01
 3 2010-07-04 60161. 800440. A     14794     7 2010-07-01
 4 2010-07-08 61139. 825947. A     14798     7 2010-07-01
 5 2010-07-04 60161. 800440. A     14794     7 2010-07-01
 6 2010-07-08 61139. 825947. A     14798     7 2010-07-01
 7 2010-07-04 60161. 800440. A     14794     7 2010-07-01
 8 2010-07-08 61139. 825947. A     14798     7 2010-07-01
 9 2010-07-04 60161. 800440. A     14794     7 2010-07-01
10 2010-07-08 61139. 825947. A     14798     7 2010-07-01
11 2010-07-04 60161. 800440. A     14794     7 2010-07-01
12 2010-07-08 61139. 825947. A     14798     7 2010-07-01
13 2010-07-04 60161. 800440. A     14794     7 2010-07-01
14 2010-07-08 61139. 825947. A     14798     7 2010-07-01
15 2010-07-04 60161. 800440. A     14794     7 2010-07-01
16 2010-07-08 61139. 825947. A     14798     7 2010-07-01

$A
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-12 66502. 804956. A     14802     7 2010-07-11
 2 2010-07-16 79728. 858097. A     14806     7 2010-07-11
 3 2010-07-20 77342. 830852. A     14810     7 2010-07-11
 4 2010-07-12 66502. 804956. A     14802     7 2010-07-11
 5 2010-07-16 79728. 858097. A     14806     7 2010-07-11
 6 2010-07-20 77342. 830852. A     14810     7 2010-07-11
 7 2010-07-12 66502. 804956. A     14802     7 2010-07-11
 8 2010-07-16 79728. 858097. A     14806     7 2010-07-11
 9 2010-07-20 77342. 830852. A     14810     7 2010-07-11
10 2010-07-12 66502. 804956. A     14802     7 2010-07-11
# ... with 14 more rows

$A
# A tibble: 16 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-24 75483. 828763. A     14814     7 2010-07-21
 2 2010-07-28 69508. 806470. A     14818     7 2010-07-21
 3 2010-07-24 75483. 828763. A     14814     7 2010-07-21
 4 2010-07-28 69508. 806470. A     14818     7 2010-07-21
 5 2010-07-24 75483. 828763. A     14814     7 2010-07-21
 6 2010-07-28 69508. 806470. A     14818     7 2010-07-21
 7 2010-07-24 75483. 828763. A     14814     7 2010-07-21
 8 2010-07-28 69508. 806470. A     14818     7 2010-07-21
 9 2010-07-24 75483. 828763. A     14814     7 2010-07-21
10 2010-07-28 69508. 806470. A     14818     7 2010-07-21
11 2010-07-24 75483. 828763. A     14814     7 2010-07-21
12 2010-07-28 69508. 806470. A     14818     7 2010-07-21
13 2010-07-24 75483. 828763. A     14814     7 2010-07-21
14 2010-07-28 69508. 806470. A     14818     7 2010-07-21
15 2010-07-24 75483. 828763. A     14814     7 2010-07-21
16 2010-07-28 69508. 806470. A     14818     7 2010-07-21

$B
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-01 72826. 888060. B     14791     7 2010-07-01
 2 2010-07-05 67469. 807307. B     14795     7 2010-07-01
 3 2010-07-09 77834. 868002. B     14799     7 2010-07-01
 4 2010-07-01 72826. 888060. B     14791     7 2010-07-01
 5 2010-07-05 67469. 807307. B     14795     7 2010-07-01
 6 2010-07-09 77834. 868002. B     14799     7 2010-07-01
 7 2010-07-01 72826. 888060. B     14791     7 2010-07-01
 8 2010-07-05 67469. 807307. B     14795     7 2010-07-01
 9 2010-07-09 77834. 868002. B     14799     7 2010-07-01
10 2010-07-01 72826. 888060. B     14791     7 2010-07-01
# ... with 14 more rows

$B
# A tibble: 16 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-13 74643. 845222. B     14803     7 2010-07-11
 2 2010-07-17 78530. 807316. B     14807     7 2010-07-11
 3 2010-07-13 74643. 845222. B     14803     7 2010-07-11
 4 2010-07-17 78530. 807316. B     14807     7 2010-07-11
 5 2010-07-13 74643. 845222. B     14803     7 2010-07-11
 6 2010-07-17 78530. 807316. B     14807     7 2010-07-11
 7 2010-07-13 74643. 845222. B     14803     7 2010-07-11
 8 2010-07-17 78530. 807316. B     14807     7 2010-07-11
 9 2010-07-13 74643. 845222. B     14803     7 2010-07-11
10 2010-07-17 78530. 807316. B     14807     7 2010-07-11
11 2010-07-13 74643. 845222. B     14803     7 2010-07-11
12 2010-07-17 78530. 807316. B     14807     7 2010-07-11
13 2010-07-13 74643. 845222. B     14803     7 2010-07-11
14 2010-07-17 78530. 807316. B     14807     7 2010-07-11
15 2010-07-13 74643. 845222. B     14803     7 2010-07-11
16 2010-07-17 78530. 807316. B     14807     7 2010-07-11

$B
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-21 61332. 840310. B     14811     7 2010-07-21
 2 2010-07-25 69102. 809024. B     14815     7 2010-07-21
 3 2010-07-29 66088. 817887. B     14819     7 2010-07-21
 4 2010-07-21 61332. 840310. B     14811     7 2010-07-21
 5 2010-07-25 69102. 809024. B     14815     7 2010-07-21
 6 2010-07-29 66088. 817887. B     14819     7 2010-07-21
 7 2010-07-21 61332. 840310. B     14811     7 2010-07-21
 8 2010-07-25 69102. 809024. B     14815     7 2010-07-21
 9 2010-07-29 66088. 817887. B     14819     7 2010-07-21
10 2010-07-21 61332. 840310. B     14811     7 2010-07-21
# ... with 14 more rows

$C
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-02 71110. 898586. C     14792     7 2010-07-01
 2 2010-07-06 78769. 821287. C     14796     7 2010-07-01
 3 2010-07-10 62446. 874366. C     14800     7 2010-07-01
 4 2010-07-02 71110. 898586. C     14792     7 2010-07-01
 5 2010-07-06 78769. 821287. C     14796     7 2010-07-01
 6 2010-07-10 62446. 874366. C     14800     7 2010-07-01
 7 2010-07-02 71110. 898586. C     14792     7 2010-07-01
 8 2010-07-06 78769. 821287. C     14796     7 2010-07-01
 9 2010-07-10 62446. 874366. C     14800     7 2010-07-01
10 2010-07-02 71110. 898586. C     14792     7 2010-07-01
# ... with 14 more rows

$C
# A tibble: 16 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-14 77316. 882468. C     14804     7 2010-07-11
 2 2010-07-18 65028. 815016. C     14808     7 2010-07-11
 3 2010-07-14 77316. 882468. C     14804     7 2010-07-11
 4 2010-07-18 65028. 815016. C     14808     7 2010-07-11
 5 2010-07-14 77316. 882468. C     14804     7 2010-07-11
 6 2010-07-18 65028. 815016. C     14808     7 2010-07-11
 7 2010-07-14 77316. 882468. C     14804     7 2010-07-11
 8 2010-07-18 65028. 815016. C     14808     7 2010-07-11
 9 2010-07-14 77316. 882468. C     14804     7 2010-07-11
10 2010-07-18 65028. 815016. C     14808     7 2010-07-11
11 2010-07-14 77316. 882468. C     14804     7 2010-07-11
12 2010-07-18 65028. 815016. C     14808     7 2010-07-11
13 2010-07-14 77316. 882468. C     14804     7 2010-07-11
14 2010-07-18 65028. 815016. C     14808     7 2010-07-11
15 2010-07-14 77316. 882468. C     14804     7 2010-07-11
16 2010-07-18 65028. 815016. C     14808     7 2010-07-11

$C
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-22 65117. 866750. C     14812     7 2010-07-21
 2 2010-07-26 78462. 823259. C     14816     7 2010-07-21
 3 2010-07-30 69577. 848118. C     14820     7 2010-07-21
 4 2010-07-22 65117. 866750. C     14812     7 2010-07-21
 5 2010-07-26 78462. 823259. C     14816     7 2010-07-21
 6 2010-07-30 69577. 848118. C     14820     7 2010-07-21
 7 2010-07-22 65117. 866750. C     14812     7 2010-07-21
 8 2010-07-26 78462. 823259. C     14816     7 2010-07-21
 9 2010-07-30 69577. 848118. C     14820     7 2010-07-21
10 2010-07-22 65117. 866750. C     14812     7 2010-07-21
# ... with 14 more rows

$D
# A tibble: 16 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-03 77586. 819905. D     14793     7 2010-07-01
 2 2010-07-07 76249. 848582. D     14797     7 2010-07-01
 3 2010-07-03 77586. 819905. D     14793     7 2010-07-01
 4 2010-07-07 76249. 848582. D     14797     7 2010-07-01
 5 2010-07-03 77586. 819905. D     14793     7 2010-07-01
 6 2010-07-07 76249. 848582. D     14797     7 2010-07-01
 7 2010-07-03 77586. 819905. D     14793     7 2010-07-01
 8 2010-07-07 76249. 848582. D     14797     7 2010-07-01
 9 2010-07-03 77586. 819905. D     14793     7 2010-07-01
10 2010-07-07 76249. 848582. D     14797     7 2010-07-01
11 2010-07-03 77586. 819905. D     14793     7 2010-07-01
12 2010-07-07 76249. 848582. D     14797     7 2010-07-01
13 2010-07-03 77586. 819905. D     14793     7 2010-07-01
14 2010-07-07 76249. 848582. D     14797     7 2010-07-01
15 2010-07-03 77586. 819905. D     14793     7 2010-07-01
16 2010-07-07 76249. 848582. D     14797     7 2010-07-01

$D
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-11 61531. 883305. D     14801     7 2010-07-11
 2 2010-07-15 69514. 867063. D     14805     7 2010-07-11
 3 2010-07-19 69178. 890183. D     14809     7 2010-07-11
 4 2010-07-11 61531. 883305. D     14801     7 2010-07-11
 5 2010-07-15 69514. 867063. D     14805     7 2010-07-11
 6 2010-07-19 69178. 890183. D     14809     7 2010-07-11
 7 2010-07-11 61531. 883305. D     14801     7 2010-07-11
 8 2010-07-15 69514. 867063. D     14805     7 2010-07-11
 9 2010-07-19 69178. 890183. D     14809     7 2010-07-11
10 2010-07-11 61531. 883305. D     14801     7 2010-07-11
# ... with 14 more rows

$D
# A tibble: 24 x 7
   date            x       y ID    jDate Month new       
   <date>      <dbl>   <dbl> <chr> <dbl> <dbl> <date>    
 1 2010-07-23 74554. 898077. D     14813     7 2010-07-21
 2 2010-07-27 77455. 834715. D     14817     7 2010-07-21
 3 2010-07-31 77461. 873993. D     14821     7 2010-07-21
 4 2010-07-23 74554. 898077. D     14813     7 2010-07-21
 5 2010-07-27 77455. 834715. D     14817     7 2010-07-21
 6 2010-07-31 77461. 873993. D     14821     7 2010-07-21
 7 2010-07-23 74554. 898077. D     14813     7 2010-07-21
 8 2010-07-27 77455. 834715. D     14817     7 2010-07-21
 9 2010-07-31 77461. 873993. D     14821     7 2010-07-21
10 2010-07-23 74554. 898077. D     14813     7 2010-07-21
# ... with 14 more rows

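library(dplyr)
library(lubridate)

ID <- rep(c("A", "B", "C", "D"), 1000)
ID2 <- rep(c("A", "B", "C"), 1000)
date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
x <- runif(length(date), min = 60000, max = 80000)
y <- runif(length(date), min = 800000, max = 900000)

# data.frame() recycles the 500 dates/coordinates to the length of the ID column
df <- data.frame(date = date,
                 x = x,
                 y = y,
                 ID)

df2 <- data.frame(date = date,
                  x = x,
                  y = y,
                  ID2)

# Both data frames need jDate and Month, because the pipelines below filter on Month
df$jDate <- julian(as.Date(df$date), origin = as.Date("1970-01-01"))
df$Month <- month(df$date)
df2$jDate <- julian(as.Date(df2$date), origin = as.Date("1970-01-01"))
df2$Month <- month(df2$date)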

july <- df %>%
  # Creates a new column holding the first day of the 10-day interval the date
  # falls in (e.g., 01-03-2021 is in the first 10-day interval, so the
  # `floor_date` assigned to it would be 01-01-2021)
  mutate(new = floor_date(date, "10 days")) %>%
  # For any month that has 31 days, the 31st day would normally be assigned its
  # own interval. The code below takes the 31st day and joins it with the
  # previous interval.
  group_by(ID) %>% 
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(new, .add = TRUE) %>%
  # Keep only July records, based on the `Month` column
  filter(Month == 7) %>% 
  group_split()

july2 <- df2 %>%
  # Creates a new column holding the first day of the 10-day interval the date
  # falls in (e.g., 01-03-2021 is in the first 10-day interval, so the
  # `floor_date` assigned to it would be 01-01-2021)
  mutate(new = floor_date(date, "10 days")) %>%
  # For any month that has 31 days, the 31st day would normally be assigned its
  # own interval. The code below takes the 31st day and joins it with the
  # previous interval.
  group_by(ID2) %>% 
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(new, .add = TRUE) %>%
  # Keep only July records, based on the `Month` column
  filter(Month == 7) %>% 
  group_split()

names(july) <- sapply(july, function(x) paste(x$ID[1]))
names(july2) <- sapply(july2, function(x) paste(x$ID2[1]))

Answer:

It sounds like you're trying to remove the entries in the july list whose names don't appear in the july2 list. If that's the case, adding the code below to your script should do the trick:

# Get the unique names of the `july` list
jul_names <- unique(names(july))
# Find out which names are shared between the two lists
same_names <- jul_names[jul_names %in% unique(names(july2))]
# Subset the july list to only keep those entries with shared names
july <- july[names(july) %in% same_names]
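Since the question asks for a function that works with many IDs, the same idea can be wrapped in a small helper. This is a base-R sketch; `keep_shared_names()` and the toy lists `july_demo` / `july2_demo` are hypothetical names made up for illustration, not part of the original code. Calling `setdiff()` on the names, rather than on the lists themselves, also reports which IDs will be dropped: `setdiff(july, july2)` compares the list elements (the tibbles), not their names, which is why it returned all of `july`.

```r
# Hypothetical helper: keep only the elements of `x` whose names also appear in `reference`
keep_shared_names <- function(x, reference) {
  x[names(x) %in% names(reference)]
}

# Toy stand-ins for july / july2 (made-up data); list() allows repeated names
july_demo  <- list(A = data.frame(v = 1), A = data.frame(v = 2),
                   B = data.frame(v = 3), D = data.frame(v = 4))
july2_demo <- list(A = data.frame(v = 1), B = data.frame(v = 3))

# Names present in the first list but missing from the second
dropped <- setdiff(names(july_demo), names(july2_demo))
print(dropped)       # [1] "D"

kept <- keep_shared_names(july_demo, july2_demo)
print(names(kept))   # [1] "A" "A" "B"
```

Applied to the question's data, `keep_shared_names(july, july2)` gives the same result as the three-step version above, because `%in%` already ignores duplicate names.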

If that's not what you're hoping for, we'll need a few more details about the problem. As a commenter pointed out, there are a couple of bugs in your reprex, so I made my best guess at what you were trying to do:

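july <- df %>%
  # First day of the 10-day interval each date falls in
  mutate(new = floor_date(date, "10 days")) %>%
  # Fold the 31st (otherwise its own interval) into the previous interval
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(ID, new) %>%
  # Keep only July records
  filter(month(date) == 7) %>% 
  group_split()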
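To check what the interval step does without lubridate, here is a base-R sketch of the same rule. It assumes, as the output in the question suggests, that `floor_date(date, "10 days")` starts intervals on days 1, 11, 21 and 31 of each month, so folding the 31st back gives starts of 1, 11 or 21. `interval_start()` is a hypothetical helper, not part of lubridate or dplyr.

```r
# Hypothetical base-R equivalent of floor_date(date, "10 days") followed by
# folding the 31st into the previous interval: starts are day 1, 11 or 21
interval_start <- function(d) {
  day <- as.integer(format(d, "%d"))
  # (day - 1) %/% 10 is 0, 1, 2 (or 3 for the 31st); cap at 2 so the 31st joins day 21
  block <- pmin((day - 1L) %/% 10L, 2L)
  first_of_month <- as.Date(format(d, "%Y-%m-01"))
  first_of_month + block * 10L
}

dates <- as.Date(c("2010-07-04", "2010-07-10", "2010-07-11",
                   "2010-07-20", "2010-07-30", "2010-07-31"))
print(interval_start(dates))
```

These starts line up with the `new` column in the printed tibbles (for example, 2010-07-31 lands in the interval starting 2010-07-21).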