I have two lists, A and B. The dates in A are 2000-2022, while those in B are 2023-2030.
names(A) and names(B) give the following character vectors:
a <- c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
b <- c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu","FGO_c_fu")
Also, I have a string vector, c, which is common across the names in a and b:
c=c("ACC","BCC", "Can", "CES", "FGO")
Note that the strings in c do not always appear in the same position in filenames. A string can be at the beginning, middle, or end of a filename.
Challenge
- Using the strings in c, I would like to get the difference (i.e., which name exists in b but not in a, or vice versa) between the names in a and b. Expected output = "FGO_c_fu"
- rbind (or whatever is best) matching dataframes in lists A and B if the names are similar based on the string in c
Answer:
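Update (see OP's comment): this version matches the strings in c anywhere in a name, rather than only before the first underscore. df is the data frame built in the first answer below. Try this: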
library(dplyr)
library(tibble)
library(tidyr)
library(stringr)
# or just library(tidyverse)
df %>%
  pivot_longer(everything()) %>%
  # extract whichever string from c occurs in each name
  mutate(x = str_extract(value, paste(c, collapse = "|"))) %>%
  group_by(x) %>%
  # keep only the keys that occur exactly once
  filter(!any(row_number() > 1)) %>%
  na.omit() %>%
  pull(value)
[1] "FGO_c_fu"
First answer: Here is an alternative approach:
- We create a list of the two vectors.
- The vectors are of unequal length, so with data.frame(lapply(my_list, `length<-`, max(lengths(my_list)))) we pad the shorter one with NA and create a data frame.
- Pivot longer and group by everything before the first underscore (this assumes the key is at the start of each name).
- Remove NA and filter:
library(dplyr)
library(tidyr)
library(tibble)
my_list <- tibble::lst(a, b)
df <- data.frame(lapply(my_list, `length<-`, max(lengths(my_list))))
df %>%
  pivot_longer(everything()) %>%
  group_by(x = sub("\\_.*", "", value)) %>%
  filter(!any(row_number() > 1)) %>%
  na.omit() %>%
  pull(value)
[1] "FGO_c_fu"
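The code above only covers the first part of the question. For the second part (row-binding the matching data frames in A and B), here is a minimal base R sketch. It makes several assumptions: A and B below are made-up placeholder lists, since the real data is not shown; the data frames in a matching pair share the same columns; each name contains at most one of the keys; and each key occurs at most once per list. The names keys and get_key are introduced here for illustration only, with keys standing in for the OP's vector c to avoid masking base::c.

```r
# Hypothetical stand-ins for the OP's lists of data frames
keys <- c("ACC", "BCC", "Can", "CES", "FGO")
A <- setNames(
  lapply(1:4, function(i) data.frame(year = 2000:2001, value = i)),
  c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
)
B <- setNames(
  lapply(1:5, function(i) data.frame(year = 2023:2024, value = i)),
  c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu", "FGO_c_fu")
)

# Return the first key found anywhere in each name (NA if none matches)
pattern <- paste(keys, collapse = "|")
get_key <- function(x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}
key_A <- get_key(names(A))
key_B <- get_key(names(B))

# Names whose key occurs in only one of the two lists
unmatched <- c(names(A)[!key_A %in% key_B], names(B)[!key_B %in% key_A])

# Row-bind each pair of data frames that share a key
shared <- intersect(key_A, key_B)
combined <- lapply(shared, function(k) {
  rbind(A[[match(k, key_A)]], B[[match(k, key_B)]])
})
names(combined) <- shared
```

With the placeholder data, unmatched holds the name that has no partner and combined is a list with one stacked data frame per shared key. If the columns of the paired data frames differ, dplyr::bind_rows may be a better fit than rbind.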