I have imported multiple excel files at once (more then 20) in R and I want to give values to columns in the dataframe based on the file name.
Let us say that the filename are the following: "A1_412", "A1_413" , "A1_415" , "A2_412", "A2_413", "A2_415" , "A23_413" "A4_5", "A6_413_TS"... etc
I want to create two column where the value in column_1 will show A1, A2, A23 if these row are from the file A1_412, A1_413 or A1_415
the value in column_2 will show 412 or 413 or 415 based on the name of the file
I have already imported all the files
#import all the excel files
file.list <- list.files(path = "xxxxxxxxxx",
pattern = "*.xlsx",
full.names = TRUE)
alldata<-file.list %>%
map_dfr(~read_excel(.x)%>%
mutate_all(as.character))
this is an example of the end result that im searching for:
G B column_1 column_2
a Lion A1 415
b Cat A1 412
c Horse A23 413
d dog A2 415
e Mouse A23 413
f Snake A2 412
g zebra A6 413
h bird
bird is coming from the file A4_5, I want the column_1 to be empty instead of showing A4 and column_2 to be empty instead of showing 5 Zebra is coming from file A6_413_TS so here I want column_1 to show A6 and column_2 to only show 413 zebra is coming
I need a solution that can be generalized to a large number of variables.
Any suggestions on how to do this?
CodePudding user response:
file.list %>%
set_names(.) %>%
map_df(~mutate_all(read_excel(.x), as.character), .id = 'grp') %>%
mutate(grp = str_remove(basename(grp), ".xlsx")) %>%
separate(grp, c('col1', 'col2'), sep = '_', extra = 'merge')
CodePudding user response:
You could try something like this. Basically, I use lapply()
to loop over the filenames, each time using basename()
to get the base file name, and then use str_remove()
to remove the extension, and strsplit
to split on the _
between the prefix and the suffix
bind_rows(
lapply(file.list, function(f) {
bn = basename(f)
colvals = strsplit(str_remove(bn, ".xlsx"), "_")[[1]]
read_excel(f) %>% mutate_all(as.character()) %>% mutate(col1 = colvals[1], col2=colvals[2])
})
)