I have the following list of dataset :
dflist <- list(df1_A, df1_B, df1_C, df1_D, df1_E,
df2_A, df2_B, df2_C, df2_D, df2_E,
df3_A, df3_B, df3_C, df3_D, df3_E,
df4_A, df4_B, df4_C, df4_D, df4_E)
names(dflist) <- c("df1_A", "df1_B", "df1_C", "df1_D", "df1_E",
"df2_A", "df2_B", "df2_C", "df2_D", "df2_E",
"df3_A", "df3_B", "df3_C", "df3_D", "df3_E",
"df4_A", "df4_B", "df4_C", "df4_D", "df4_E")
Each dataframe have the same structure (with the same column's names):
df1_A
V1 V2
G18941 17
G20092 534
G19692 10
G19703 260
G16777 231
G20045 0
...
I would like to make a function that merges all the dataframes with the same number (but different letter) in my list and sums the values in column V2 when the names in V1 are the same.
In hard, I managed to do this for df1_A and df1_B with the following code:
newdf <- bind_rows(df1_A, df1_B) %>%
group_by(V1) %>%
summarise_all(., sum, na.rm = TRUE)
I can easily turn this into a function like this:
MergeAndSum <- function(df1,df2)
newdf <- bind_rows(df1, df2) %>%
group_by(V1) %>%
summarise_all(., sum, na.rm = TRUE)
return(newdf)
But I don't really see how to call it to do the loop. I try something like :
for (i in 2:length(dflist)){
df1 <- List_RawCounts_Files[i-1]
df2 <- List_RawCounts_Files[i]
out1 <- MergeAndSum(df1,df2)
return(out1)
}
I imagine something that merges sum the df1_A to the df1_B and reassigns the result to df1_A, then calls back the function with df1_A and df1_C and reassigns the result to df1_A, then calls back the function with df1_A and df1_D, and reassigns the result to df1_A, then calls back the function with df1_A and df1_E
Then the same thing with df2 (df2_A, df2_B,... df2_E), then df3, df4 and df5.
If you know how to do this I am listening. Thanks
CodePudding user response:
bind_rows
can combine list of dataframes together. You can combine them with the id
column so that the name of the list is added as a new column, extract the dataframe name (df1
from df1_A
, df2
from df2_A
and so on) and take the sum of V2
column for each dataframe and V1
column as group.
library(dplyr)
bind_rows(dflist, .id = "id") %>%
mutate(id = stringr::str_extract(id, 'df\\d ')) %>%
group_by(id, V1) %>%
summarise(V2 = sum(V2, na.rm = TRUE), .groups = "drop")
Since you want to sum
only one column (V2
) you can use summarise
instead of summarise_all
which has been superseded.