In a dataframe, I have a particular set of columns (list A) from which I want to subtract another particular set of columns (list B), and then save the result (list A - list B) as new columns with the suffix "_diff". Columns from list B have the same names as columns from list A, except they have the suffix "_pop". I'm basically trying to automate the process and avoid manually subtracting the corresponding list B column from each list A column. I have tried experimenting with mutate() and across(), but I can't get it to work (I think at least one value needs to be fixed, like a scalar).
Essentially, if list A contains "Column1A", "Column2A" and "Column3A", and list B contains "Column1B", "Column2B" and "Column3B" (and all of these columns are present in the dataframe), I want to compute "Column1A" - "Column1B", "Column2A" - "Column2B", etc., and save the results as new columns with the "_diff" suffix.
This is what I tried, suspecting strongly that it wouldn't work (and it didn't work):
test <- test %>%
mutate(across(my_dataframe[,c(columns_list_A)] - my_dataframe[,c(columns_list_B)], .names="{col}_diff"))
Would the purrr package be a better fit for this problem? I'm not familiar with it, but if someone could point me to the right function I'll be grateful. Thank you very much!
CodePudding user response:
Here is one simple approach:
n <- 3
# For each index x, subtract ColumnxB from ColumnxA (list A - list B, as asked)
cbind(df, setNames(
  as.data.frame(lapply(1:n, \(x) df[[paste0("Column", x, "A")]] - df[[paste0("Column", x, "B")]])),
  paste0("diff", 1:n))
)
Output:
Column1A Column2A Column3A Column1B Column2B Column3B diff1
1 -0.56047565 1.7150650 1.2240818 1.7869131 -1.0678237 -1.6866933 -2.3473888
2 -0.23017749 0.4609162 0.3598138 0.4978505 -0.2179749 0.8377870 -0.7280280
3 1.55870831 -1.2650612 0.4007715 -1.9666172 -1.0260044 0.1533731 3.5253255
4 0.07050839 -0.6868529 0.1106827 0.7013559 -0.7288912 -1.1381369 -0.6308475
5 0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393 1.2538149 0.6020791
diff2 diff3
1 2.78288869 2.9107751
2 0.67889112 -0.4779732
3 -0.23905679 0.2473983
4 0.04203838 1.2488197
5 0.17937730 -1.8096561
Input:
set.seed(123)
df = data.frame(
Column1A = rnorm(5), Column2A=rnorm(5), Column3A=rnorm(5),
Column1B = rnorm(5), Column2B=rnorm(5), Column3B=rnorm(5)
)
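If the column names don't follow a numeric pattern, the same result can probably be had from two name vectors instead of an index. This is only a minimal sketch: cols_a and cols_b are names I made up for illustration, and it assumes both vectors list the matching columns in the same order, since data frames are subtracted position by position.

```r
# Same toy data as the Input above
set.seed(123)
df <- data.frame(
  Column1A = rnorm(5), Column2A = rnorm(5), Column3A = rnorm(5),
  Column1B = rnorm(5), Column2B = rnorm(5), Column3B = rnorm(5)
)

# Hypothetical name vectors; the i-th entry of cols_b pairs with the i-th of cols_a
cols_a <- c("Column1A", "Column2A", "Column3A")
cols_b <- c("Column1B", "Column2B", "Column3B")

# Subtracting two data frames works element-wise by column position,
# and assigning to new names appends the results as new columns.
df[paste0(cols_a, "_diff")] <- df[cols_a] - df[cols_b]

names(df)
```

This avoids both the loop and the lapply() call, at the cost of relying on the two vectors being in matching order.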
CodePudding user response:
I have to double check the accuracy, but I think I may have figured it out. Basically, my dataframe contains columns listed in columna_list and columnb_list, and new_diff is a list containing the new column names I want to assign to this difference. If the list of columns can be easily extracted (as opposed to written out manually), which can be done in the case of my dataset, quite a bit of time can potentially be saved in automation.
columna_list <- c("columna1", "columna2", "columna3")  # ... and any further columns
columnb_list <- c("columnb1", "columnb2", "columnb3")  # ... in the same order
new_diff <- c("diff1", "diff2", "diff3")               # ... one new name per pair
for (i in seq_along(columna_list)) {
  # new column = column from list A minus the matching column from list B
  my_df[[new_diff[i]]] <- my_df[[columna_list[i]]] - my_df[[columnb_list[i]]]
}
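Since the question says the list B columns are just the list A names plus "_pop", the three vectors could probably be derived from the column names rather than typed out. A rough sketch of that, using a made-up data frame (births, deaths and id are hypothetical names, not from the real dataset):

```r
# Hypothetical data: each base column has a partner with the "_pop" suffix
my_df <- data.frame(
  births = c(10, 20, 30), births_pop = c(1, 2, 3),
  deaths = c(5, 6, 7),    deaths_pop = c(2, 2, 2),
  id = 1:3
)

# Columns ending in "_pop" form list B; stripping the suffix gives list A
columnb_list <- grep("_pop$", names(my_df), value = TRUE)
columna_list <- sub("_pop$", "", columnb_list)
new_diff <- paste0(columna_list, "_diff")

# Stop early if some "_pop" column has no unsuffixed partner
stopifnot(all(columna_list %in% names(my_df)))

for (i in seq_along(columna_list)) {
  my_df[[new_diff[i]]] <- my_df[[columna_list[i]]] - my_df[[columnb_list[i]]]
}
```

Because list A is built from list B, the pairs are matched by name, so the order of columns in the data frame doesn't matter.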
CodePudding user response:
People probably did not tackle this question because of its poor formatting and the lack of a reproducible example.
Using tidyverse you could do:
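library(dplyr)
listA <- c("Column1A", "Column2A", "Column3A")
listB <- c("Column1B", "Column2B", "Column3B")
# across() returns a data frame for each list; the two are subtracted column by column
df %>%
  mutate(across(all_of(listA), .names = "diff_{.col}") - across(all_of(listB)))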
Column1A Column2A Column3A Column1B Column2B Column3B diff_Column1A diff_Column2A diff_Column3A
1 -0.56047565 1.7150650 1.2240818 1.7869131 -1.0678237 -1.6866933 -2.3473888 2.78288869 2.9107751
2 -0.23017749 0.4609162 0.3598138 0.4978505 -0.2179749 0.8377870 -0.7280280 0.67889112 -0.4779732
3 1.55870831 -1.2650612 0.4007715 -1.9666172 -1.0260044 0.1533731 3.5253255 -0.23905679 0.2473983
4 0.07050839 -0.6868529 0.1106827 0.7013559 -0.7288912 -1.1381369 -0.6308475 0.04203838 1.2488197
5 0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393 1.2538149 0.6020791 0.17937730 -1.8096561
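As for the purrr part of the question, map2() looks like the closest fit, since it walks two lists in parallel. A minimal sketch, assuming the same df, listA and listB as above and that both lists are in matching order; the "_diff" names are built with paste0() to match the suffix asked for in the question:

```r
library(purrr)

set.seed(123)
df <- data.frame(
  Column1A = rnorm(5), Column2A = rnorm(5), Column3A = rnorm(5),
  Column1B = rnorm(5), Column2B = rnorm(5), Column3B = rnorm(5)
)
listA <- c("Column1A", "Column2A", "Column3A")
listB <- c("Column1B", "Column2B", "Column3B")

# map2() pairs the i-th column of listA with the i-th column of listB
# and returns a list with one difference vector per pair
diffs <- map2(df[listA], df[listB], \(a, b) a - b)

# Append the differences as new columns with the "_diff" suffix
df[paste0(listA, "_diff")] <- diffs
```

For a plain subtraction this doesn't buy much over the across() version, but it may be handy if the per-pair operation gets more involved.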