How to subtract columns from two datasets by column name

Time:02-23

Suppose you have two data frames with the same columns and the same number of rows; only the order of the columns differs.

Dataset a holds the predicted values from my model, while dataset b holds the actual values of the same variables.

I want a new dataset containing a$i - b$i for each column i, for example FYFF from dataset a minus FYFF from dataset b (a$FYFF - b$FYFF).

I don't know how to write a loop that matches the column names and subtracts them, or whether I need one at all.

Thanks in advance!

Data:

> dput(a)
structure(list(FYFF = c(5.62481291704216, 5.77021269533357, 5.80660266666805, 
5.89556030216938, 5.81687106929874, 5.89645562124814, 5.88639911374851, 
5.90687872475339, 5.95506281594889, 6.05004047596607, 6.11439503144994, 
6.2045773479442), IP = c(0.00550691992815247, 0.00592967603768478, 
0.00496743469475157, 0.00439395197656857, 0.00436417085033269, 
0.00368796833846484, 0.00375828785751239, 0.00379577545756551, 
0.00347980689447873, 0.00416191362799741, 0.00400028831069191, 
0.0039837438592708), PUNEW = c(0.00248906763444025, 0.00289206479346909, 
0.00356897184657621, 0.00315713460136047, 0.00374885320757934, 
0.00320757113077844, 0.00308236691113797, 0.00322111379093545, 
0.00330962741169567, 0.00332405808527479, 0.00345482092419552, 
0.00361550086806829)), class = "data.frame", row.names = c(NA, 
-12L))
> dput(b)
structure(list(IP = c(-0.0019063187, -0.0010909588, 0.0055955858, 
0.0050583338, 0.0041930195, -0.0029113572, -0.0058143629, 0.01612572, 
0.0074449866, 0.0042460103, 0.011474407, 0.021971466), PUNEW = c(0, 
0.0025031302, 0.0024968802, 0.0049751346, 0.0049505052, 0.0024660925, 
0.0024600258, 0.002453989, 0.0024479816, 0.0024420037, 0.0024360548, 
0.0024301349), FYFF = c(3.72, 3.71, 4.15, 4.63, 4.91, 5.31, 5.57, 
5.55, 5.2, 4.91, 4.14, 3.5)), row.names = c(NA, -12L), class = "data.frame")

Answer:

No loop is needed. A simple solution is to reorder the columns of b to match the column order of a and then subtract the two data frames element-wise:

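# dplyr: all_of() selects the columns of b in the order given by names(a)
new_data <- a - dplyr::select(b, dplyr::all_of(names(a)))
# Base R: index b by the column names of a
new_data <- a - b[names(a)]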
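
The base R approach can be wrapped in a small function that first checks that the two data frames really are comparable. This is only a sketch: the helper name subtract_by_name and the toy data below are made up for illustration and are not the data from the question.

```r
# Small made-up data frames with the same columns in a different order
# (illustrative values only, not the data from the question).
a <- data.frame(FYFF = c(5, 6), IP = c(0.5, 0.25))
b <- data.frame(IP = c(0.25, 0.25), FYFF = c(3, 4))

# Hypothetical helper: stop early if the frames are not comparable,
# then subtract column by column, matching on column name.
subtract_by_name <- function(x, y) {
  stopifnot(
    nrow(x) == nrow(y),           # same number of rows
    setequal(names(x), names(y))  # same set of column names
  )
  # Reorder the columns of y to the order of x, then subtract element-wise.
  x - y[names(x)]
}

new_data <- subtract_by_name(a, b)
new_data$FYFF  # FYFF from a minus FYFF from b
```

The result keeps the column order of a. If the column names do not match, the check fails with an error instead of silently subtracting the wrong columns.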