Calculating RMSE in corresponding columns from two different DataFrames

Time:09-30

Let's say that I have the following two dataframes:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

I would like to generate a dataframe (df3) that contains the same column names but with the RMSE value between the corresponding columns of the two dataframes.

I know that RMSE can be calculated as in the following example, but I am not sure how to extend this efficiently to DataFrames (my actual DataFrames have many columns):

from sklearn.metrics import mean_squared_error
import math
y_actual = [1,2,3,4,5]
y_predicted = [1.6,2.5,2.9,3,4.1]
 
MSE = mean_squared_error(y_actual, y_predicted)
RMSE = math.sqrt(MSE)

CodePudding user response:

I think this may be what you are looking for:

First, find the columns that exist in both frames:

s = df1.columns.intersection(df2.columns)

Then compute the RMSE for each of the shared columns (this reuses math and mean_squared_error, imported in the question):

df1[s].apply(lambda x: math.sqrt(mean_squared_error(x, df2[x.name])))

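Result (a Series indexed by column name; the exact values vary from run to run because the input data is random):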

A    1.348552
B    1.360788
C    1.325903
D    1.351737
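
If you would rather not depend on scikit-learn, the same numbers can probably be obtained with plain pandas arithmetic. The sketch below is only one possible variant, not the answer's original method: it assumes both frames share the same row index (the subtraction aligns on index labels, and any unmatched rows become NaN, which mean() skips by default). The seeded generator rng is my own addition for repeatability, and the last line reshapes the result into the one-row df3 the question asks for.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only so the sketch is repeatable)
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(rng.standard_normal((100, 4)), columns=list('ABCD'))

# Columns present in both frames
s = df1.columns.intersection(df2.columns)

# Element-wise difference, squared, averaged per column, then square-rooted
rmse = ((df1[s] - df2[s]) ** 2).mean() ** 0.5

# One-row DataFrame with the same column names, as asked in the question
df3 = rmse.to_frame().T
```

Here rmse is a Series indexed by column name, like the result above, and df3 holds the same values as a single row.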