Comparing the before and after of filling np.nan values on a dataframe level using pandas describe()

Time:08-28

I am trying to compare the output of the describe() method on a DataFrame before and after filling its NA values.

For example, the original DataFrame:

idx A   B
1   NA  5
2   NA  4
3   3   3
4   5   NA
5   6   7

After filling the NA values:

idx A   B
1   3   5
2   3   4
3   3   3
4   3   3
5   6   7

I want to see how the describe() output differs after filling the NA values, across random data combinations. The original DataFrame has 80k rows and 30 columns, with around 30% NA spread across different columns.

Ideal result: statistics with no change should show 0, and statistics that changed should show the difference (e.g. mean = 2 if it went from 3 to 5).

Attempt 1: subtract the statistics manually in a helper method, but this is not as clean as I would like.

Attempt 2: create two DataFrames, use compare(), and then describe(). Can this be cleaned up?
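For comparison, here is a sketch of a variant of attempt 2. It assumes the two example frames are called before and after (my names, not from the question) and runs describe() first and compare() second, rather than the other way round. DataFrame.compare() keeps only the differing cells as "self"/"other" pairs and drops rows and columns with no differences at all, so unchanged statistics such as min and max disappear instead of showing 0, and the pairs still have to be subtracted by hand.

```python
import numpy as np
import pandas as pd

# Example frames from the question (idx kept as a regular column)
before = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [np.nan, np.nan, 3, 5, 6],
    "B": [5, 4, 3, np.nan, 7],
})
after = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [3, 3, 3, 3, 6],
    "B": [5, 4, 3, 3, 7],
})

# Both summaries share the same row labels (count, mean, std, ...) and
# columns, which compare() requires
stats_before = before.describe()
stats_after = after.describe()

# Only differing cells survive; the all-equal idx column and the
# all-equal min/max rows are dropped
changed = stats_before.compare(stats_after)

# Collapse each (self, other) pair into "after minus before"
diff = changed.xs("other", axis=1, level=1) - changed.xs("self", axis=1, level=1)
print(diff)
```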

Many thanks.

Answer:

df1.describe() - df2.describe()

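With df1 as the original DataFrame and df2 as the filled one, this gives the following on your example. Each value is before minus after, and unchanged statistics are 0: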

       idx         A         B
count  0.0 -2.000000 -1.000000
mean   0.0  1.066667  0.350000
std    0.0  0.185884  0.034505
min    0.0  0.000000  0.000000
25%    0.0  1.000000  0.750000
50%    0.0  2.000000  0.500000
75%    0.0  2.500000  0.500000
max    0.0  0.000000  0.000000
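A self-contained sketch of the same idea, assuming the original frame is called before and the filled one after (my names). It subtracts in the other direction, after minus before, so that a statistic going from 3 to 5 shows +2 as described in the question; the magnitudes match the table above with the signs flipped. For a frame with 30 columns, the last step keeps only the columns where at least one statistic changed.

```python
import numpy as np
import pandas as pd

# The example frames from the question; idx is an ordinary column,
# which is why it appears (with all-zero differences) in describe()
before = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [np.nan, np.nan, 3, 5, 6],
    "B": [5, 4, 3, np.nan, 7],
})
after = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [3, 3, 3, 3, 6],
    "B": [5, 4, 3, 3, 7],
})

# describe() skips NaN, so the "before" statistics use only non-missing values.
# Both summaries have the same row labels and columns, so subtraction
# aligns them cell by cell and unchanged statistics come out as 0.
diff = after.describe() - before.describe()
print(diff)

# With many columns, keep only those where at least one statistic changed
changed_cols = diff.loc[:, (diff != 0).any()]
print(changed_cols)
```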