I am trying to compare the output of the describe() method before and after filling the NA values in a dataframe.
For example, the first dataframe:
idx A B
1 NA 5
2 NA 4
3 3 3
4 5 NA
5 6 7
After filling the NA values:
idx A B
1 3 5
2 3 4
3 3 3
4 3 3
5 6 7
I wish to compare the describe() results before and after filling the NA values, using a random combination of data. The original dataframe has 80k rows and 30 columns, with around 30% NA spread across different columns.
Ideal result: statistics with no change should show 0, and statistics that changed should show the difference (i.e., mean = 2 if it went from 3 to 5).
Attempt 1: subtract them manually with a method, but it is not as clean as I would like.
Attempt 2: create two dataframes, use compare, and then describe. Can this be cleaned up?
Many thanks.
CodePudding user response:
df1.describe() - df2.describe()
would produce this on your dataframes (with df1 as the original and df2 as the filled one):
idx A B
count 0.0 -2.000000 -1.000000
mean 0.0 1.066667 0.350000
std 0.0 0.185884 0.034505
min 0.0 0.000000 0.000000
25% 0.0 1.000000 0.750000
50% 0.0 2.000000 0.500000
75% 0.0 2.500000 0.500000
max 0.0 0.000000 0.000000
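
For reference, here is a minimal sketch that rebuilds the two sample frames from the question and runs the subtraction. The names df1, df2 and diff are just placeholders, and it assumes idx is an ordinary column rather than the index:

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (df1 = before filling, df2 = after).
df1 = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [np.nan, np.nan, 3, 5, 6],
    "B": [5, 4, 3, np.nan, 7],
})
df2 = pd.DataFrame({
    "idx": [1, 2, 3, 4, 5],
    "A": [3, 3, 3, 3, 6],
    "B": [5, 4, 3, 3, 7],
})

# describe() returns a DataFrame indexed by statistic name, so the two
# summaries align on rows and columns and can be subtracted directly.
diff = df1.describe() - df2.describe()
print(diff)
```

Statistics that did not change come out as 0. If you would rather see the change as "after minus before" (as in your mean example), swap the operands: df2.describe() - df1.describe().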