Given a dataframe df as follows:
date value pred1 pred2 pred3
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3631.411799 NaN NaN
4 2021-10-17 3572.3662 3637.792491 3677.656329 NaN
5 2021-10-24 3582.6036 3678.800911 3707.926324 3712.669694
6 2021-10-31 3547.3361 3731.336899 3735.695071 3733.021293
7 2021-11-7 3491.5677 3749.422639 3743.507496 NaN
8 2021-11-14 3539.1002 3737.851787 NaN NaN
9 2021-11-21 3560.3734 NaN NaN NaN
Suppose for columns pred1
, pred2
and pred3
, except that all these columns are NaN
s, if it has one non-null
value, I will populate the other columns with that value, and if it has two or more non-null
values, I will use the average of non-null
values to fill null values of other columns.
For example, for the row with date 2021-10-10
, I will populate pred2
and pred3
with the value of pred1
; for the row with date 2021-10-17
, I will populate pred3
with the mean of pred1
and pred2
on that day; The same logic will apply to 2021-11-7
and 2021-11-14
.
The expected result:
date value pred1 pred2 pred3
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3631.411799 3631.411799 3631.411799
4 2021-10-17 3572.3662 3637.792491 3677.656329 3657.724410
5 2021-10-24 3582.6036 3678.800911 3707.926324 3712.669694
6 2021-10-31 3547.3361 3731.336899 3735.695071 3733.021293
7 2021-11-7 3491.5677 3749.422639 3743.507496 2.957571
8 2021-11-14 3539.1002 3737.851787 3737.851787 3737.851787
9 2021-11-21 3560.3734 NaN NaN NaN
Anyone could help to deal this issue? Thanks.
CodePudding user response:
df.iloc[:,2:].apply(lambda x: x.fillna(x.mean()), axis = 1)
pred1 pred2 pred3
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 3631.411799 3631.411799 3631.411799
4 3637.792491 3677.656329 3657.724410
5 3678.800911 3707.926324 3712.669694
6 3731.336899 3735.695071 3733.021293
7 3749.422639 3743.507496 3746.465068
8 3737.851787 3737.851787 3737.851787
9 NaN NaN NaN