I have a dataframe with a subset of columns with the form:
col_A col_B col_C
0 NaN 1.0 NaN
1 NaN NaN NaN
2 NaN NaN 2.0
3 3.0 NaN 4.0
I want to create a series of the notnull values for each row. For rows with multiple not null values, I want either the average or first value e.g.
new_col
0 1.0
1 NaN
2 2.0
3 3.5
I'll eventually want to add this back to the original dataframe as a separate column so I need to persist rows with all NaN values by forward filling them e.g.
new_col
0 1.0
1 1.0
2 2.0
3 3.5
I know how to determine if there is a not null value in the dataframe, but I don't know how to select for it:
df[['col_A', 'col_B', 'col_C']].count(axis=1) >= 1
CodePudding user response:
You can use:
df.mean(axis=1).ffill()
Or to restrict the columns:
df[['col_A', 'col_B', 'col_C']].mean(axis=1).ffill()
Output:
0 1.0
1 1.0
2 2.0
3 3.5
dtype: float64