Is there a way to consecutively combine the values of columns in a pandas dataframe to reduce the amount of missing data? Preferably in a "hierarchical" order (like: "keep all values of column A; if something is missing, use the value from column B; if something is still missing, use the value from column C", and so on).
It seems like a pretty straightforward task, but I have not been able to do it. I have looked into the pandas functions join, merge, combine and combine_first, and I assume a specific setup of one of these, or a combination of them, would do the trick, but I have no idea how to put it together, so any suggestions or help would be very much appreciated.
Example code for the desired transformation:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})
print('Input')
print(df)
xx = pd.DataFrame({'A': [1, 2, 3, 4, 5, 0.6, 0.7, 8],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})
print('\nDesired output:')
print(xx)
CodePudding user response:
We can use bfill to back fill the values along the columns axis and then take the first column, which now holds the first non-missing value of each row:
df['A'] = df.bfill(axis=1).iloc[:, 0]
     A    B     C
0  1.0  NaN  0.01
1  2.0  NaN  0.20
2  3.0  0.3  0.30
3  4.0  4.0  0.04
4  5.0  5.0  0.50
5  0.6  0.6  0.06
6  0.7  0.7  7.00
7  8.0  NaN  8.00
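Since bfill works from right to left across the columns, the column order decides the priority. Here is a minimal sketch of how that could be made explicit by selecting the columns in the order you want first. The helper name coalesce_columns and the priority argument are my own, not part of pandas:

```python
import numpy as np
import pandas as pd


def coalesce_columns(frame, priority):
    """Return, per row, the first non-missing value among `priority` columns.

    `coalesce_columns` is a hypothetical helper, not a pandas function.
    """
    # Reorder the columns by priority, back fill along each row,
    # then keep the first column, which now holds the first non-NaN value.
    return frame[priority].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

a_then_b_then_c = coalesce_columns(df, ['A', 'B', 'C'])
a_then_c_then_b = coalesce_columns(df, ['A', 'C', 'B'])
print(a_then_b_then_c)
print(a_then_c_then_b)
```

With ['A', 'B', 'C'] this should reproduce the desired column A from the question; with ['A', 'C', 'B'] the gaps in A are taken from C before B is considered.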
CodePudding user response:
I'm not sure whether this differs from @Shubham Sharma's answer, but here is another way of doing it:
df["A"] = df.fillna(method='bfill', axis=1)["A"]
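One caveat: as far as I know, recent pandas releases (2.1 onward) deprecate the method argument of fillna, so the line above may emit a FutureWarning or stop working on newer versions. A minimal sketch of the same idea without that argument, still selecting the column by label rather than by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

# DataFrame.bfill back fills along each row; picking "A" by label
# gives the same column that .iloc[:, 0] would when A is the first column.
df["A"] = df.bfill(axis=1)["A"]
print(df)
```

This only behaves like the positional version when A is the leftmost column, because the back fill pulls values from the columns to its right.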
CodePudding user response:
You can use fillna repeatedly:
df['A'] = df['A'].fillna(df['B']).fillna(df['C'])
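If there are more than a couple of fallback columns, the chain can be built from a list instead of being typed out. A rough sketch using functools.reduce; the priority list is just an illustrative name of mine, and the first entry is the column being filled:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

# Hypothetical priority list: first the target column, then its fallbacks in order.
priority = ['A', 'B', 'C']

# Start from the target column and fill its remaining gaps from each
# fallback in turn, i.e. df['A'].fillna(df['B']).fillna(df['C']).
df['A'] = reduce(lambda acc, col: acc.fillna(df[col]),
                 priority[1:],
                 df[priority[0]])
print(df)
```

Series.combine_first, which the question mentions, could be swapped in for fillna here, since for aligned indexes it also keeps the caller's values and only fills its missing ones from the other series.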