Combining values of different columns consecutively in a pandas dataframe

Time:02-18

Is there a way to consecutively combine the values of columns in a pandas dataframe to reduce the amount of missing data? Preferably in a "hierarchical" order, like: "keep all values of column A; if something is missing, use the value from column B; if something is still missing, use the value from column C, and so on".

It seems like a pretty straightforward task, but I have not been able to do it. I have looked into the pandas functions join, merge, combine and combine_first, and I assume a specific setup of one of these functions, or a combination of them, would do the trick, but I have no idea how to execute this, so any suggestions or help would be very much appreciated.

Example code for the desired transformation:

import pandas as pd
import numpy as np 

df = pd.DataFrame({'A': [1,2,3,4,np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan,np.nan,0.3,4,5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

print('Input')
print(df)


xx = pd.DataFrame({'A': [1,2,3,4,5,0.6,0.7,8],
                   'B': [np.nan,np.nan,0.3,4,5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})


print('\nDesired output:')
print(xx)

CodePudding user response:

We can use bfill to back fill the values along the columns axis, then take the first column of the result:

df['A'] = df.bfill(axis=1).iloc[:, 0]

     A    B     C
0  1.0  NaN  0.01
1  2.0  NaN  0.20
2  3.0  0.3  0.30
3  4.0  4.0  0.04
4  5.0  5.0  0.50
5  0.6  0.6  0.06
6  0.7  0.7  7.00
7  8.0  NaN  8.00
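
The back fill above relies on the columns already being stored in priority order (A, then B, then C). As a minimal sketch, assuming a hypothetical `priority` list that is not part of the original answer, the columns can be selected explicitly first so the result does not depend on how the dataframe happens to be laid out:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

# Hypothetical priority list (an assumption for illustration):
# earlier names win over later ones.
priority = ['A', 'B', 'C']

# Select the columns in priority order, back fill across each row, and
# keep the first column, which then holds the first non-missing value
# found in that row.
combined = df[priority].bfill(axis=1).iloc[:, 0]

# Only column A is overwritten; B and C keep their original gaps.
df['A'] = combined
```

Any columns left out of `priority` are ignored by the fill, which may be useful when the dataframe also holds unrelated columns.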

CodePudding user response:

Not sure if there are any differences between my answer and @Shubham Sharma's, but here is another way of doing so:

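# Same result as df.bfill(axis=1)["A"]; the method argument of fillna is deprecated since pandas 2.1
df["A"] = df.fillna(method='bfill', axis=1)["A"]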

CodePudding user response:

You can use fillna repeatedly:

df['A'] = df['A'].fillna(df['B']).fillna(df['C'])
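
With more than a few fallback columns, the chained calls above get long. One possible generalization, sketched here under the assumption of a hypothetical `priority` list (not part of the original answer), folds the same fillna step over the list with functools.reduce:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.3, 4, 5, 0.6, 0.7, np.nan],
                   'C': [0.01, 0.2, 0.3, 0.04, 0.5, 0.06, 7, 8]})

# Hypothetical priority list (an assumption for illustration):
# the first column is kept, the rest are fallbacks in order.
priority = ['A', 'B', 'C']

# Start from the highest-priority column and fill its remaining gaps
# from each fallback column in turn. This is equivalent to
# df['A'].fillna(df['B']).fillna(df['C']) for the list above.
filled = reduce(lambda acc, col: acc.fillna(df[col]),
                priority[1:],
                df[priority[0]])

df[priority[0]] = filled
```

Series.combine_first, which the question mentions, should work in place of fillna inside the lambda, since it also takes missing values from the other Series.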