How to join columns in a pandas dataframe, with empty values, considering priority?


I'm working on a university assignment with a pandas dataframe, as below:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
                         'col3': [np.NaN, 2, 30, 44, np.NaN]
                         })

      print(df)

      col1    sensor_name   col2    col3
      NaN      water         NaN    NaN
      11.2     strain        NaN    2.0
      10.0     fog           30.0   30.0
      NaN      water         40.0   44.0
      1000.0   fog           NaN    NaN

I would like to combine col1, col2 and col3 into a single column, avoiding the NaN values. The priority is the value in col1, if present; otherwise col2, and finally col3.

I tried the following code:

      df['new_column'] = df.ffill(axis=1)['col3']

The output is:

      col1     sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       water
       11.2     strain          NaN     2.0       2.0
       10.0     fog             30.0    30.0      30.0
       NaN      water           40.0    44.0      44.0
 1000.0   fog             NaN     NaN       fog

However, the desired output is:

       col1    sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       NaN
       11.2     strain          NaN     2.0       11.2
       10.0     fog             30.0    30.0      10.0
       NaN      water           40.0    44.0      40.0
       1000.0   fog             NaN     NaN       1000.0

CodePudding user response:

One of the many ways you can achieve this is with the pandas apply function.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
                   'col3': [np.NaN, 2, 30, 44, np.NaN]
                   })

print(df)

def apply_func(row):
    # Return the first non-NaN value in priority order: col1, then col2, then col3.
    if not pd.isna(row['col1']):
        return row['col1']
    elif not pd.isna(row['col2']):
        return row['col2']
    return row["col3"]

df["new_cols"] = df.apply(apply_func, axis=1)
print(df)

Output:

     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0
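
A side note, not part of the original answer: apply with axis=1 runs the Python function once per row, which can get slow on large frames. Assuming the same df, the same priority logic can also be written as chained fillna calls, which stay vectorized:

# Sketch: take col1, fall back to col2 where col1 is NaN, then to col3.
df["new_cols"] = df["col1"].fillna(df["col2"]).fillna(df["col3"])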

CodePudding user response:

try this:

f = df.filter(regex=r'col\d')  # keep only the colN columns (raw string avoids the \d escape warning)
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
    col1    sensor_name col2    col3    new_column
0   NaN     water       NaN     NaN     NaN
1   11.2    strain      NaN     2.0     11.2
2   10.0    fog         30.0    30.0    10.0
3   NaN     water       40.0    44.0    40.0
4   1000.0  fog         NaN     NaN     1000.0
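
In case the one-liner is hard to parse, here is a small breakdown using the same df (f and mask are just local helper names, not part of the original answer): cumsum counts the non-NaN values seen so far in each row, so the mask is True only while exactly one non-NaN has been seen; where then keeps just that first non-NaN value per row, and max(axis=1) pulls it out.

f = df.filter(regex=r'col\d')
mask = f.notnull().cumsum(axis=1).eq(1)   # True while the row has seen exactly one non-NaN value
print(mask)
print(f.where(mask).max(axis=1))          # only the first non-NaN value per row survives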

CodePudding user response:

# Start from col1, then fill the rows that are still missing from col2, and then from col3.
df['new_column'] = df['col1']
index = df[df['new_column'].isna()].index
df.loc[index, 'new_column'] = df.loc[index, 'col2']
index = df[df['new_column'].isna()].index
df.loc[index, 'new_column'] = df.loc[index, 'col3']

You could wrap this in a for loop if you had more columns to consider and/or wanted it to be more automated, as sketched below.
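
A minimal sketch of that loop, assuming the df from the question and the priority order col1, col2, col3 (the column list is the only part you would change):

import numpy as np

# Fill new_column from each column in priority order, only touching rows that are still missing.
df['new_column'] = np.nan
for col in ['col1', 'col2', 'col3']:
    missing = df['new_column'].isna()
    df.loc[missing, 'new_column'] = df.loc[missing, col]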

CodePudding user response:

Filter the col columns and back-fill across them:

df['new'] = df.filter(like='col').bfill(axis=1)['col1']
df
Out[324]: 
     col1 sensor_name  col2  col3     new
0     NaN       water   NaN   NaN     NaN
1    11.2      strain   NaN   2.0    11.2
2    10.0         fog  30.0  30.0    10.0
3     NaN       water  40.0  44.0    40.0
4  1000.0         fog   NaN   NaN  1000.0