Home > Blockchain >  Need to remove nan from a column when its concatenated with strings using pandas
Need to remove nan from a column when its concatenated with strings using pandas

Time:03-18

My data looks like this:

First_Name Middle_Name Last_Name Full_Name (Header)
John       R           Rovin     John R Rovin
Marano                 Justine   Marano nan Justine
David                  Rose      David nan  Rose
                                 nan nan nan
Robert                           Robert nan nan

I am trying to trim nan from Full_Name column to just get whatever name it possibly contains as is. When I am trying to use Fillna(' ') , its not helping me to trim but its completely removing the column content. My final DF should look something like this:

First_Name Middle_Name Last_Name Full_Name (Header)

John       R           Rovin     John R Rovin
Marano                 Justine   Marano Justine
David                  Rose      David Rose
                                 
Robert                           Robert

I am heavily dependent for most of the operations on Pandas. So is there any way using pandas I can solve this problem?

CodePudding user response:

You can use:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(i for i in x if pd.notna(i)), axis=1)
print(df)

# Output
  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert

Setup a MRE

import pandas as pd
import numpy as np

data = {'First_Name': ['John', 'Marano', 'David', np.nan, 'Robert'],
        'Middle_Name': ['R', np.nan, np.nan, np.nan, np.nan],
        'Last_Name': ['Rovin', 'Justine', 'Rose', np.nan, np.nan]}
df = pd.DataFrame(data)

CodePudding user response:

if nan always appear the same in Full_Name column you can use this:

df['Full_Name (Header)'] = df['Full_Name (Header)'].str.replace('nan', ' ')

CodePudding user response:

Use Series.dropna for remove misisng values before join:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(x.dropna()), axis=1)
print(df)

  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert
  • Related