Pandas DataFrame add column with text from two other columns depending on condition

Time:12-27

How do I add a column with header MergeName to a Pandas DataFrame that has the text from column ShortName, but if ShortName is "None", then the MergeName value should equal the Plaintiffs column value?

This is the Pandas DataFrame data:

      Plaintiffs Gender ShortName
0           None   None      None
1           None   None      None
2    Donald Duck      M      None
3   Minnie Mouse      F    Minnie
4           None   None      None
5       John Doe      M       Doe
6           None   None      None
7           None   None      None
8           None   None      None
9           None   None      None
10          None   None      None

Thanks!

I've tried many different things and nothing seems to work. Usually the result is that only the data from the else branch is added to the MergeName column, including the "None" values. Code I've tried includes:

PlaintiffsTbl['MergeName'] = np.where(PlaintiffsTbl['ShortName'] is None, PlaintiffsTbl['Plaintiffs'],  PlaintiffsTbl['ShortName'])
PlaintiffsTbl['MergeName'] = PlaintiffsTbl['ShortName']
PlaintiffsTbl.loc[PlaintiffsTbl['MergeName'] == None, 'MergeName'] = PlaintiffsTbl['Plaintiffs']
PlaintiffsTbl['MergeName'] = [PlaintiffsTbl['Plaintiffs'] if PlaintiffsTbl['ShortName'] is None else PlaintiffsTbl['ShortName']]

Thank you Amir Hossein Shahdaei! This code does what I was looking for:

PlaintiffsTbl['MergeName'] = PlaintiffsTbl['ShortName']
PlaintiffsTbl['MergeName'] = PlaintiffsTbl['MergeName'].fillna(PlaintiffsTbl['Plaintiffs'])

CodePudding user response:

You can use the .fillna method: first create MergeName from the ShortName column, then fill its null values from the Plaintiffs column.

import pandas as pd

df = pd.DataFrame(
    data=[
        ['a', None],
        ['b', 1],
        [None, 2],
        [None, None],
    ],
    columns=['Plaintiffs', 'ShortName'],
)
# Start from ShortName, then fill its missing values from Plaintiffs.
df['MergeName'] = df['ShortName']
df['MergeName'] = df['MergeName'].fillna(df['Plaintiffs'])
df

    Plaintiffs  ShortName   MergeName
0   a           NaN         a
1   b           1.0         1.0
2   None        2.0         2.0
3   None        NaN         None

CodePudding user response:

Example

data = [[None, None, None], [None, None, None],
        ['Donald Duck', 'M', None], ['Minnie Mouse', 'F', 'Minnie'],
        [None, None, None], ['John Doe', 'M', 'Doe']]
df = pd.DataFrame(data, columns=['Plaintiffs', 'Gender', 'ShortName'])

df

     Plaintiffs Gender ShortName
0          None   None      None
1          None   None      None
2   Donald Duck      M      None
3  Minnie Mouse      F    Minnie
4          None   None      None
5      John Doe      M       Doe

Code

df['MergeName'] = df['ShortName'].fillna(df['Plaintiffs'])

df

     Plaintiffs Gender ShortName    MergeName
0          None   None      None         None
1          None   None      None         None
2   Donald Duck      M      None  Donald Duck
3  Minnie Mouse      F    Minnie       Minnie
4          None   None      None         None
5      John Doe      M       Doe          Doe
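
The fillna approach above only fills genuinely missing values (None or NaN). If the column instead holds the literal text "None", which the question's wording leaves open, fillna would leave it untouched. A minimal sketch for that case, assuming a hypothetical frame with the string "None" in ShortName and using Series.mask instead of fillna:

```python
import pandas as pd

# Hypothetical data: missing short names arrive as the text "None",
# not as a real missing value.
df = pd.DataFrame({
    "Plaintiffs": ["Donald Duck", "Minnie Mouse", "John Doe"],
    "ShortName": ["None", "Minnie", "Doe"],
})

# Series.mask replaces values where the condition is True
# with the matching value from the other Series.
is_placeholder = df["ShortName"] == "None"
df["MergeName"] = df["ShortName"].mask(is_placeholder, df["Plaintiffs"])
```

If both real missing values and the "None" text can occur, chaining .fillna(df["Plaintiffs"]) after the mask call should cover both.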

CodePudding user response:

You can use np.where like this instead of your first attempt. The condition relies on None being falsy and non-empty strings being truthy:

PlaintiffsTbl['MergeName'] = np.where(PlaintiffsTbl['ShortName'], PlaintiffsTbl['ShortName'], PlaintiffsTbl['Plaintiffs'])

For example, the full code is as follows:

import pandas as pd
import numpy as np

PlaintiffsTbl = pd.DataFrame({
    'Plaintiffs': [None, None, 'Donald Duck', 'Minnie Mouse', None, 'John Doe', None, None, None, None, None],
    'Gender': [None, None, 'M', 'F', None, 'M', None, None, None, None, None],
    'ShortName': [None, None, None, 'Minnie', None, 'Doe', None, None, None, None, None],
})

print(PlaintiffsTbl)
"""
      Plaintiffs Gender ShortName
0           None   None      None
1           None   None      None
2    Donald Duck      M      None
3   Minnie Mouse      F    Minnie
4           None   None      None
5       John Doe      M       Doe
6           None   None      None
7           None   None      None
8           None   None      None
9           None   None      None
10          None   None      None
"""

PlaintiffsTbl['MergeName'] = np.where(PlaintiffsTbl['ShortName'], PlaintiffsTbl['ShortName'], PlaintiffsTbl['Plaintiffs'])

print(PlaintiffsTbl)
"""
      Plaintiffs Gender ShortName    MergeName
0           None   None      None         None
1           None   None      None         None
2    Donald Duck      M      None  Donald Duck
3   Minnie Mouse      F    Minnie       Minnie
4           None   None      None         None
5       John Doe      M       Doe          Doe
6           None   None      None         None
7           None   None      None         None
8           None   None      None         None
9           None   None      None         None
10          None   None      None         None
"""

For more information about np.where, see https://numpy.org/doc/stable/reference/generated/numpy.where.html
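
As a side note on why the first attempt in the question put the else branch everywhere: `PlaintiffsTbl['ShortName'] is None` tests whether the Series object itself is None, so it is a single False rather than a per-row mask. The sketch below, on a small made-up frame named tbl, shows that and uses isna() to build an explicit mask. The explicit mask may also be safer than the truthiness form, since NaN is truthy and an empty string is falsy in Python.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the asker's real data).
tbl = pd.DataFrame({
    "Plaintiffs": [None, "Donald Duck", "Minnie Mouse"],
    "ShortName": [None, None, "Minnie"],
})

# `is` compares object identity: the Series is not the None object,
# so this is one Python bool for the whole column.
whole_series_check = tbl["ShortName"] is None

# isna() produces the per-row boolean mask that np.where needs.
row_mask = tbl["ShortName"].isna()

# Where ShortName is missing take Plaintiffs, otherwise keep ShortName.
tbl["MergeName"] = np.where(row_mask, tbl["Plaintiffs"], tbl["ShortName"])
```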

CodePudding user response:

Some sample data for illustration:

import pandas as pd
import numpy as np
di = {
    'Plaintiffs': ['Donald Duck', 'Minnie Mouse', None],
    'ShortName': [None, 'Minnie', None]
    }
d = pd.DataFrame(di)
d

yields

     Plaintiffs ShortName
0   Donald Duck      None
1  Minnie Mouse    Minnie
2          None      None

If it is just a single branch:

cond = (d['ShortName'].isna()) & (d['Plaintiffs'].notna())
d['MergeName'] = np.where(cond, d['Plaintiffs'], d['ShortName'])
d

Or use np.select, which scales to more conditions and choices:

conditions = [
    (d['ShortName'].isna()) & (d['Plaintiffs'].notna())
]
choices = [d['Plaintiffs']]
d['MergeName'] = np.select(conditions, choices, default=d['ShortName'])
d

yields the same

     Plaintiffs ShortName    MergeName
0   Donald Duck      None  Donald Duck
1  Minnie Mouse    Minnie       Minnie
2          None      None         None

If there is more than one choice, just add it to the lists:

conditions = [
    (d['ShortName'].isna()) & (d['Plaintiffs'].notna()),
    d['ShortName'].notna()
]
choices = [d['Plaintiffs'], d['ShortName']]
d['MergeName'] = np.select(conditions, choices)
d

yields (the last row matches no condition, so it gets np.select's default value of 0)

     Plaintiffs ShortName    MergeName
0   Donald Duck      None  Donald Duck
1  Minnie Mouse    Minnie       Minnie
2          None      None            0
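
The 0 in the last row is np.select's default for rows that match none of the conditions. A minimal sketch, assuming you would rather keep such rows empty, passes default=None explicitly; the data is the same sample frame as above:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    "Plaintiffs": ["Donald Duck", "Minnie Mouse", None],
    "ShortName": [None, "Minnie", None],
})

conditions = [
    d["ShortName"].isna() & d["Plaintiffs"].notna(),  # only Plaintiffs present
    d["ShortName"].notna(),                           # ShortName present
]
choices = [d["Plaintiffs"], d["ShortName"]]

# default=None replaces the implicit 0 for rows matching no condition.
merged = np.select(conditions, choices, default=None)
d["MergeName"] = merged
```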