Conditionally update dataframe column if character exists in another column

I have a dataframe with two columns, full name and last name. Sometimes the last name column is not filled in properly. In those cases, the last name appears in the full name column as the last word in parentheses. For the rows where parentheses are present, I would like to set the last name column to that last word in parentheses.

Code

import pandas as pd
df = pd.DataFrame({
        'full':['bob john smith','sam alan (james)','zack joe mac', 'alan (gracie) jacob (arnold)'],
        'last': ['ross', '-', 'mac', '-']
        })
result_to_be = pd.DataFrame({
        'full':['bob john smith','sam alan (james)','zack joe mac', 'alan (gracie) jacob (arnold)'],
        'last': ['ross', 'james', 'mac', 'arnold']
        })
print(df)
print(result_to_be)

I tried using the contains function as a mask, but the regex check seems to break when the pattern is a ')' or '(' character:

df['full'].str.contains(')')

The error it shows is

re.error: unbalanced parenthesis at position 0

CodePudding user response：

You can use .str.findall to get every value between parentheses, take the last match with .str[-1], and use df.loc to assign it where last is '-':

df.loc[df['last'] == '-', 'last'] = df['full'].str.findall(r'\((.+?)\)').str[-1]

Output:

>>> df
                           full    last
0  bob john smith                ross  
1  sam alan (james)              james 
2  zack joe mac                  mac   
3  alan (gracie) jacob (arnold)  arnold
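
As for the re.error in the question: str.contains treats its pattern as a regular expression by default, so a lone ')' is not a valid pattern. Below is a minimal sketch of two ways around that, assuming the sample df from the question. The names has_paren and has_paren_re are only illustrative. Either pass regex=False for a literal substring match, or escape the parenthesis.

```python
import pandas as pd

df = pd.DataFrame({
    'full': ['bob john smith', 'sam alan (james)', 'zack joe mac', 'alan (gracie) jacob (arnold)'],
    'last': ['ross', '-', 'mac', '-'],
})

# regex=False makes str.contains do a plain substring search
has_paren = df['full'].str.contains(')', regex=False)

# Same check as a regex: the parenthesis has to be escaped
has_paren_re = df['full'].str.contains(r'\)')

# Use the mask to update only the rows that contain a parenthesis
df.loc[has_paren, 'last'] = df['full'].str.findall(r'\((.+?)\)').str[-1]
print(df)
```

Masking on the parenthesis instead of on '-' follows the wording of the question more closely. On the sample data both masks select the same rows.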

CodePudding user response：

For a slightly different syntax, you could also use .str.extract. The greedy leading .* makes the capture group land on the last pair of parentheses:

df.loc[df['last'] == '-', 'last'] = df['full'].str.extract(r'.*\((.*)\)', expand=False)

Output:

                           full    last
0                bob john smith    ross
1              sam alan (james)   james
2                  zack joe mac     mac
3  alan (gracie) jacob (arnold)  arnold
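
One caveat with masking on '-' alone: a row whose last is '-' but whose full has no parentheses would be assigned NaN. Here is a hedged sketch that guards against this, assuming such rows should simply be left unchanged. The extra row 'tom lee' and the names extracted and needs_fix are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'full': ['bob john smith', 'sam alan (james)', 'zack joe mac',
             'alan (gracie) jacob (arnold)', 'tom lee'],  # 'tom lee' is a made-up row
    'last': ['ross', '-', 'mac', '-', '-'],
})

# NaN wherever 'full' has no parenthesised word
extracted = df['full'].str.extract(r'.*\((.*)\)', expand=False)

# Update only rows that are unfilled AND actually have something to extract
needs_fix = df['last'].eq('-') & extracted.notna()
df.loc[needs_fix, 'last'] = extracted[needs_fix]
print(df)
```

With this mask the 'tom lee' row keeps its '-' placeholder instead of turning into NaN.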