Pandas - Append Duplicates with the initial of the string from another column

Time:04-02

I want to replace duplicate first names in a dataframe with the first name + ' ' + the initial of the last name.

Last Name   First Name  Value
Simpson     Bart        10
Monroe      Lisa        20
Colbert     Bart        15

becomes

Last Name   First Name  Value
Simpson     Bart S      10
Monroe      Lisa        20
Colbert     Bart C      15

This is what I have so far:

df.loc[df.duplicated(['First Name']), 'First Name'] += " X"

It's not working for the first duplicate and gives a warning:

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1773: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)

Answer:

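Create a mask using pd.Series.duplicated with keep=False, so that every occurrence of a repeated first name is flagged. With the default keep='first', the first occurrence is not marked as a duplicate, which is why the first "Bart" was skipped. Then extract the first character of the Last Name using pd.Series.str and append it, after a space, to the First Name.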

m = df['First Name'].duplicated(keep=False) # Mask for all duplicated values

df.loc[m, 'First Name'] += " " + df.loc[m, "Last Name"].str[0]
#                       ^^                             ^^^^^^^
#               append in place                extract the first character

print(df)

#   Last Name First Name  Value
# 0   Simpson     Bart S     10
# 1    Monroe       Lisa     20
# 2   Colbert     Bart C     15
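
The question does not show how df was created, so the following is only a sketch under an assumption: SettingWithCopyWarning usually appears when df is itself a slice of another DataFrame. If that is the case here, working on an explicit copy should avoid the warning. The helper name add_last_initial and its parameters are hypothetical, chosen for illustration.

```python
import pandas as pd


def add_last_initial(df, first="First Name", last="Last Name"):
    """Return a copy in which repeated first names get the last-name initial appended."""
    # Work on an independent copy so the assignment never targets a slice
    # of some other DataFrame (assumed cause of the warning in the question).
    out = df.copy()
    # keep=False flags every occurrence of a repeated value, including the first.
    dup = out[first].duplicated(keep=False)
    out.loc[dup, first] = out.loc[dup, first] + " " + out.loc[dup, last].str[0]
    return out


df = pd.DataFrame({
    "Last Name": ["Simpson", "Monroe", "Colbert"],
    "First Name": ["Bart", "Lisa", "Bart"],
    "Value": [10, 20, 15],
})

result = add_last_initial(df)
print(result["First Name"].tolist())  # ['Bart S', 'Lisa', 'Bart C']
```

Returning a new frame leaves the input untouched, which makes the step easy to rerun. Note that two people sharing both a first name and a last-name initial would still end up with identical values.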