I want to replace duplicate first names in a dataframe with the first name + ' ' +
the initial of the last name.
Last Name  First Name  Value
Simpson    Bart        10
Monroe     Lisa        20
Colbert    Bart        15
becomes
Last Name  First Name  Value
Simpson    Bart S      10
Monroe     Lisa        20
Colbert    Bart C      15
This is what I have so far:
df.loc[df.duplicated(['First Name']), 'First Name'] = " X"
It's not working for the first duplicate and gives a warning:
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1773: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
CodePudding user response:
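Create a mask with pd.Series.duplicated, passing keep=False so that every occurrence of a repeated first name is marked. The default keep='first' leaves the first occurrence unmarked, which is why your first duplicate was skipped. Then extract the first character of Last Name with pd.Series.str and append it to the first name.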
m = df['First Name'].duplicated(keep=False)  # mask for all duplicated values
# Append a space plus the first character of the last name to the masked rows
df.loc[m, 'First Name'] += " " + df.loc[m, "Last Name"].str[0]
print(df)
#   Last Name  First Name  Value
# 0   Simpson      Bart S     10
# 1    Monroe        Lisa     20
# 2   Colbert      Bart C     15
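
To run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question. The names raw, first_mask and all_mask are made up for illustration, and the .copy() step is an assumption about where the SettingWithCopyWarning came from (a df that was itself sliced from another DataFrame), since the question doesn't show how df was created.

```python
import pandas as pd

# Sample data rebuilt from the table in the question ("raw" is a hypothetical name)
raw = pd.DataFrame({
    "Last Name": ["Simpson", "Monroe", "Colbert"],
    "First Name": ["Bart", "Lisa", "Bart"],
    "Value": [10, 20, 15],
})

# Assumption: the original df was a slice of a larger frame. Taking an explicit
# copy makes df independent, which usually avoids SettingWithCopyWarning.
df = raw[["Last Name", "First Name", "Value"]].copy()

# Default keep='first' leaves the first occurrence of each value unmarked
first_mask = df["First Name"].duplicated()
# keep=False marks every occurrence of a repeated value
all_mask = df["First Name"].duplicated(keep=False)

# Append " " + last-name initial only on the rows with a repeated first name
df.loc[all_mask, "First Name"] += " " + df.loc[all_mask, "Last Name"].str[0]
print(df)
```

Comparing first_mask with all_mask shows why the original attempt missed the first "Bart": only keep=False flags both rows.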