Home > other >  In-place update in pandas: update the value of the cell based on a condition
In-place update in pandas: update the value of the cell based on a condition

Time:07-06

      DOB        Name      
0   1956-10-30   Anna
1   1993-03-21   Jerry        
2   2001-09-09   Peter     
3   1993-01-15   Anna   
4   1999-05-02   James 
5   1962-12-17   Jerry
6   1972-05-04   Kate

In the dataframe similar to the one above where I have duplicate names. So I am want to add a suffix '_0' to the name if DOB is before 1990 and a duplicate name.

I am expecting a result like this

      DOB        Name      
0   1956-10-30   Anna_0
1   1993-03-21   Jerry        
2   2001-09-09   Peter     
3   1993-01-15   Anna   
4   1999-05-02   James
5   1962-12-17   Jerry_0
6   1972-05-04   Kate

I am using the following

df['Name'] = df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))].Name.apply(lambda x: x '_0')

But I am getting this result

      DOB        Name      
0   1956-10-30   Anna_0
1   1993-03-21   NaN
2   2001-09-09   NaN     
3   1993-01-15   NaN   
4   1999-05-02   NaN
5   1962-12-17   Jerry_0
6   1972-05-04   NaN

How can I add a suffix to the Name which is a duplicate and have to be born before 1990.

CodePudding user response:

Problem in your df['Name'] = df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))].Name.apply(lambda x: x '_0') is that df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))] is a filtered dataframe whose rows are less than the original. When you assign it back, the not filtered rows doesn't have corresponding value in the filtered dataframe, so it becomes NaN.

You can try mask instead

m = (df['DOB'] < '1990-01-01') & df['Name'].duplicated(keep=False)

df['Name'] = df['Name'].mask(m, df['Name'] '_0')

CodePudding user response:

You can use masks and boolean indexing:

# is the year before 1990?
m1 = pd.to_datetime(df['DOB']).dt.year.lt(1990)
# is the name duplicated?
m2 = df['Name'].duplicated(keep=False)

# if both conditions are True, add '_0' to the name
df.loc[m1&m2, 'Name']  = '_0'

output:

          DOB     Name
0  1956-10-30   Anna_0
1  1993-03-21    Jerry
2  2001-09-09    Peter
3  1993-01-15     Anna
4  1999-05-02    James
5  1962-12-17  Jerry_0
6  1972-05-04     Kate
  • Related