How to use pandas to rename rows when they are the same in a column?

Time:12-31

Given the following data:

[image: input data]

I want to use pandas to rename the Hospital value when rows that share the same Hospital have different values in the GeneralRepresentation column. When rows that share the same Hospital also have the same GeneralRepresentation value, Hospital should not be renamed. Hospitals without a GeneralRepresentation should keep their original name.
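The rule can be made concrete by counting distinct GeneralRepresentation values per hospital: a hospital needs renaming only when it has more than one distinct non-missing value. The sketch below is a minimal illustration, assuming sample data reconstructed from the printed result in the first answer (the original input was only shown as an image).

```python
import pandas as pd

# Sample data reconstructed (assumption) from the printed result in the answer;
# the original input was only available as an image.
df = pd.DataFrame({
    'Hospital': ['a', 'b', 'b', 'c', 'c', 'd', 't'],
    'GeneralRepresentation': ['a', 'b', 'c', 'd', 'e', None, None],
})

# nunique() drops missing values by default, so 'd' and 't' have 0 distinct values
distinct = df.groupby('Hospital')['GeneralRepresentation'].transform('nunique')
needs_rename = distinct > 1

print(df.loc[needs_rename, 'Hospital'].unique().tolist())
```

Under this rule only 'b' and 'c' are renamed; 'a' has a single value, and 'd' and 't' have none, so all three keep their names.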

The effect I want is shown below:

[image: desired output]

When I use Beny's code, I get this result:

[image: output of Beny's code]

But what I want is for the hospital name to stay the same when a hospital has no GeneralRepresentation, as in the second picture. How do I modify this code to meet that requirement?

Answer:

The problem is the missing values. factorize assigns -1 to missing values, so adding 1 gives 0 for the last two rows. My solution replaces NaN with empty strings before the groupby to prevent this:

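import numpy as np
# NaN -> '' so hospitals without a GeneralRepresentation get one ordinary value
g = df.fillna({'GeneralRepresentation': ''}).groupby('Hospital')['GeneralRepresentation']
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)  # 1-based code per distinct value
s2 = g.transform('nunique')  # number of distinct values per hospital
# keep the name when a hospital has one distinct value, otherwise append the code
df['Hospital'] = np.where(s2 == 1, df['Hospital'], df['Hospital'] + '_' + s1)
print(df)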
  Hospital GeneralRepresentation
0        a                     a
1      b_1                     b
2      b_2                     c
3      c_1                     d
4      c_2                     e
5        d                   NaN
6        t                   NaN
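To see why the missing values matter, the sketch below compares factorize and nunique on a one-row Series standing in for a hospital such as 'd' (a hypothetical example), before and after fillna. Without the fill, the suffix would be 0 and nunique would be 0, so the s2 == 1 test fails and the hospital gets renamed.

```python
import numpy as np
import pandas as pd

# A hospital with no GeneralRepresentation (hypothetical single row, object dtype like the real column)
s = pd.Series([np.nan], dtype=object, name='GeneralRepresentation')

codes, _ = s.factorize()
print(codes + 1)       # factorize marks NaN as -1, so the suffix would be 0
print(s.nunique())     # NaN is dropped by default, so nunique is 0, not 1

filled = s.fillna('')
codes_filled, _ = filled.factorize()
print(codes_filled + 1)  # '' is an ordinary value with code 0, so the suffix is 1
print(filled.nunique())  # one distinct value, so np.where keeps the original name
```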

Answer:

Use np.select(list of conditions, list of choices, default):

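import numpy as np
# a: no word character in GeneralRepresentation; na=False makes NaN count as "no word", so a is True there
a = ~df['GeneralRepresentation'].str.contains(r'\w', na=False)
# b: has a value, and both its Hospital and its GeneralRepresentation appear more than once
b = (df['GeneralRepresentation'].str.contains(r'\w', na=False)
     & df['Hospital'].duplicated(keep=False) & df['GeneralRepresentation'].duplicated(keep=False))
# the first matching condition picks its choice; rows matching neither keep the original name
df['Hospital'] = np.select([a, b], [df['Hospital'] + '_' + (df.groupby('Hospital').cumcount() + 1).astype(str), ''], df['Hospital'])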
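np.select checks the conditions in order and takes the choice paired with the first condition that is True, falling back to the default otherwise. A minimal sketch of that behaviour on toy data (the values are hypothetical), which also shows why str.contains needs na=False when the column has missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series(['b', None, 'c'], dtype=object)

# na=False turns the missing-value result into False instead of NaN
has_word = s.str.contains(r'\w', na=False)
is_b = s.eq('b')

# 'b' satisfies both conditions; the first one listed wins
out = np.select([has_word, is_b], ['word', 'is_b'], default='missing')
print(out.tolist())
```

Note that groupby(...).cumcount() numbers rows within each group, whereas factorize in the first answer numbers distinct values, so the two approaches differ when a hospital repeats the same GeneralRepresentation.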