I want to use pandas to rename values in the Hospital column. When rows share the same Hospital value but have different values in the GeneralRepresentation column, Hospital should be renamed. When rows with the same Hospital value also have the same GeneralRepresentation value, Hospital should be left unchanged.
The effect I want is shown below:
CodePudding user response:
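You just need to change the logic: group by Hospital, number the unique GeneralRepresentation values within each group (factorize), and count the unique values per group (nunique).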
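import numpy as np

g = df.groupby('Hospital')['GeneralRepresentation']
# 1-based number of each distinct GeneralRepresentation within its Hospital
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
# number of distinct GeneralRepresentation values per Hospital
s2 = g.transform('nunique')
df['Hospital'] = np.where(s2 == 1, df['Hospital'], df['Hospital'] + '_' + s1)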
df
  Hospital GeneralRepresentation
0        a                     a
1      b_1                     b
2      b_2                     c
3      c_1                     d
4      c_2                     e
5        d                     f
6        d                     f
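For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The input frame is an assumption: it is reconstructed by stripping the suffixes from the output above, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the output above (suffixes removed).
df = pd.DataFrame({
    "Hospital": ["a", "b", "b", "c", "c", "d", "d"],
    "GeneralRepresentation": ["a", "b", "c", "d", "e", "f", "f"],
})

g = df.groupby("Hospital")["GeneralRepresentation"]
# 1-based number of each distinct GeneralRepresentation within its Hospital group
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
# number of distinct GeneralRepresentation values per Hospital group
s2 = g.transform("nunique")

# Keep the name when the group has a single distinct value, otherwise add a suffix.
df["Hospital"] = np.where(s2 == 1, df["Hospital"], df["Hospital"] + "_" + s1)
print(df)
```

Because the suffix comes from factorize rather than from the row position, rows in the same Hospital that repeat the same GeneralRepresentation would receive the same suffix.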
CodePudding user response:
Use duplicated to build a boolean mask and pass it to np.where(condition, value if true, value if false). cumcount generates a consecutive counter within each Hospital group which, once converted to a string, can be appended to the original name. The mask flags rows whose Hospital repeats but whose (Hospital, GeneralRepresentation) pair does not.
mask = df['Hospital'].duplicated(keep=False) & ~df.duplicated(['Hospital', 'GeneralRepresentation'], keep=False)
df['Hospital'] = np.where(mask, df['Hospital'] + '_' + (df.groupby('Hospital').cumcount() + 1).astype(str), df['Hospital'])
CodePudding user response:
You can use:
import numpy as np

# True for rows that are not an exact duplicate of another row
dup = ~df.duplicated(keep=False)
g_count = df.groupby("Hospital").cumcount() + 1
count = df.groupby("Hospital")['GeneralRepresentation'].transform('count')
df['Hospital'] = np.where(dup & (count > 1), df['Hospital'] + '_' + g_count.astype(str), df['Hospital'])
OUTPUT
  Hospital GeneralRepresentation
0      UMC                     a
1    MGH_1                     b
2    MGH_2                     j
3    NMH_1                     o
4    NMH_2                     a
5      MSH                     d
6      MSH                     d
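A runnable sketch of this answer, for checking it against the OUTPUT above. The input frame is an assumption, rebuilt by stripping the suffixes from that output, and it assumes the frame has only these two columns so that df.duplicated compares the (Hospital, GeneralRepresentation) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the OUTPUT above (suffixes removed).
df = pd.DataFrame({
    "Hospital": ["UMC", "MGH", "MGH", "NMH", "NMH", "MSH", "MSH"],
    "GeneralRepresentation": ["a", "b", "j", "o", "a", "d", "d"],
})

# True where the whole row is not repeated anywhere else in the frame
not_repeated = ~df.duplicated(keep=False)
# 1-based position of each row within its Hospital group
g_count = df.groupby("Hospital").cumcount() + 1
# number of rows in each Hospital group
count = df.groupby("Hospital")["GeneralRepresentation"].transform("count")

# Rename only rows that are unique as a pair but share their Hospital with other rows.
df["Hospital"] = np.where(
    not_repeated & (count > 1),
    df["Hospital"] + "_" + g_count.astype(str),
    df["Hospital"],
)
print(df)
```

This works when each Hospital either repeats one pair or has all-distinct pairs, as in the sample. If a Hospital mixed repeated and distinct pairs, only the non-repeated rows would be renamed.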