Fill duplicates with missing value after grouping with some logic

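I have a dataframe and need to blank out ticket_id on duplicated tickets. If all rows of a ticket have the same owner_type, every row gets NaN. If the owner types differ, 'm' is picked over 's': the 'm' row keeps its ticket_id and the 's' row gets NaN. If no value is picked, NaN is returned: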

import pandas as pd
data = pd.DataFrame({'owner_type':['m','m','m','s','s','m','s','s'],'ticket_id':[1,1,2,2,3,3,4,4]})

|    | owner_type   |   ticket_id |
|---:|:-------------|------------:|
|  0 | m            |           1 |
|  1 | m            |           1 |
|  2 | m            |           2 |
|  3 | s            |           2 |
|  4 | s            |           3 |
|  5 | m            |           3 |
|  6 | s            |           4 |
|  7 | s            |           4 |

Should give back:

|    | owner_type   |   ticket_id |
|---:|:-------------|------------:|
|  0 | m            |         NaN |
|  1 | m            |         NaN |
|  2 | m            |           2 |
|  3 | s            |         NaN |
|  4 | s            |         NaN |
|  5 | m            |           3 |
|  6 | s            |         NaN |
|  7 | s            |         NaN |

Pseudocode would be: if ticket_id is duplicated, look at owner_type; if owner_type has more than one value, keep the ticket_id for 'm' and return NaN for 's'.

My attempt

data.groupby('ticket_id').apply(lambda x: x['owner_type'] if len(x) < 2 else NaN)

This does not work.

CodePudding user response:

Try this:

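# True only for rows that are not exact duplicates of another row and whose owner_type is 'm'
m = ~data.duplicated(keep=False) & data['owner_type'].eq('m')
# Keep ticket_id where the mask is True, NaN elsewhere
data['ticket_id'].where(m)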

Output:

0    NaN
1    NaN
2    2.0
3    NaN
4    NaN
5    3.0
6    NaN
7    NaN
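
For comparison, here is a minimal sketch that follows the question's pseudocode more literally by grouping on ticket_id. It is an alternative to the `duplicated` mask, not a replacement for it. The names `n_types`, `keep` and `result` are my own and purely illustrative, and I am assuming the result should be written back into a copy of the dataframe.

```python
import pandas as pd

data = pd.DataFrame({
    'owner_type': ['m', 'm', 'm', 's', 's', 'm', 's', 's'],
    'ticket_id': [1, 1, 2, 2, 3, 3, 4, 4],
})

# Number of distinct owner types within each ticket, aligned to the original rows.
n_types = data.groupby('ticket_id')['owner_type'].transform('nunique')

# Keep ticket_id only on 'm' rows of tickets that have more than one owner type.
keep = n_types.gt(1) & data['owner_type'].eq('m')

# Assumption: return a new frame rather than modifying `data` in place.
result = data.assign(ticket_id=data['ticket_id'].where(keep))
print(result)
```

On the sample data both approaches select the same rows (2 and 5). They can differ on other inputs: for a ticket with rows m, m, s, the `duplicated` mask blanks both 'm' rows because they are exact duplicates, while the grouped version keeps them. Which behaviour is wanted depends on how such tickets should be treated.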