Create a Boolean column for unique rows in a grouped data frame

Time:08-15

I have a grouped data frame, df_grouped, and I would like to create a new Boolean column, df_grouped["Unique"]. Within each group, this column should be True if the row's Location value is unique within the group and False if it is not.

import pandas as pd

dataset = {
    'ID': ['One', 'One', 'One', 'Five', 'Five', 'Five', 'Four'],
    'Day': [2, 2, 2, 1, 1, 1, 0],
    'Location': ['London', 'London', 'Paris', 'London', 'Paris', 'Paris', 'Berlin']}

df = pd.DataFrame(dataset)
# Group by the columns that actually exist in the frame ('ID', not 'Name')
df_grouped = df.groupby(['ID', 'Day'])

Expected output for the unique column:

'Unique': [False, False, True, True, False, False, True]

CodePudding user response:

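Use DataFrame.duplicated with keep=False, which marks every row that has a duplicate (not only the later occurrences), then invert the mask with ~ so that only rows whose ['ID', 'Day', 'Location'] combination occurs once are True: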

df['Unique'] = ~df.duplicated(['ID','Day', 'Location'], keep=False)
print(df)
     ID  Day Location  Unique
0   One    2   London   False
1   One    2   London   False
2   One    2    Paris    True
3  Five    1   London    True
4  Five    1    Paris   False
5  Five    1    Paris   False
6  Four    0   Berlin    True
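
If you would rather stay closer to the groupby framing of the question, one possible alternative (a sketch, not part of the original answer) is to count how many rows share each ['ID', 'Day', 'Location'] combination and flag the rows whose count is 1. The names counts and unique_flags below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['One', 'One', 'One', 'Five', 'Five', 'Five', 'Four'],
    'Day': [2, 2, 2, 1, 1, 1, 0],
    'Location': ['London', 'London', 'Paris', 'London', 'Paris', 'Paris', 'Berlin'],
})

# Count the rows sharing each (ID, Day, Location) combination and
# broadcast that count back onto every row of the original frame.
counts = df.groupby(['ID', 'Day', 'Location'])['Location'].transform('size')

# A Location is unique within its (ID, Day) group when its count is exactly 1.
df['Unique'] = counts.eq(1)

# Illustrative helper variable: the flags as a plain Python list.
unique_flags = df['Unique'].tolist()
print(df)
```

For this sample data it should give the same flags as the duplicated approach above. The duplicated version is shorter; the transform version may be easier to extend if you also want the per-group counts themselves.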