I have a column with a list in each row, and I would like to check whether any of those lists contains a duplicate. The column is built like this:
updated_df.groupby('value 1')['value 2'].apply(list).reset_index(name='value 2')
|value 1|value 2|
|:------|:------|
|25 |[22,22]|
|265 | [4] |
|257 |[1,1,7]|
My intention is to create an adjacent column that contains 'True' (or some other indicator) when the list has duplicates and 'False' when it does not.
Thanks!
CodePudding user response:
Assuming you created the dataframe with the statement you posted, you can run a similar groupby to check whether each group contains a duplicate and save the result as a column:
import pandas as pd
# Create your dataframe
df = pd.DataFrame({
'value 1':[25,25,265, 257, 257, 257],
'value 2':[22,22, 4,1,1,7],
})
# Your groupby
df_new = df.groupby('value 1')['value 2'].apply(list).reset_index(name='value 2')
# Add the duplicate column (both groupbys sort by 'value 1' by default, so the rows line up)
df_new['true_false'] = df.groupby('value 1')['value 2'].agg(lambda x : x.duplicated().any()).tolist()
print(df_new)
Output:
value 1 value 2 true_false
0 25 [22, 22] True
1 257 [1, 1, 7] True
2 265 [4] False
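If you only have the column of lists and no longer have the original ungrouped dataframe, you can also check each list directly. The following is a minimal sketch, not the only way to do it: it rebuilds the grouped frame by hand to match the table in the question, the column name `has_duplicates` is just an illustrative choice, and it assumes the list elements are hashable (numbers or strings, for example).

```python
import pandas as pd

# Hypothetical frame that already holds lists, mirroring the table in the question
df_new = pd.DataFrame({
    'value 1': [25, 265, 257],
    'value 2': [[22, 22], [4], [1, 1, 7]],
})

# A list contains a duplicate if converting it to a set makes it shorter
df_new['has_duplicates'] = df_new['value 2'].apply(
    lambda values: len(set(values)) != len(values)
)

print(df_new)
```

This works row by row on the lists themselves, so it does not depend on the row order of a second groupby.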