How do I iterate through a column with lists in each cell to find errors?

I have a column with a list in each row, and I would like to check whether any of those lists contains a duplicate.

updated_df.groupby('value 1')['value 2'].apply(list).reset_index(name='value 2')

|value 1|value 2|
|:------|:------|
|25     |[22,22]|
|265    |  [4]  | 
|257    |[1,1,7]|

My intention is to create an adjacent column that contains True (or some other indicator) when a row's list has duplicates, and False when it does not.

Thanks!

CodePudding user response：

Assuming you created the dataframe with the statement you posted, you can run a similar groupby to check whether each group contains a duplicate and save the result as a column:

import pandas as pd

# Create your dataframe
df = pd.DataFrame({
    'value 1':[25,25,265, 257, 257, 257],
    'value 2':[22,22, 4,1,1,7],
})

# Your groupby
df_new = df.groupby('value 1')['value 2'].apply(list).reset_index(name='value 2')

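# Add the duplicate column: True if any value repeats within the group.
# Both groupbys sort by 'value 1', so the results line up row by row.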
df_new['true_false'] = df.groupby('value 1')['value 2'].agg(lambda x : x.duplicated().any()).tolist()

print(df_new)

Output:

   value 1    value 2  true_false
0       25   [22, 22]        True
1      257  [1, 1, 7]        True
2      265        [4]       False
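
If you only have the grouped frame (the one with a list in each cell) and not the original long-format data, you could also check each list directly. This is a minimal sketch, assuming the list elements are hashable (ints, strings, and so on); the column name `has_duplicates` is just a placeholder I picked:

```python
import pandas as pd

# Grouped frame shaped like the one in the question: one list per row
df_new = pd.DataFrame({
    'value 1': [25, 265, 257],
    'value 2': [[22, 22], [4], [1, 1, 7]],
})

# A list has a duplicate when it is longer than its set of unique items
# (assumes hashable elements; 'has_duplicates' is a hypothetical column name)
df_new['has_duplicates'] = df_new['value 2'].apply(
    lambda values: len(values) != len(set(values))
)

print(df_new)
```

For the three rows above this flags the lists `[22, 22]` and `[1, 1, 7]` as True and `[4]` as False. It does not depend on row order matching a second groupby, which may be convenient if the grouped frame has been filtered or re-sorted in between.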