Home > database >  How to add indicator column that lists all values of another column in pandas
How to add indicator column that lists all values of another column in pandas

Time:02-01

I have the following pandas dataframe:

import pandas as pd
pd.DataFrame({'id': [1,1,1,1,2,2,2], 'col': ['a','b','c','c','a','b','d']})

id  col
0   1   a
1   1   b
2   1   c
3   1   c
4   2   a
5   2   b
6   2   d

I would like to add a new column, which would contain the list of unique values of col by id

The end dataframe would look like this:

pd.DataFrame({'id': [1,1,1,1,2,2,2], 'col': ['a','b','c','c','a','b','d'],
             'col2': [['a','b','c'],['a','b','c'],['a','b','c'],['a','b','c'],
                     ['a','b','d'],['a','b','d'],['a','b','d']]})

id  col col2
0   1   a   [a, b, c]
1   1   b   [a, b, c]
2   1   c   [a, b, c]
3   1   c   [a, b, c]
4   2   a   [a, b, d]
5   2   b   [a, b, d]
6   2   d   [a, b, d]

How could I do that ?

CodePudding user response:

A possible solution with GroupBy.agg and reindex :

df["col2"] = (df.groupby("id")["col"].agg(lambda x: x.unique().tolist())
                  .reindex(df["id"]).values)

Output :

print(df)

   id col       col2
0   1   a  [a, b, c]
1   1   b  [a, b, c]
2   1   c  [a, b, c]
3   1   c  [a, b, c]
4   2   a  [a, b, d]
5   2   b  [a, b, d]
6   2   d  [a, b, d]

CodePudding user response:

For lists remove duplicated values in order is used dict.fromkeys trick in lambda function and for new column is used Series.map:

df['col2'] = df['id'].map(df.groupby('id')['col'].agg(lambda x: list(dict.fromkeys(x))))

Or remove duplicates first by DataFrame.drop_duplicates and aggregate list:

df['col2'] = df['id'].map(df.drop_duplicates(['id','col']).groupby('id')['col'].agg(list))

If possible use ndarrays instead lists use DataFrameGroupBy.nunique:

df['col2'] = df['id'].map(df.groupby('id')['col'].unique())
print (df)
   id col       col2
0   1   a  [a, b, c]
1   1   b  [a, b, c]
2   1   c  [a, b, c]
3   1   c  [a, b, c]
4   2   a  [a, b, d]
5   2   b  [a, b, d]
6   2   d  [a, b, d]
  • Related