Home > OS >  How to add new columns in Pandas for unique values of certain key (problem agregate)
How to add new columns in Pandas for unique values of certain key (problem agregate)

Time:06-29

How to add a new column of aggregated data

I want to create 03 new columns in a dataframe

Column 01: unique_list

Create a new column in the dataframe of unique values of cfop_code for each key

Column 02: unique_count

A column that check the number of unique values that shows in unique_list

Column 03: not_unique_count

A column that check the number of not unique values that shows in unique_list

example_df

key product cfop_code
0 12345678901234567890 product a 2551
1 12345678901234567890 product b 2551
2 12345678901234567890 product c 3551
3 12345678901234567895 product a 2551
4 12345678901234567897 product b 2551
5 12345678901234567897 product c 2407

Expected Result

key product cfop_code unique_list unique_count not_unique_count
0 12345678901234567890 product a 2551 2251, 3551 2 3
1 12345678901234567890 product b 2551 2251, 3551 2 3
2 12345678901234567890 product c 3551 2251, 3551 2 3
3 12345678901234567895 product a 2551 2251 1 1
4 12345678901234567897 product b 2551 2407, 2551 2 2
5 12345678901234567897 product c 2407 2407, 2551 2 2

What i had tried

Create a list of unique values

df.groupby('key')["cfop"].unique()

key
12345678901234567890    [2551, 3551]
12345678901234567895          [2551]
12345678901234567897    [2551, 2407]
Name: cfop, dtype: object

Getting the count not unique values

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="size")}).reset_index()

key unique_values
0   12345678901234567890    3
1   12345678901234567895    1
2   12345678901234567897    2

Getting the count unique values into data frame

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="nunique")}).reset_index()

key unique_values
0   12345678901234567890    2
1   12345678901234567895    1
2   12345678901234567897    2

But FAIL adding a new column

df['unique_list'] = df.groupby('key')["cfop"].unique()
df['unique_count'] = df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="nunique")}).reset_index()
df['not_unique_count'] =df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="size")}).reset_index()

CodePudding user response:

Try:

tmp = (
    df.groupby("key")["cfop_code"]
    .agg(
        unique_list = lambda s: sorted(s.unique()), 
        unique_count = "nunique", 
        not_unique_count = "size")
    .reset_index()
)
res = df.merge(tmp, on="key")

print(res)
                    key    product  cfop_code   unique_list  unique_count  not_unique_count
0  12345678901234567890  product a       2551  [2551, 3551]             2                 3
1  12345678901234567890  product b       2551  [2551, 3551]             2                 3
2  12345678901234567890  product c       3551  [2551, 3551]             2                 3
3  12345678901234567895  product a       2551        [2551]             1                 1
4  12345678901234567897  product b       2551  [2407, 2551]             2                 2
5  12345678901234567897  product c       2407  [2407, 2551]             2                 2

The problem with your attempt is that:

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="nunique")}).reset_index()

returns a DataFrame.You try to assign this whole DataFrame to a new column which fails.

CodePudding user response:

You can do merge after group and agg like:

df.merge(df.groupby('key',as_index=False).agg(
   unique_list = ('cfop_code', 'unique'),
   unique_count = ('cfop_code', 'nunique'),
   not_unique_count = ('cfop_code', 'size')
), on='key', how = 'left')

output:

                    key    product  cfop_code   unique_list  unique_count  \
0  12345678901234567890  product a       2551  [2551, 3551]             2   
1  12345678901234567890  product b       2551  [2551, 3551]             2   
2  12345678901234567890  product c       3551  [2551, 3551]             2   
3  12345678901234567895  product a       2551        [2551]             1   
4  12345678901234567897  product b       2551  [2551, 2407]             2   
5  12345678901234567897  product c       2407  [2551, 2407]             2   

   not_unique_count  
0                 3  
1                 3  
2                 3  
3                 1  
4                 2  
5                 2  
  • Related