get unique values in a 3rd column by filtering 2 other columns in Pandas-CodePudding

I have a dataframe i would like to filter. Consider the below input dataframe.

There are 3 columns ( a, b , and c)

I would like to get the count of unique values for c, for each unique pair of a and b, for a new d column, which has the count of unique values in c for its a, b pair :

rows 0, 1 have different a,b column pairs, and so the appended d values for both rows would be 1.

rows 2 and 3 have shared a, b columns and 2 unique values for that pair, their d values would be 2

CodePudding user response：

I think you want to use groupby and nunique

import pandas as pd
data = pd.DataFrame({
    'a':[1,1,2,2],
    'b':[1,0,2,2],
    'c':[1,1,2,3]
})

unique_count = data.groupby(
    ['a','b']
).c.nunique()

data.set_index(['a','b']).assign(
     d = unique_count
).reset_index()

Output:

a   b   c   d
1   1   1   1
1   0   1   1
2   2   2   2
2   2   3   2

CodePudding user response：

Let us try

df['cnt'] = df.groupby(['a','b'])['c'].transform('nunique')
df
Out[303]: 
   a  b  c  cnt
0  1  1  1    1
1  1  0  1    1
2  2  2  1    2
3  2  2  2    2

CodePudding user response：

or you can deduplicate the table for the columns you want to have the unique value and see what the resulting number of deduplicated rows is.

import pandas as pd
df = pd.DataFrame({
    'a':[1,1,2,2],
    'b':[1,0,2,2],
    'c':[1,1,2,3]
})

columns_subset = ['a', 'b']

df.drop_duplicates(subset=columns_subset).shape

This will return the shape as (number of rows, number of columns)