Home > database >  get unique values in a 3rd column by filtering 2 other columns in Pandas
get unique values in a 3rd column by filtering 2 other columns in Pandas

Time:03-26

I have a dataframe i would like to filter. Consider the below input dataframe.

a  b  c  
1  1  1
1  0  1
2  2  1
2  2  2

There are 3 columns ( a, b , and c)

I would like to get the count of unique values for c, for each unique pair of a and b, for a new d column, which has the count of unique values in c for its a, b pair :

a  b  c  d
1  1  1  1
1  0  1  1
1  2  1  2 
1  2  2  2

rows 0, 1 have different a,b column pairs, and so the appended d values for both rows would be 1.

rows 2 and 3 have shared a, b columns and 2 unique values for that pair, their d values would be 2

CodePudding user response:

I think you want to use groupby and nunique

import pandas as pd
data = pd.DataFrame({
    'a':[1,1,2,2],
    'b':[1,0,2,2],
    'c':[1,1,2,3]
})

unique_count = data.groupby(
    ['a','b']
).c.nunique()

data.set_index(['a','b']).assign(
     d = unique_count
).reset_index()

Output:

a   b   c   d
1   1   1   1
1   0   1   1
2   2   2   2
2   2   3   2

CodePudding user response:

Let us try

df['cnt'] = df.groupby(['a','b'])['c'].transform('nunique')
df
Out[303]: 
   a  b  c  cnt
0  1  1  1    1
1  1  0  1    1
2  2  2  1    2
3  2  2  2    2

CodePudding user response:

or you can deduplicate the table for the columns you want to have the unique value and see what the resulting number of deduplicated rows is.

import pandas as pd
df = pd.DataFrame({
    'a':[1,1,2,2],
    'b':[1,0,2,2],
    'c':[1,1,2,3]
})

columns_subset = ['a', 'b']

df.drop_duplicates(subset=columns_subset).shape

This will return the shape as (number of rows, number of columns)

  • Related