I have a dataframe i would like to filter. Consider the below input dataframe.
a b c
1 1 1
1 0 1
2 2 1
2 2 2
There are 3 columns ( a, b , and c)
I would like to get the count of unique values for c, for each unique pair of a and b, for a new d column, which has the count of unique values in c for its a, b pair :
a b c d
1 1 1 1
1 0 1 1
1 2 1 2
1 2 2 2
rows 0, 1 have different a,b column pairs, and so the appended d values for both rows would be 1.
rows 2 and 3 have shared a, b columns and 2 unique values for that pair, their d values would be 2
CodePudding user response:
I think you want to use groupby and nunique
import pandas as pd
data = pd.DataFrame({
'a':[1,1,2,2],
'b':[1,0,2,2],
'c':[1,1,2,3]
})
unique_count = data.groupby(
['a','b']
).c.nunique()
data.set_index(['a','b']).assign(
d = unique_count
).reset_index()
Output:
a b c d
1 1 1 1
1 0 1 1
2 2 2 2
2 2 3 2
CodePudding user response:
Let us try
df['cnt'] = df.groupby(['a','b'])['c'].transform('nunique')
df
Out[303]:
a b c cnt
0 1 1 1 1
1 1 0 1 1
2 2 2 1 2
3 2 2 2 2
CodePudding user response:
or you can deduplicate the table for the columns you want to have the unique value and see what the resulting number of deduplicated rows is.
import pandas as pd
df = pd.DataFrame({
'a':[1,1,2,2],
'b':[1,0,2,2],
'c':[1,1,2,3]
})
columns_subset = ['a', 'b']
df.drop_duplicates(subset=columns_subset).shape
This will return the shape as (number of rows, number of columns)