Suppose I have the following dataframe:
fid prefix target_text
0 f1 p1 t1
1 f1 p1 t2
2 f1 p2 t1
3 f1 p2 t2
4 f1 p3 t1
5 f1 p3 t3
6 f1 p3 t4
7 f2 p1 t1
8 f2 p1 t2
9 f2 p2 t2
10 f2 p2 t1
If I group them by fid
and prefix
and count unique target_text
I have:
>>> num_targets = df.groupby(['fid','prefix'])['target_text'].transform('nunique')
0 2
1 2
2 2
3 2
4 3
5 3
6 3
7 2
8 2
9 2
10 2
Now I want to group them by only 'fid' but in front of it print the number of distinct [prefix
, target_text
]
I expect:
num_targets
f1 7
f2 4
However if I gourp the dataframe by fid
, then how can I count distinct [prefix, target_text]
?
CodePudding user response:
If need unique per both columns output is different:
s = (df['target_text'] '_' df['prefix']).groupby(df['fid']).nunique()
print (s)
fid
f1 7
f2 4
dtype: int64
s = df.drop_duplicates(['fid','prefix','target_text'])['fid'].value_counts()
print (s)
f1 7
f2 4
Name: fid, dtype: int64