Home > Blockchain >  pandas: Aggregate on one column and count based on two columns
pandas: Aggregate on one column and count based on two columns

Time:05-17

Suppose I have the following dataframe:

   fid prefix target_text
0   f1     p1          t1
1   f1     p1          t2
2   f1     p2          t1
3   f1     p2          t2
4   f1     p3          t1
5   f1     p3          t3
6   f1     p3          t4
7   f2     p1          t1
8   f2     p1          t2
9   f2     p2          t2
10  f2     p2          t1

If I group them by fid and prefix and count unique target_text I have:

>>> num_targets = df.groupby(['fid','prefix'])['target_text'].transform('nunique')

0     2
1     2
2     2
3     2
4     3
5     3
6     3
7     2
8     2
9     2
10    2

Now I want to group them by only 'fid' but in front of it print the number of distinct [prefix, target_text]

I expect:

      num_targets           
f1    7
f2    4

However if I gourp the dataframe by fid, then how can I count distinct [prefix, target_text]?

CodePudding user response:

If need unique per both columns output is different:

s = (df['target_text']    '_'   df['prefix']).groupby(df['fid']).nunique()
print (s)
fid
f1    7
f2    4
dtype: int64

s = df.drop_duplicates(['fid','prefix','target_text'])['fid'].value_counts()
print (s)
f1    7
f2    4
Name: fid, dtype: int64
  • Related