How to use the new column name created on the same line of code in Pandas

Time:04-17

I have a DataFrame df that includes StudentID and Major columns, and I run this:

df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq')

In the code above, I create a new column, Freq, which counts how often each combination of StudentID and Major occurs.

Now I want to keep only the rows where Freq is greater than 1, in the same line of code.

For example:

df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq')[df['Freq'] > 1]

This does not work: the mask df['Freq'] > 1 is evaluated against the original df, which has no Freq column, so it raises a KeyError.
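To see the failure concretely, here is a minimal sketch. The sample data is an assumption (the question does not show the actual df), and the variable caught is only there for illustration:

```python
import pandas as pd

# Sample data assumed for illustration; the real df is not shown in the question
df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': ['math', 'english', 'english', 'math', 'physics',
              'physics', 'physics', 'math', 'classics'],
})

caught = None
try:
    # df['Freq'] is looked up on the original df, not on the grouped result
    df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq')[df['Freq'] > 1]
except KeyError as err:
    caught = err
    print('KeyError:', err)
```

The grouped result only exists as an unnamed intermediate value, so there is nothing to refer to inside the brackets unless the chain itself hands that value over.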

One possible way is to save the grouped result into a new DataFrame, let's say df2, and then filter it with df2[df2['Freq'] > 1], but I want to know if there is a way to do it in one line of code.

CodePudding user response:

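You can use pipe, which passes the intermediate DataFrame to a function, so the new Freq column is available inside the lambda: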

out = (df.groupby(['StudentID', 'Major'])
         .size()
         .reset_index(name='Freq')
         .pipe(lambda x: x[x['Freq']>1]))
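Two other chain-friendly options that should give the same result are query and loc with a callable. This is only a sketch on assumed sample data (the question's real df is not shown); out_query and out_loc are names made up for the example:

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': ['math', 'english', 'english', 'math', 'physics',
              'physics', 'physics', 'math', 'classics'],
})

# query evaluates the expression against the intermediate DataFrame
out_query = (df.groupby(['StudentID', 'Major'])
               .size()
               .reset_index(name='Freq')
               .query('Freq > 1'))

# loc accepts a callable that receives the intermediate DataFrame
out_loc = (df.groupby(['StudentID', 'Major'])
             .size()
             .reset_index(name='Freq')
             .loc[lambda d: d['Freq'] > 1])

print(out_query)
```

query is the shortest to write, while the loc form stays plain Python and can be easier to combine with more complex conditions.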

CodePudding user response:

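You can do it this way using the walrus operator := (Python 3.8 or later). Note that this rebinds df to the grouped result, so the original DataFrame is no longer available under that name.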

import pandas as pd
df = pd.DataFrame({
    'StudentID' : [1,1,1,2,2,2,2,3,3],
    'Major' : 'math,english,english,math,physics,physics,physics,math,classics'.split(',')
})
print(df)
# The parenthesized assignment runs first, so df['Freq'] in the mask already sees the new column
x = (df := df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq'))[df['Freq'] > 1]
print(x)

Output:

   StudentID     Major
0          1      math
1          1   english
2          1   english
3          2      math
4          2   physics
5          2   physics
6          2   physics
7          3      math
8          3  classics
   StudentID    Major  Freq
0          1  english     2
3          2  physics     3
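If you want to keep the original df intact, a variant of the same idea is to bind the grouped result to a different name. This is a sketch; counts and result are hypothetical names chosen for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': 'math,english,english,math,physics,physics,physics,math,classics'.split(','),
})

# counts is bound inside the parentheses before the mask counts['Freq'] > 1 is evaluated,
# and df itself is left untouched
result = (counts := df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq'))[counts['Freq'] > 1]
print(result)
```

This still leaves an extra name (counts) in scope, so it is one line of code but not really a saving over a separate df2 assignment.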