I have a df and now I do this:
df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq')
In the code above, I created a new column Freq, which holds the frequency of each combination of StudentID and Major.
But now I want to keep only the rows where Freq is greater than 1, in the same line of code, e.g.:
df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq')[df['Freq'] > 1]
which does not work, since the original df does not have a Freq column.
One possible way is to save the grouped result into a new DataFrame, let's say df2, and then filter using df2[df2['Freq'] > 1], but I want to know if there is a way to do it in one line of code.
CodePudding user response:
You can use pipe:
out = (df.groupby(['StudentID', 'Major'])
         .size()
         .reset_index(name='Freq')
         .pipe(lambda x: x[x['Freq'] > 1]))
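As a hedged alternative to pipe, the same filter can be written with a callable passed to .loc, or with .query. The sketch below assumes the sample data posted in another answer in this thread; via_loc and via_query are names chosen here for illustration only.

```python
import pandas as pd

# Sample data assumed for illustration (same values as the sample in this thread).
df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': 'math,english,english,math,physics,physics,physics,math,classics'.split(','),
})

# .loc accepts a callable; it receives the intermediate (grouped) DataFrame,
# so the Freq column created by reset_index is available inside the lambda.
via_loc = (df.groupby(['StudentID', 'Major'])
             .size()
             .reset_index(name='Freq')
             .loc[lambda d: d['Freq'] > 1])

# .query evaluates the expression against the intermediate frame's columns.
via_query = (df.groupby(['StudentID', 'Major'])
               .size()
               .reset_index(name='Freq')
               .query('Freq > 1'))

print(via_loc)
print(via_query)
```

Both forms filter the chained result rather than the original df, which is why they avoid the missing-column problem from the question.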
CodePudding user response:
You can do it this way using the walrus operator := (available since Python 3.8).
import pandas as pd
df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': 'math,english,english,math,physics,physics,physics,math,classics'.split(',')
})
print(df)
x = (df := df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq'))[df['Freq'] > 1]
print(x)
Output:
   StudentID     Major
0          1      math
1          1   english
2          1   english
3          2      math
4          2   physics
5          2   physics
6          2   physics
7          3      math
8          3  classics

   StudentID    Major  Freq
0          1  english     2
3          2  physics     3
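Note that the walrus expression above rebinds df to the grouped result, so the original rows are no longer reachable through df afterward. A minimal sketch that binds a different name instead (counts is a hypothetical name, not from the answer above) leaves df untouched:

```python
import pandas as pd

df = pd.DataFrame({
    'StudentID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'Major': 'math,english,english,math,physics,physics,physics,math,classics'.split(','),
})

# Bind the grouped frame to a new name (counts) so df keeps its original rows.
# The parenthesized assignment is evaluated before the subscript expression.
x = (counts := df.groupby(['StudentID', 'Major']).size().reset_index(name='Freq'))[counts['Freq'] > 1]

print(list(df.columns))  # df still has only its original columns
print(x)
```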