I have a data frame that looks like this:
Vendor GRDate Pass/Fail
0 204177 2022-22 1.0
1 204177 2022-22 0.0
2 204177 2022-22 0.0
3 204177 2022-22 1.0
4 204177 2022-22 1.0
5 204177 2022-22 1.0
7 201645 2022-22 0.0
8 201645 2022-22 0.0
9 201645 2022-22 1.0
10 201645 2022-22 1.0
I am trying to work out the percentage of where Pass/Fail equals 1 for each week for each vendor and put it in a new df (Number of pass = 1 / total number of lines per vendor & week)
which would look like this:
Vendor GRDate Performance
0 204177 2022-22 0.6
1 201645 2022-22 0.5
I'm trying to do this with .groupby()
and .count()
but i can't work out how to get this into a new df along with the Vendor and GRDate columns. The code I have here returns the percentage of pass fail but drops the other two columns.
sdp_percent = sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].apply(lambda x: x[x == 1].count()) / sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].count()
But then if I add .reset_index()
to keep them I get this error: unsupported operand type(s) for /: 'str' and 'str'
Please can someone explain what i'm doing wrong?
CodePudding user response:
As you have 0/1, you can use a groupby.mean
:
(df.groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
.agg(Performance=('Pass/Fail', 'mean'))
)
If you had a specific arbitrary value X
:
(df.assign(val=df['Pass/Fail'].eq(X))
.groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
.agg(Performance=('val', 'mean'))
)
Output:
Vendor GRDate Performance
0 204177 2022-22 0.666667
1 201645 2022-22 0.500000
CodePudding user response:
Try:
x = (
df.groupby(["GRDate", "Vendor"])["Pass/Fail"]
.mean()
.reset_index()
.rename(columns={"Pass/Fail": "Performance"})
)
print(x)
Prints:
GRDate Vendor Performance
0 2022-22 201645 0.500000
1 2022-22 204177 0.666667