Home > database >  Pandas percentage of two columns
Pandas percentage of two columns

Time:11-17

I have a data frame that looks like this:

    Vendor  GRDate  Pass/Fail
0   204177  2022-22 1.0
1   204177  2022-22 0.0
2   204177  2022-22 0.0
3   204177  2022-22 1.0
4   204177  2022-22 1.0
5   204177  2022-22 1.0
7   201645  2022-22 0.0
8   201645  2022-22 0.0
9   201645  2022-22 1.0
10  201645  2022-22 1.0

I am trying to work out the percentage of where Pass/Fail equals 1 for each week for each vendor and put it in a new df (Number of pass = 1 / total number of lines per vendor & week)

which would look like this:

    Vendor  GRDate  Performance
0   204177  2022-22 0.6
1   201645  2022-22 0.5

I'm trying to do this with .groupby() and .count() but i can't work out how to get this into a new df along with the Vendor and GRDate columns. The code I have here returns the percentage of pass fail but drops the other two columns.

sdp_percent = sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].apply(lambda x: x[x == 1].count()) / sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].count()

But then if I add .reset_index() to keep them I get this error: unsupported operand type(s) for /: 'str' and 'str'

Please can someone explain what i'm doing wrong?

CodePudding user response:

As you have 0/1, you can use a groupby.mean:

(df.groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
   .agg(Performance=('Pass/Fail', 'mean'))
)

If you had a specific arbitrary value X:

(df.assign(val=df['Pass/Fail'].eq(X))
   .groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
   .agg(Performance=('val', 'mean'))
)

Output:

   Vendor   GRDate  Performance
0  204177  2022-22     0.666667
1  201645  2022-22     0.500000

CodePudding user response:

Try:

x = (
    df.groupby(["GRDate", "Vendor"])["Pass/Fail"]
    .mean()
    .reset_index()
    .rename(columns={"Pass/Fail": "Performance"})
)
print(x)

Prints:

    GRDate  Vendor  Performance
0  2022-22  201645     0.500000
1  2022-22  204177     0.666667
  • Related