Home > Blockchain >  How to count values in Python when two of the columns are greater than 1?
How to count values in Python when two of the columns are greater than 1?

Time:02-03

I have a data frame that looks something like this:

ID Month P1 P2 ... Sales
1 Jan 2 0 ... 2
1 Feb 3 1 ... 0
1 Mar 3 0 ... 0
2 Jan 19 1 ... 20
2 Feb 8 0 ... 0
2 Mar 12 0 ... 1

I want to create another dataframe that will count monthly IDs where Sales and other column (P1,P2, ...Pn) both are greater than 0

For eg. In the above dataframe in Jan there were 2 Ids 1 and 2 which had P1>0 and Sales >0 In Feb there are 0 Ids which had P1>0 & Sales>0 (P1 is greater than 0 for both IDs but Sales is 0) In Mar there is 1 Id 2 which had P1>0 & Sales>0

I want to see result something like this:

Month P1 P2 ...
Jan 2 1 ...
Feb 0 0 ...
March 1 0 ...

I was thinking of creating a dataframe for P1,P2,... Pn using a for loop that counts a number of instances. But this will be a time-consuming process.

Can you please suggest me an alternative solution?

CodePudding user response:

Use a custom aggregation with groupby.agg:

out = (df.filter(like='P').where(df['Sales'].gt(0)).gt(0)
         .groupby(df['Month'], sort=False).sum()
         .reset_index()
       )

Or from an explicit list of columns:

cols = ['P1', 'P2']
out = (df[cols].where(df['Sales'].gt(0)).gt(0)
         .groupby(df['Month'], sort=False).sum()
         .reset_index()
       )

Output:

  Month  P1  P2
0   Jan   2   1
1   Feb   0   0
2   Mar   1   0

CodePudding user response:

def fct(d):
    return d.groupby("ID")[["P1", "P2"]].agg(lambda x: ((x > 0) & (d["Sales"] > 0)).any()).apply(sum)

result = d.groupby("Month").apply(fct)

CodePudding user response:

Another option is to specify/denote the conditions explicitly:

pd.DataFrame({'Month': df.Month, 'P1': df.P1 * df.Sales > 0, 'P2': df.P2 * df.Sales > 0})\
 .groupby('Month', as_index=False, sort=False).sum()

  Month  P1  P2
0  Jan    2   1
1  Feb    0   0
2  Mar    1   0
  • Related