I have a data frame that looks something like this:
ID | Month | P1 | P2 | ... | Sales |
---|---|---|---|---|---|
1 | Jan | 2 | 0 | ... | 2 |
1 | Feb | 3 | 1 | ... | 0 |
1 | Mar | 3 | 0 | ... | 0 |
2 | Jan | 19 | 1 | ... | 20 |
2 | Feb | 8 | 0 | ... | 0 |
2 | Mar | 12 | 0 | ... | 1 |
I want to create another dataframe that will count monthly IDs where Sales and other column (P1,P2, ...Pn) both are greater than 0
For eg. In the above dataframe in Jan there were 2 Ids 1 and 2 which had P1>0 and Sales >0 In Feb there are 0 Ids which had P1>0 & Sales>0 (P1 is greater than 0 for both IDs but Sales is 0) In Mar there is 1 Id 2 which had P1>0 & Sales>0
I want to see result something like this:
Month | P1 | P2 | ... |
---|---|---|---|
Jan | 2 | 1 | ... |
Feb | 0 | 0 | ... |
March | 1 | 0 | ... |
I was thinking of creating a dataframe for P1,P2,... Pn using a for loop that counts a number of instances. But this will be a time-consuming process.
Can you please suggest me an alternative solution?
CodePudding user response:
Use a custom aggregation with groupby.agg
:
out = (df.filter(like='P').where(df['Sales'].gt(0)).gt(0)
.groupby(df['Month'], sort=False).sum()
.reset_index()
)
Or from an explicit list of columns:
cols = ['P1', 'P2']
out = (df[cols].where(df['Sales'].gt(0)).gt(0)
.groupby(df['Month'], sort=False).sum()
.reset_index()
)
Output:
Month P1 P2
0 Jan 2 1
1 Feb 0 0
2 Mar 1 0
CodePudding user response:
def fct(d):
return d.groupby("ID")[["P1", "P2"]].agg(lambda x: ((x > 0) & (d["Sales"] > 0)).any()).apply(sum)
result = d.groupby("Month").apply(fct)
CodePudding user response:
Another option is to specify/denote the conditions explicitly:
pd.DataFrame({'Month': df.Month, 'P1': df.P1 * df.Sales > 0, 'P2': df.P2 * df.Sales > 0})\
.groupby('Month', as_index=False, sort=False).sum()
Month P1 P2
0 Jan 2 1
1 Feb 0 0
2 Mar 1 0