I have the following dataframe and want to create two columns, one will show the amount MTD and the other will show the cumulative YTD based on a date parameter for each Account Name. This is easily achievable in Excel using a =SUMIFS formula and want to know the Python equivalent.
--------------- ------------ ------------ ------------ ------------
| Account Names | 31/01/2022 | 28/02/2022 | 31/03/2022 | 30/04/2022 |
--------------- ------------ ------------ ------------ ------------
| Cash At Bank | 100 | 150 | 100 | 150 |
| Debtors | 50 | 50 | 50 | 100 |
| Inventory | 250 | 250 | 350 | 100 |
| PAYG Withheld | 50 | 50 | 10 | 150 |
--------------- ------------ ------------ ------------ ------------
Ideally, I'd want this to be as efficient as possible i.e. doesn't require loops. I went the route of trying to do this using np.select as I've read this is one of the fastest methods, but had no luck. I get the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
EndDate = '31/03/2022'
Budget_Assets["MTD_Amount"] = np.select(condlist=[Budget_Assets.columns == EndDate],choicelist=[Budget_Assets[EndDate]],default=0)
For example, the value in the MTD_Amount column for Cash At Bank should be 100 and the YTD_Column will be 350 (sum of numbers from '31/01/2022' to '31/03/2022')
CodePudding user response:
You can try sum(axis=1)
by slicing the datetime like columns to calculate YTD
and just use loc
to get MTD
EndDate = '31/03/2022'
date_cols = df.filter(regex='\d{2}/\d{2}/\d{4}')
date_cols.columns = pd.to_datetime(date_cols.columns, dayfirst=True)
df['YTD_Column'] = date_cols.loc[:, :pd.to_datetime(EndDate, dayfirst=True)].sum(axis=1)
df['MTD_Column'] = df[EndDate]
Account Names 31/01/2022 28/02/2022 31/03/2022 30/04/2022 YTD_Column MTD_Column
0 Cash At Bank 100 150 100 150 350 100
1 Debtors 50 50 50 100 150 50
2 Inventory 250 250 350 100 850 350
3 PAYG Withheld 50 50 10 150 110 10