I'm currently developing something and was wondering whether the new match statement in Python 3.10 is suited to a use case like this, where I have a chain of conditional statements.
As input I have a timestamp and a dataframe with dates and values. The goal is to loop over all rows and add each value to the corresponding bin based on its date. Which bin a value goes into depends on the date relative to the timestamp: a date within 1 month of the timestamp is placed in bin 1, within 2 months in bin 2, and so on.
The code that I have now is as follows:
    bins = [0] * 7
    for date, value in zip(df.iloc[:,0],df.iloc[:,1]):
        match [date,value]:
            case [date,value] if date < timestamp + pd.Timedelta(1,'m'):
                bins[0] += value
            case [date,value] if date > timestamp + pd.Timedelta(1,'m') and date < timestamp + pd.Timedelta(2,'m'):
                bins[1] += value
            case [date,value] if date > timestamp + pd.Timedelta(2,'m') and date < timestamp + pd.Timedelta(3,'m'):
                bins[2] += value
            case [date,value] if date > timestamp + pd.Timedelta(3,'m') and date < timestamp + pd.Timedelta(4,'m'):
                bins[3] += value
            case [date,value] if date > timestamp + pd.Timedelta(4,'m') and date < timestamp + pd.Timedelta(5,'m'):
                bins[4] += value
            case [date,value] if date > timestamp + pd.Timedelta(5,'m') and date < timestamp + pd.Timedelta(6,'m'):
                bins[5] += value
Correction: originally I stated that this code does not work. It turns out that it actually does. However, I am still wondering if this would be an appropriate use of the match statement.
CodePudding user response:
I'd say it's not a good use of structural pattern matching, because there is no actual structure to match. You are checking the value of a single object, so an if/elif chain is a much better, more readable and more natural choice.
I have two more issues with the way you wrote it:
- You do not consider values that fall exactly on the edges of the bins.
- You check the same condition twice. Once you reach a given case in match/case, you are guaranteed that the previous ones did not match, so you do not need
      if date > timestamp + pd.Timedelta(1,'m') and ...
  If the previous check,
      if date < timestamp + pd.Timedelta(1,'m')
  failed, you already know that the date is not smaller. (There is an edge case of equality, but it should be handled somehow anyway.)
All in all I think this would be the cleaner solution:
    for date, value in zip(df.iloc[:,0],df.iloc[:,1]):
        # Each branch is only reached if all earlier comparisons failed,
        # so a single upper bound per bin is enough.
        if date < timestamp + pd.Timedelta(1,'m'):
            bins[0] += value
        elif date < timestamp + pd.Timedelta(2,'m'):
            bins[1] += value
        elif date < timestamp + pd.Timedelta(3,'m'):
            bins[2] += value
        elif date < timestamp + pd.Timedelta(4,'m'):
            bins[3] += value
        elif date < timestamp + pd.Timedelta(5,'m'):
            bins[4] += value
        elif date < timestamp + pd.Timedelta(6,'m'):
            bins[5] += value
        else:
            pass
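If the repeated branches bother you, the same chain can be collapsed into a lookup over a sorted list of edges. The following is only a sketch using the standard library; the helper name bin_values and its step and n_bins parameters are my own assumptions, and pd.Timestamp values should compare the same way since they subclass datetime.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def bin_values(rows, timestamp, step=timedelta(minutes=1), n_bins=6):
    """Sum values into bins by how far each date lies past `timestamp`.

    Bin 0 takes every date before the first edge (including dates earlier
    than `timestamp`), like the first `if` of the chain. A date exactly on
    an edge goes to the bin on its right. Dates at or beyond the last edge
    are ignored, like the final `else: pass`.
    """
    # Upper edges of the bins: timestamp + 1*step, ..., timestamp + n_bins*step
    edges = [timestamp + (i + 1) * step for i in range(n_bins)]
    bins = [0] * n_bins
    for date, value in rows:
        i = bisect_right(edges, date)  # number of edges <= date
        if i < n_bins:
            bins[i] += value
    return bins
```

It could be called as bin_values(zip(df.iloc[:,0], df.iloc[:,1]), timestamp). Using bisect_right makes the edge handling explicit: swapping in bisect_left would send a date that sits exactly on an edge to the bin on its left instead.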
CodePudding user response:
This should really be done directly with Pandas functions:
    import pandas as pd
    from datetime import datetime

    timestamp = datetime.now()
    # Bin edges: one catch-all bin on the left, edges at timestamp + 0..5,
    # and one catch-all bin on the right
    bins = (
        [pd.Timestamp(year=1970, month=1, day=1)]
        + [pd.Timestamp(timestamp) + pd.Timedelta(i, 'm') for i in range(6)]
        + [pd.Timestamp(year=2100, month=1, day=1)]  # plus open bin on the right
    )

    # Sample data: one row per second after the timestamp
    n_samples = 1000
    data = {
        'date': [pd.to_datetime(timestamp) + pd.Timedelta(i,'s') for i in range(n_samples)],
        'value': list(range(n_samples))
    }
    df = pd.DataFrame(data)

    # Assign each row to a left-closed interval, then sum the values per bin
    df['bin'] = pd.cut(df.date, bins, right=False)
    df.groupby('bin').value.sum()
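If you want the result as a plain list like the bins list in the question, one option is to ask pd.cut for integer bin indices instead of intervals. This is a small sketch with made-up sample data, not a drop-in replacement. Note that the 'm' unit of pd.Timedelta means minutes, which is what the sketch uses; for calendar months, the edges could presumably be built with pd.DateOffset(months=i) instead.

```python
import pandas as pd

timestamp = pd.Timestamp("2022-01-01")

# Hypothetical sample data: one row every 30 seconds, each with value 1
df = pd.DataFrame({
    "date": [timestamp + pd.Timedelta(seconds=30 * i) for i in range(8)],
    "value": [1] * 8,
})

# Seven edges at timestamp + 0..6 minutes give six one-minute bins
edges = [timestamp + pd.Timedelta(minutes=i) for i in range(7)]

# labels=False returns the integer index of each row's bin;
# right=False makes every bin closed on the left and open on the right
idx = pd.cut(df["date"], edges, right=False, labels=False)

# Sum per bin index, then fill in bins that received no rows
sums = df["value"].groupby(idx).sum()
bins = sums.reindex(range(6), fill_value=0).tolist()
```

Rows that fall outside the edges get a missing index and are left out of the sums, which corresponds to the else: pass branch of the loop version.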