Conditional cases in match statement in Python 3.10 (structural pattern matching)

I'm currently developing something and am wondering whether the new match statement in Python 3.10 is suited to a use case where the cases are conditional statements.

As input I have a timestamp and a dataframe with dates and values. The goal is to loop over all rows and add each value to the corresponding bin based on its date. Which bin a value goes into depends on the date relative to the timestamp: a date within 1 month of the timestamp is placed in bin 1, within 2 months in bin 2, and so on.

The code that I have now is as follows:

bins = [0] * 7

for date, value in zip(df.iloc[:,0],df.iloc[:,1]):
    match [date,value]:
        case [date,value] if date < timestamp + pd.Timedelta(1,'m'):
            bins[0] += value
        case [date,value] if date > timestamp + pd.Timedelta(1,'m') and date < timestamp + pd.Timedelta(2,'m'):
            bins[1] += value
        case [date,value] if date > timestamp + pd.Timedelta(2,'m') and date < timestamp + pd.Timedelta(3,'m'):
            bins[2] += value
        case [date,value] if date > timestamp + pd.Timedelta(3,'m') and date < timestamp + pd.Timedelta(4,'m'):
            bins[3] += value
        case [date,value] if date > timestamp + pd.Timedelta(4,'m') and date < timestamp + pd.Timedelta(5,'m'):
            bins[4] += value
        case [date,value] if date > timestamp + pd.Timedelta(5,'m') and date < timestamp + pd.Timedelta(6,'m'):
            bins[5] += value

Correction: originally I stated that this code does not work. It turns out that it actually does. However, I am still wondering if this would be an appropriate use of the match statement.

CodePudding user response：

I'd say it's not a good use of structural pattern matching because there is no actual structure. You are checking values of a single object, so an if/elif chain is a much better, more readable and natural choice.

I've got 2 more issues with the way you wrote it:

You do not consider values that fall exactly on the edges of the bins.
You are checking the same condition twice. Once you reach a given check in match/case, you are guaranteed that the previous ones did not match, so you do not need the date > timestamp + pd.Timedelta(1,'m') and ... part: if the previous check, date < timestamp + pd.Timedelta(1,'m'), failed, you already know the date is not smaller. (There is an edge case of equality, but it should be handled somehow anyway.)

All in all I think this would be the cleaner solution:

for date, value in zip(df.iloc[:,0],df.iloc[:,1]):
    if date < timestamp + pd.Timedelta(1,'m'):
        bins[0] += value
    elif date < timestamp + pd.Timedelta(2,'m'):
        bins[1] += value
    elif date < timestamp + pd.Timedelta(3,'m'):
        bins[2] += value
    elif date < timestamp + pd.Timedelta(4,'m'):
        bins[3] += value
    elif date < timestamp + pd.Timedelta(5,'m'):
        bins[4] += value
    elif date < timestamp + pd.Timedelta(6,'m'):
        bins[5] += value
    else:
        pass
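The chain above can also be written as a lookup against a sorted list of edges, which makes the handling of values that sit exactly on an edge explicit. This is only a sketch using the standard library's bisect module. The reference time, the sample rows, and the 30-day approximation of a month are assumptions made for illustration. Unlike the chain, it adds dates past the last edge to bins[6] instead of dropping them.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

timestamp = datetime(2022, 1, 1)   # hypothetical reference time
step = timedelta(days=30)          # assumption: a "month" approximated as 30 days

# Upper edges of bins 0..5; bin i covers the half-open range [edges[i-1], edges[i]).
edges = [timestamp + k * step for k in range(1, 7)]

rows = [                           # made-up (date, value) pairs
    (datetime(2022, 1, 10), 1.0),  # before the first edge -> bin 0
    (edges[0], 2.0),               # exactly on an edge -> goes to the bin on its right
    (datetime(2022, 3, 15), 4.0),  # between the second and third edge -> bin 2
    (datetime(2023, 1, 1), 8.0),   # past the last edge -> overflow bin 6
]

bins = [0.0] * 7
for date, value in rows:
    # bisect_right counts the edges <= date, i.e. the first i with date < edges[i],
    # which is the branch the if/elif chain would have taken.
    bins[bisect_right(edges, date)] += value

print(bins)
```

Using bisect_left instead would put a date that equals an edge into the bin on its left, so the choice between the two is where the edge policy is decided.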

CodePudding user response：

This should really be done directly with Pandas functions:

import pandas as pd
from datetime import datetime

timestamp = datetime.now()
# plus open bins on the left and on the right
bins = (
    [pd.Timestamp(year=1970, month=1, day=1)]
    + [pd.Timestamp(timestamp) + pd.Timedelta(i, 'm') for i in range(6)]
    + [pd.Timestamp(year=2100, month=1, day=1)]
)
n_samples = 1000

data = {
  'date': [pd.to_datetime(timestamp) + pd.Timedelta(i,'s') for i in range(n_samples)],
  'value': list(range(n_samples))
}

df = pd.DataFrame(data)

df['bin'] = pd.cut(df.date, bins, right=False)
df.groupby('bin').value.sum()
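Note that pd.Timedelta(i, 'm') is a span of i minutes, not months. If calendar months are what is wanted, as the question describes, the edges can be built with pd.DateOffset instead. The following is a sketch under that assumption; the reference date and the sample rows are made up for illustration.

```python
import pandas as pd

timestamp = pd.Timestamp("2022-01-15")  # hypothetical reference time

# Edges at timestamp + 0..6 calendar months.
edges = [timestamp + pd.DateOffset(months=k) for k in range(7)]

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-20", "2022-02-15", "2022-04-01", "2022-07-14"]),
    "value": [1, 2, 4, 8],
})

# right=False makes every bin half-open, [left, right), so a date that equals
# an edge goes to the bin on its right. labels=False returns the bin index.
df["bin"] = pd.cut(df["date"], bins=edges, right=False, labels=False)
totals = df.groupby("bin")["value"].sum()

print(totals)
```

Dates outside the first and last edge get NaN from pd.cut and are left out of the grouped sum, so add outer edges as in the answer above if they should be counted.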