Home > Blockchain >  Elementwise "between" of one pandas series containing the value and two series containing
Elementwise "between" of one pandas series containing the value and two series containing

Time:04-05

I have three Series

trans = pd.Series([2020-01-01, 2022-03-15 ...]) #transaction
start = pd.Series([2020-02-03, 2021-05-13]) #valid dates start
end =pd.Series([2020-02-28,2021-06-13]) #valid dates end

and I want to see if a transaction is "valid" which it is, if it was made in a date-range defined by start and end i.e start[i] and end[i] composes one date-range of which the transaction could be valid, e.g for the first transaction, I want to check if 2020-01-01 is between (2020-02-03,2020-02-28) or (2021-05-13,2021-06-13).

I have tried using between e.g trans.between(start,end) but since trans is not the same size, it cannot be done in that way.

I have written a list comprehension but it loops through all start,end pairs for each element in trans, which is not optimal, since it should terminate when it just finds one pair.

I can remove some of the transactions by checking, if the date is above min(start) and below max(end), but it still takes a while by my own solution

CodePudding user response:

You can use numpy broadcasting for test all values between all values of both Series and test if at least one value match in any:

ser = trans.to_numpy()[:, None]
left = start.to_numpy()
right = end.to_numpy()
m = ((left <= ser) & (ser <= right)).any(axis=1)

out = trans[m]
print (out)

CodePudding user response:

You could do it like this. Hope it helps:

import pandas as pd
import datetime


trans = pd.Series(["2020-01-01", "2022-03-15", "2021-05-20", "2022-02-10", "2009-09-09"])
start = pd.Series(["2020-02-03", "2021-05-13", "2020-05-06"])
end =pd.Series(["2020-02-28","2021-06-13", "2020-05-08"])
                  
start = list(map(datetime.date.fromisoformat, start))
end = list(map(datetime.date.fromisoformat, end))
trans = list(map(datetime.date.fromisoformat, trans))

for elem in trans:
    print(elem)
    for lft,rgt in zip(start,end):
        if lft <= elem <= rgt:
            print(f'{elem} is between {lft} & {rgt}')
            break

output:

2020-01-01
2022-03-15
2021-05-20
2021-05-20 is between 2021-05-13 & 2021-06-13
2022-02-10
2009-09-09

it checks all elements but stops looking in start/end when finding a match

  • Related