Home > other >  dynamically only include dates N days apart
dynamically only include dates N days apart

Time:04-13

I have a dataframe of dates

from datetime import datetime as dtm

df=pd.DataFrame([dtm(2022, 1, 1),dtm(2022,1,2),dtm(2022,1,5),dtm(2022,2,28),dtm(2022,3,1),dtm(2022,5,1)])

    0
0   2022-01-01
1   2022-01-02
2   2022-01-05
3   2022-02-28
4   2022-03-01
5   2022-05-01

I want to start from the first date, and get rid of any date(s) that's within 20 days of the referenced date.

The desired df would be

pd.DataFrame([dtm(2022, 1, 1),dtm(2022,2,28),dtm(2022,5,1)])
0   2022-01-01
1   2022-02-28
2   2022-05-01

I tried writing a loop, but the referenced dataframe would need to update as loop is run. I'm not sure how to deal with that.

#Edit, this this example using 3 days

df=pd.DataFrame([dtm(2022, 1, 1),dtm(2022,1,2),dtm(2022,2,27),dtm(2022,2,28),dtm(2022,3,3),dtm(2022,3,5),dtm(2022,5,1)])

0
0   2022-01-01
1   2022-01-02
2   2022-02-27
3   2022-02-28
4   2022-03-03
5   2022-03-05
6   2022-05-01

The desired output is

pd.DataFrame([dtm(2022, 1, 1),dtm(2022,2,27),dtm(2022,3,3),dtm(2022,5,1)])

0
0   2022-01-01
1   2022-02-27
2   2022-03-03
3   2022-05-01

CodePudding user response:

IIUC, you can use a custom function to sequentially remove the values and update the reference:

def cum_drop(s, delta='3d'):
    if s.empty:
        return s
    ref = s[0]
    delta = pd.Timedelta(delta)
    idx = [s.index[0]]
    for i, d in s.items():
        if d-ref>=delta:
            idx.append(i)
            ref = d
    return s.loc[idx]

df.apply(cum_drop)

output:

           0
0 2022-01-01
2 2022-02-27
4 2022-03-03
6 2022-03-07
7 2022-05-01

used input:

           0
0 2022-01-01
1 2022-01-02
2 2022-02-27
3 2022-02-28
4 2022-03-03
5 2022-03-05
6 2022-03-07
7 2022-05-01
  • Related