Home > Back-end >  Detect consecutive monthly repetition in Pandas DataFrame column without iterating
Detect consecutive monthly repetition in Pandas DataFrame column without iterating

Time:05-11

I am trying to detect if a value is repeated for two months in a row, but I am not able to find a way without iterating through all the rows. The problem with iterating is that it makes it impossibly slow.

My Table:

          date country    cur    amount
0   2019-01-07     LTU    EUR    150.00
1   2019-01-08     LTU    EUR     50.00
2   2019-01-02     LTU    EUR    561.60
3   2019-01-02     LTU    EUR    990.14
4   2019-01-02     LTU    EUR    700.00
5   2019-02-07     LTU    EUR    150.00

What I am trying to achieve:

          date country    cur    amount  monthly
0   2019-01-07     LTU    EUR    150.00     True
1   2019-01-08     LTU    EUR     50.00    False
2   2019-01-02     LTU    EUR    561.60    False
3   2019-01-02     LTU    EUR    990.14    False
4   2019-01-02     LTU    EUR    700.00    False
5   2019-02-07     LTU    EUR    150.00     True

The value 150.00 in the EUR currency is repeated on the same day of the next month.

Thanks!

CodePudding user response:

You can try groupby with transform nunique

df['new'] = df.date.dt.strftime('%Y-%m').groupby(df['amount']).transform('nunique')>1
Out[512]: 
0     True
1    False
2    False
3    False
4    False
5     True
Name: date, dtype: bool

CodePudding user response:

One option is to pull the month values, sort the columns, groupby, do a transform on the month column and get the boolean:

result = (df
          .assign(month = df.date.dt.month)
          .sort_values(['country', 'amount', 'month'])
          .groupby(['country', 'amount'])
          .month
          .transform(lambda x: x.iloc[-1] - x.iloc[0])
          .eq(1)
)

df.assign(monthly = result)

        date country  cur  amount  monthly
0 2019-01-07     LTU  EUR  150.00     True
1 2019-01-08     LTU  EUR   50.00    False
2 2019-01-02     LTU  EUR  561.60    False
3 2019-01-02     LTU  EUR  990.14    False
4 2019-01-02     LTU  EUR  700.00    False
5 2019-02-07     LTU  EUR  150.00     True

This works with the assumption that there is only one entry per month.

  • Related