Detect consecutive monthly repetition in Pandas DataFrame column without iterating-CodePudding

I am trying to detect if a value is repeated for two months in a row, but I am not able to find a way without iterating through all the rows. The problem with iterating is that it makes it impossibly slow.

My Table:

          date country    cur    amount
0   2019-01-07     LTU    EUR    150.00
1   2019-01-08     LTU    EUR     50.00
2   2019-01-02     LTU    EUR    561.60
3   2019-01-02     LTU    EUR    990.14
4   2019-01-02     LTU    EUR    700.00
5   2019-02-07     LTU    EUR    150.00

What I am trying to achieve:

          date country    cur    amount  monthly
0   2019-01-07     LTU    EUR    150.00     True
1   2019-01-08     LTU    EUR     50.00    False
2   2019-01-02     LTU    EUR    561.60    False
3   2019-01-02     LTU    EUR    990.14    False
4   2019-01-02     LTU    EUR    700.00    False
5   2019-02-07     LTU    EUR    150.00     True

The value 150.00 in the EUR currency is repeated on the same day of the next month.

Thanks!

CodePudding user response：

You can try groupby with transform nunique

df['new'] = df.date.dt.strftime('%Y-%m').groupby(df['amount']).transform('nunique')>1
Out[512]: 
0     True
1    False
2    False
3    False
4    False
5     True
Name: date, dtype: bool

CodePudding user response：

One option is to pull the month values, sort the columns, groupby, do a transform on the month column and get the boolean:

result = (df
          .assign(month = df.date.dt.month)
          .sort_values(['country', 'amount', 'month'])
          .groupby(['country', 'amount'])
          .month
          .transform(lambda x: x.iloc[-1] - x.iloc[0])
          .eq(1)
)

df.assign(monthly = result)

        date country  cur  amount  monthly
0 2019-01-07     LTU  EUR  150.00     True
1 2019-01-08     LTU  EUR   50.00    False
2 2019-01-02     LTU  EUR  561.60    False
3 2019-01-02     LTU  EUR  990.14    False
4 2019-01-02     LTU  EUR  700.00    False
5 2019-02-07     LTU  EUR  150.00     True

This works with the assumption that there is only one entry per month.