I am trying to detect if a value is repeated for two months in a row, but I am not able to find a way without iterating through all the rows. The problem with iterating is that it makes it impossibly slow.
My Table:
date country cur amount
0 2019-01-07 LTU EUR 150.00
1 2019-01-08 LTU EUR 50.00
2 2019-01-02 LTU EUR 561.60
3 2019-01-02 LTU EUR 990.14
4 2019-01-02 LTU EUR 700.00
5 2019-02-07 LTU EUR 150.00
What I am trying to achieve:
date country cur amount monthly
0 2019-01-07 LTU EUR 150.00 True
1 2019-01-08 LTU EUR 50.00 False
2 2019-01-02 LTU EUR 561.60 False
3 2019-01-02 LTU EUR 990.14 False
4 2019-01-02 LTU EUR 700.00 False
5 2019-02-07 LTU EUR 150.00 True
The value 150.00 in the EUR currency is repeated on the same day of the next month.
Thanks!
CodePudding user response:
You can try groupby
with transform
nunique
df['new'] = df.date.dt.strftime('%Y-%m').groupby(df['amount']).transform('nunique')>1
Out[512]:
0 True
1 False
2 False
3 False
4 False
5 True
Name: date, dtype: bool
CodePudding user response:
One option is to pull the month values, sort the columns, groupby, do a transform on the month column and get the boolean:
result = (df
.assign(month = df.date.dt.month)
.sort_values(['country', 'amount', 'month'])
.groupby(['country', 'amount'])
.month
.transform(lambda x: x.iloc[-1] - x.iloc[0])
.eq(1)
)
df.assign(monthly = result)
date country cur amount monthly
0 2019-01-07 LTU EUR 150.00 True
1 2019-01-08 LTU EUR 50.00 False
2 2019-01-02 LTU EUR 561.60 False
3 2019-01-02 LTU EUR 990.14 False
4 2019-01-02 LTU EUR 700.00 False
5 2019-02-07 LTU EUR 150.00 True
This works with the assumption that there is only one entry per month.