what is the most efficient approach to this problem? I have a df with users (users are not unique), a month column and a profit/loss column. I want to create a new binary column 'eligibility' where it is always 1 unless the following scenario: if a user shows a loss, they need to cover that amount in the next month(s) in order to be eligible, so for the first negative month they are 0 until they cover the sum
for instance using a dataset like this:
user,month,profit/loss
1,2022-03-01,400
2,2022-03-01,-100
3,2022-03-01,5000
1,2022-04-01,-10
2,2022-04-01,80
3,2022-04-01,400
1,2022-05-01,60
2,2022-05-01,60
new column would be
1
0
1
0
0
1
1
1
CodePudding user response:
Maybe you could loop through the dataset, and when a loss is found, map that loss to the corresponding user in any desired manner (for example a dictionary or a hashmap) and add a 0 to the column.
When a profit is found, check if that user is currently at a loss, if so, add the profit to the loss, and if the result of the addition is greater than 0, the record can be deleted and a 1 can be added to the column. If the user with the profit is not found to be currently a loss, just move on and add a 1 to the column.
I hope this helps.
CodePudding user response:
The logic is not fully clear, so assuming you want 1 when a user has an overall positive (or null) profit, and 0 if negative, you can use groupby.cumsum
and compare to 0
:
df['out'] = df.groupby('user')['profit/loss'].cumsum().gt(0).astype(int)
output:
user month profit/loss out
0 1 2022-03-01 400 1
1 2 2022-03-01 -100 0
2 3 2022-03-01 5000 1
3 1 2022-04-01 -10 1 # (*) see note below
4 2 2022-04-01 80 0
5 3 2022-04-01 400 1
6 1 2022-05-01 60 1
7 2 2022-05-01 60 1
# (*) this value is different from yours, but the overall gain is 390 (>0)
CodePudding user response:
import pandas as pd
df = pd.DataFrame([
[1,'2022-03-01',400],
[2,'2022-03-01',-100],
[3,'2022-03-01',5000],
[1,'2022-04-01',-10],
[2,'2022-04-01',80],
[3,'2022-04-01',400],
[1,'2022-05-01',60],
[2,'2022-05-01',60]
], columns=['user','month',])
data = (
pd
.concat([df.user, df.groupby('user')
.agg({'profit/loss':'cumsum'})
.rename(columns={'profit/loss':"cumsum"})], axis=1)
)
data["cummax"] = (
data
.groupby('user')
.agg({'cumsum':'cummax'})
)
data.apply(lambda x: 0 if x['cumsum'] < 0 else 0 if x["cumsum"] < x["cummax"] else 1, axis=1)
output:
0 1
1 0
2 1
3 0
4 0
5 1
6 1
7 1
dtype: int64