I have a pandas related question. My dataframe has daily records of unique account numbers:
Account | Date | Balance |
---|---|---|
1 | 03/01/2022 | 132 |
1 | 03/02/2022 | 133 |
1 | 03/03/2022 | 135 |
2 | 03/02/2022 | 44 |
2 | 03/03/2022 | 48 |
3 | 03/01/2022 | 65 |
3 | 03/02/2022 | 67 |
4 | 03/01/2022 | 757 |
4 | 03/02/2022 | 788 |
4 | 03/03/2022 | 799 |
As you can see, account 2 is missing the first date (03/01/2022) a account 3 is missing 03/03/2022. I would like to be able to create 2 new columns to identify with a flag which of these accounts are new and which of these no longer exist due to a missing flag based on the min/max date. Unfortunately I have absolutely no idea how to start on this.
CodePudding user response:
You can use groupby.transform
to get the min/max date per group and compare to the absolute min/max, then assign boolean columns:
s = pd.to_datetime(df['Date'], dayfirst=False)
g = s.groupby(df['Account'])
df['client left'] = g.transform('max').ne(s.max()) # or lt (<)
df['new client'] = g.transform('min').ne(s.min()) # or gt (>)
NB. you can also use fixed references instead of the computed global min/max.
output:
Account Date Balance client left new client
0 1 03/01/2022 132 False False
1 1 03/02/2022 133 False False
2 1 03/03/2022 135 False False
3 2 03/02/2022 44 False True
4 2 03/03/2022 48 False True
5 3 03/01/2022 65 True False
6 3 03/02/2022 67 True False
7 4 03/01/2022 757 False False
8 4 03/02/2022 788 False False
9 4 03/03/2022 799 False False
alternative output
pd.to_datetime(df['Date'], dayfirst=False)
g = s.groupby(df['Account'])
out = pd.DataFrame({'new client': g.min().ne(s.min()), # or gt (>)
'client left': g.max().ne(s.max())}) # or lt (<)
print(out)
output:
new client client left
Account
1 False False
2 True False
3 False True
4 False False