Home > Software engineering >  How can I flag missing start/end dates per account in pandas?
How can I flag missing start/end dates per account in pandas?

Time:06-29

I have a pandas related question. My dataframe has daily records of unique account numbers:

Account Date Balance
1 03/01/2022 132
1 03/02/2022 133
1 03/03/2022 135
2 03/02/2022 44
2 03/03/2022 48
3 03/01/2022 65
3 03/02/2022 67
4 03/01/2022 757
4 03/02/2022 788
4 03/03/2022 799

As you can see, account 2 is missing the first date (03/01/2022) a account 3 is missing 03/03/2022. I would like to be able to create 2 new columns to identify with a flag which of these accounts are new and which of these no longer exist due to a missing flag based on the min/max date. Unfortunately I have absolutely no idea how to start on this.

CodePudding user response:

You can use groupby.transform to get the min/max date per group and compare to the absolute min/max, then assign boolean columns:

s = pd.to_datetime(df['Date'], dayfirst=False)
g = s.groupby(df['Account'])
df['client left'] = g.transform('max').ne(s.max())  # or lt (<)
df['new client'] = g.transform('min').ne(s.min())   # or gt (>)

NB. you can also use fixed references instead of the computed global min/max.

output:

   Account        Date  Balance  client left  new client
0        1  03/01/2022      132        False       False
1        1  03/02/2022      133        False       False
2        1  03/03/2022      135        False       False
3        2  03/02/2022       44        False        True
4        2  03/03/2022       48        False        True
5        3  03/01/2022       65         True       False
6        3  03/02/2022       67         True       False
7        4  03/01/2022      757        False       False
8        4  03/02/2022      788        False       False
9        4  03/03/2022      799        False       False

alternative output

pd.to_datetime(df['Date'], dayfirst=False)
g = s.groupby(df['Account'])

out = pd.DataFrame({'new client': g.min().ne(s.min()),    # or gt (>)
                    'client left': g.max().ne(s.max())})  # or lt (<)

print(out)

output:

         new client  client left
Account                         
1             False        False
2              True        False
3             False         True
4             False        False
  • Related