Home > Back-end >  Get first and Last date of each unique element in pandas dataframe
Get first and Last date of each unique element in pandas dataframe

Time:10-12

I have a dataframe like this :

A-Party Date & Time IMEI
3022275 2021-05-01 7:20:05 A
3022275 2021-05-02 8:20:05 A
3022275 2022-08-01 6:20:05 A
3022275 2021-03-02 9:20:05 B
3022275 2021-09-01 6:20:05 B
3022275 2022-05-01 10:20:05 B

I want first and last date of every unique IMEI:

Output I want:

A-Party Total IMEI IMEI First Date Last Date
3022275 2 A 2021-05-01 7:20:05 2022-08-01 6:20:05
B 2021-03-02 9:20:05 2022-05-01 10:20:05

I can achieve this nested for loop, but if there is any pandas way of directly doing this, that might be helpful

CodePudding user response:

Use names aggregation with GroupBy.agg with first and last or min and max:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date':('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))

df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))

print (df)
   A-Party  Total IMEI IMEI          First Date            Last Date
0  3022275           2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275           2    B  2021-03-02 9:20:05  2022-05-01 10:20:05

If need spaces (not recommended, because possible mixing numeric with strings ''):

df.loc[df.duplicated(['A-Party','Total IMEI']), ['A-Party','Total IMEI']] = ''

print (df)
   A-Party Total IMEI IMEI          First Date            Last Date
0  3022275          2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1                        B  2021-03-02 9:20:05  2022-05-01 10:20:05

If need MultiIndex:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date':('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))

df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))

df = df.set_index(['A-Party','Total IMEI','IMEI'])

print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 2          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05

EDIT: Add GroupBy.size to named aggregations:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI':('IMEI', 'size'),
                'First Date':('Date & Time', 'first'),
                 'Last Date': ('Date & Time', 'last')}))
print (df)
   A-Party IMEI  Total IMEI          First Date            Last Date
0  3022275    A           3  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275    B           3  2021-03-02 9:20:05  2022-05-01 10:20:05

df = df.set_index(['A-Party','Total IMEI','IMEI'])

print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 3          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05

EDIT1:

df = (df.assign(dates = df['Date & Time'].dt.date)
        .groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI':('IMEI', 'size'),
                'No unique days': ('dates', 'nunique'),
                'First Date':('Date & Time', 'first'),
                 'Last Date': ('Date & Time', 'last')}))
print (df)
    

CodePudding user response:

I just form each column manually and then create the DataFrame.

A_Party = [df.iloc[0,0],'']
Total_IMEI = [df['IMEI'].nunique(),'']
IMEI = df['IMEI'].unique()
First_Date = df.groupby('IMEI')['Date & Time'].first()
Last_Date = df.groupby('IMEI')['Date & Time'].last()
col = ['A-Party','Total IMEI','IMEI','First Date','Last Date']

df1 = pd.DataFrame(np.array([A_Party,Total_IMEI,IMEI,First_Date,Last_Date]).T,columns=col)
df1
A-Party Total IMEI IMEI First Date Last Date
0 3022275 2 A 2021-05-01 7:20:05 2022-08-01 6:20:05
1 B 2021-03-02 9:20:05 2022-05-01 10:20:05
  • Related