Get first and Last date of each unique element in pandas dataframe-CodePudding

I have a dataframe like this :

A-Party	Date & Time	IMEI
3022275	2021-05-01 7:20:05	A
3022275	2021-05-02 8:20:05	A
3022275	2022-08-01 6:20:05	A
3022275	2021-03-02 9:20:05	B
3022275	2021-09-01 6:20:05	B
3022275	2022-05-01 10:20:05	B

I want first and last date of every unique IMEI:

Output I want:

A-Party	Total IMEI	IMEI	First Date	Last Date
3022275	2	A	2021-05-01 7:20:05	2022-08-01 6:20:05
		B	2021-03-02 9:20:05	2022-05-01 10:20:05

I can achieve this nested for loop, but if there is any pandas way of directly doing this, that might be helpful

CodePudding user response：

Use names aggregation with GroupBy.agg with first and last or min and max:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date':('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))

df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))

print (df)
   A-Party  Total IMEI IMEI          First Date            Last Date
0  3022275           2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275           2    B  2021-03-02 9:20:05  2022-05-01 10:20:05

If need spaces (not recommended, because possible mixing numeric with strings ''):

df.loc[df.duplicated(['A-Party','Total IMEI']), ['A-Party','Total IMEI']] = ''

print (df)
   A-Party Total IMEI IMEI          First Date            Last Date
0  3022275          2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1                        B  2021-03-02 9:20:05  2022-05-01 10:20:05

If need MultiIndex:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date':('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))

df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))

df = df.set_index(['A-Party','Total IMEI','IMEI'])

print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 2          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05

EDIT: Add GroupBy.size to named aggregations:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI':('IMEI', 'size'),
                'First Date':('Date & Time', 'first'),
                 'Last Date': ('Date & Time', 'last')}))
print (df)
   A-Party IMEI  Total IMEI          First Date            Last Date
0  3022275    A           3  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275    B           3  2021-03-02 9:20:05  2022-05-01 10:20:05

df = df.set_index(['A-Party','Total IMEI','IMEI'])

print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 3          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05

EDIT1:

df = (df.assign(dates = df['Date & Time'].dt.date)
        .groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI':('IMEI', 'size'),
                'No unique days': ('dates', 'nunique'),
                'First Date':('Date & Time', 'first'),
                 'Last Date': ('Date & Time', 'last')}))
print (df)

CodePudding user response：

I just form each column manually and then create the DataFrame.

A_Party = [df.iloc[0,0],'']
Total_IMEI = [df['IMEI'].nunique(),'']
IMEI = df['IMEI'].unique()
First_Date = df.groupby('IMEI')['Date & Time'].first()
Last_Date = df.groupby('IMEI')['Date & Time'].last()
col = ['A-Party','Total IMEI','IMEI','First Date','Last Date']

df1 = pd.DataFrame(np.array([A_Party,Total_IMEI,IMEI,First_Date,Last_Date]).T,columns=col)
df1

	A-Party	Total IMEI	IMEI	First Date	Last Date
0	3022275	2	A	2021-05-01 7:20:05	2022-08-01 6:20:05
1			B	2021-03-02 9:20:05	2022-05-01 10:20:05