I have a dataframe like this :
A-Party | Date & Time | IMEI |
---|---|---|
3022275 | 2021-05-01 7:20:05 | A |
3022275 | 2021-05-02 8:20:05 | A |
3022275 | 2022-08-01 6:20:05 | A |
3022275 | 2021-03-02 9:20:05 | B |
3022275 | 2021-09-01 6:20:05 | B |
3022275 | 2022-05-01 10:20:05 | B |
I want first and last date of every unique IMEI:
Output I want:
A-Party | Total IMEI | IMEI | First Date | Last Date |
---|---|---|---|---|
3022275 | 2 | A | 2021-05-01 7:20:05 | 2022-08-01 6:20:05 |
B | 2021-03-02 9:20:05 | 2022-05-01 10:20:05 |
I can achieve this nested for loop, but if there is any pandas way of directly doing this, that might be helpful
CodePudding user response:
Use names aggregation with GroupBy.agg
with first
and last
or min
and max
:
df = (df.groupby(['A-Party','IMEI'], as_index=False)
.agg(**{'First Date':('Date & Time', 'first'),
'Last Date': ('Date & Time', 'last')}))
df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))
print (df)
A-Party Total IMEI IMEI First Date Last Date
0 3022275 2 A 2021-05-01 7:20:05 2022-08-01 6:20:05
1 3022275 2 B 2021-03-02 9:20:05 2022-05-01 10:20:05
If need spaces (not recommended, because possible mixing numeric with strings ''
):
df.loc[df.duplicated(['A-Party','Total IMEI']), ['A-Party','Total IMEI']] = ''
print (df)
A-Party Total IMEI IMEI First Date Last Date
0 3022275 2 A 2021-05-01 7:20:05 2022-08-01 6:20:05
1 B 2021-03-02 9:20:05 2022-05-01 10:20:05
If need MultiIndex
:
df = (df.groupby(['A-Party','IMEI'], as_index=False)
.agg(**{'First Date':('Date & Time', 'first'),
'Last Date': ('Date & Time', 'last')}))
df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))
df = df.set_index(['A-Party','Total IMEI','IMEI'])
print (df)
First Date Last Date
A-Party Total IMEI IMEI
3022275 2 A 2021-05-01 7:20:05 2022-08-01 6:20:05
B 2021-03-02 9:20:05 2022-05-01 10:20:05
EDIT: Add GroupBy.size
to named aggregations:
df = (df.groupby(['A-Party','IMEI'], as_index=False)
.agg(**{'Total IMEI':('IMEI', 'size'),
'First Date':('Date & Time', 'first'),
'Last Date': ('Date & Time', 'last')}))
print (df)
A-Party IMEI Total IMEI First Date Last Date
0 3022275 A 3 2021-05-01 7:20:05 2022-08-01 6:20:05
1 3022275 B 3 2021-03-02 9:20:05 2022-05-01 10:20:05
df = df.set_index(['A-Party','Total IMEI','IMEI'])
print (df)
First Date Last Date
A-Party Total IMEI IMEI
3022275 3 A 2021-05-01 7:20:05 2022-08-01 6:20:05
B 2021-03-02 9:20:05 2022-05-01 10:20:05
EDIT1:
df = (df.assign(dates = df['Date & Time'].dt.date)
.groupby(['A-Party','IMEI'], as_index=False)
.agg(**{'Total IMEI':('IMEI', 'size'),
'No unique days': ('dates', 'nunique'),
'First Date':('Date & Time', 'first'),
'Last Date': ('Date & Time', 'last')}))
print (df)
CodePudding user response:
I just form each column manually and then create the DataFrame.
A_Party = [df.iloc[0,0],'']
Total_IMEI = [df['IMEI'].nunique(),'']
IMEI = df['IMEI'].unique()
First_Date = df.groupby('IMEI')['Date & Time'].first()
Last_Date = df.groupby('IMEI')['Date & Time'].last()
col = ['A-Party','Total IMEI','IMEI','First Date','Last Date']
df1 = pd.DataFrame(np.array([A_Party,Total_IMEI,IMEI,First_Date,Last_Date]).T,columns=col)
df1
A-Party | Total IMEI | IMEI | First Date | Last Date | |
---|---|---|---|---|---|
0 | 3022275 | 2 | A | 2021-05-01 7:20:05 | 2022-08-01 6:20:05 |
1 | B | 2021-03-02 9:20:05 | 2022-05-01 10:20:05 |