I have the following dataframe in Python:
ID | country_ID | visit_time |
---|---|---|
0 | ESP | 10 days 12:03:00 |
0 | ENG | 5 days 10:02:00 |
1 | ENG | 3 days 08:05:03 |
1 | ESP | 1 days 03:02:00 |
1 | ENG | 2 days 07:01:03 |
1 | ENG | 3 days 01:00:52 |
2 | ENG | 0 days 12:01:02 |
2 | ENG | 1 days 22:10:03 |
2 | ENG | 0 days 20:00:50 |
For each ID, I want to get:
avg_visit_ESP and avg_visit_ENG columns.
- Average time visit with country_ID = ESP for each ID.
- Average time visit with country_ID = ENG for each ID.
ID | avg_visit_ESP | avg_visit_ENG |
---|---|---|
0 | 10 days 12:03:00 | 5 days 10:02:00 |
1 | 1 days 03:02:00 | (8 days 16:06:58) / 3 |
2 | NaT | (3 days 06:11:55) / 3 |
I don't know how to specify in groupby a double grouping, first by ID and then by country_ID. If you can help me I would appreciate it.
P.S.: The date format of visit_time (timedelta), can perform addition and division without any apparent problem.
from datetime import datetime, timedelta
date1 = pd.to_datetime('2022-02-04 10:10:21', format='%Y-%m-%d %H:%M:%S')
date2 = pd.to_datetime('2022-02-05 20:15:41', format='%Y-%m-%d %H:%M:%S')
date3 = pd.to_datetime('2022-02-07 20:15:41', format='%Y-%m-%d %H:%M:%S')
sum1date = date2-date1
sum2date = date3-date2
sum3date = date3-date1
print((sum1date sum2date sum3date)/3)
CodePudding user response:
(df.groupby(['ID', 'country_ID'])['visit_time']
.mean(numeric_only=False)
.unstack()
.add_prefix('avg_visit_')
)
should do the trick
>>> df = pd.read_clipboard(sep='\s\s ')
>>> df.columns = [s.strip() for s in df]
>>> df['visit_time'] = pd.to_timedelta(df['visit_time'])
>>> df.groupby(['ID', 'country_ID'])['visit_time'].mean(numeric_only=False).unstack().add_prefix('avg_visit_')
country_ID avg_visit_ENG avg_visit_ESP
ID
0 5 days 10:02:00 10 days 12:03:00
1 2 days 21:22:19.333333333 1 days 03:02:00
2 1 days 02:03:58.333333333 NaT