I have these two dataframes:
import pandas as pd
from pandas import Timestamp
df_1 = pd.DataFrame({'id': {0: 'A',
1: 'A',
2: 'B',
3: 'C',
4: 'C'},
'IdOrder': {0: 1, 1: 2, 2: 1, 3: 1, 4: 2},
'TrackDateTime': {0: Timestamp('2020-01-21 23:28:35'),
1: Timestamp('2020-01-28 17:12:15'),
2: Timestamp('2020-01-07 12:41:48'),
3: Timestamp('2020-01-01 22:13:44'),
4: Timestamp('2020-01-01 22:49:53')}})
df_1
df_2 = pd.DataFrame({'id': {0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E'},
'InitialDate': {0: Timestamp('2020-01-21 23:28:35'),
1: Timestamp('2020-01-07 12:41:48'),
2: Timestamp('2020-01-01 22:13:44'),
3: Timestamp('2020-01-02 15:45:10'),
4: Timestamp('2020-01-02 22:21:36')},
'EndDate': {0: Timestamp('2020-01-28 00:00:00'),
1: Timestamp('2020-01-08 00:00:00'),
2: Timestamp('2020-01-03 00:00:00'),
3: Timestamp('2020-01-06 00:00:00'),
4: Timestamp('2020-04-10 00:00:00')}})
df_2
And I'm looking to enclose each id
in df_1
with the InitialDate
and EndDate
from df_2
, which would give this expected output:
Also please consider that:
IdOrder=0
forInitialDate
andIdOrder=-1
forEndDate
- I would like to keep the time granularity up to seconds (the output image doesn't show it because of the excel format)
I haven't been able to figure out a solution for this. Any suggestions? :)
CodePudding user response:
You can use melt
, concat
and a custom sorting:
tmp = (
df_2[df_2['id'].isin(df_1['id'])]
.rename(columns={'InitialDate':0, 'EndDate': -1})
.melt('id', var_name='IdOrder', value_name='TrackDateTime')
)
out = (pd.concat([tmp, df_1])
.sort_values(by='IdOrder', key=lambda s: s.replace(-1, np.inf))
.sort_values(by='id')
)
print(out)
Output:
id IdOrder TrackDateTime
0 A 0 2020-01-21 23:28:35
0 A 1 2020-01-21 23:28:35
1 A 2 2020-01-28 17:12:15
3 A -1 2020-01-28 00:00:00
1 B 0 2020-01-07 12:41:48
2 B 1 2020-01-07 12:41:48
4 B -1 2020-01-08 00:00:00
2 C 0 2020-01-01 22:13:44
3 C 1 2020-01-01 22:13:44
4 C 2 2020-01-01 22:49:53
5 C -1 2020-01-03 00:00:00