After several hours looking for help, I cant find anything useful for me. Im facing with this problem: I have a DataFrame with the structure (little example)
site host
0 PO4102 hostname1
1 PO4102 hostname2
and a second Panas Series with differents timestamps
ini_time = '2022-11-09 00:00:00'
final_time = '2023-01-09 00:00:00'
df1 = pd.date_range(ini_time, final_time, freq="H")
DatetimeIndex(['2022-11-09 00:00:00', '2022-11-09 01:00:00',
'2022-11-09 02:00:00', '2022-11-09 03:00:00',
'2022-11-09 04:00:00', '2022-11-09 05:00:00',
'2022-11-09 06:00:00', '2022-11-09 07:00:00',
'2022-11-09 08:00:00', '2022-11-09 09:00:00',
...
'2023-01-08 15:00:00', '2023-01-08 16:00:00',
'2023-01-08 17:00:00', '2023-01-08 18:00:00',
'2023-01-08 19:00:00', '2023-01-08 20:00:00',
'2023-01-08 21:00:00', '2023-01-08 22:00:00',
'2023-01-08 23:00:00', '2023-01-09 00:00:00'],
dtype='datetime64[ns]', length=1465, freq='H')
My objetive is insert each date (1465 in this example) per items in DataFrame (duplicate df items). The final result will be a DataFrame with 2930 lines (2 lines per timestamp) Should look like
site host timestamp
0 PO4102 hostname1 2022-11-09 00:00:00
1 PO4102 hostname2 2022-11-09 00:00:00
2 PO4102 hostname1 2022-11-09 01:00:00
3 PO4102 hostname2 2022-11-09 01:00:00
4 PO4102 hostname1 2022-11-09 02:00:00
5 PO4102 hostname2 2022-11-09 02:00:00
.....
2929 PO4102 hostname1 2022-12-10 00:00:00
2930 PO4102 hostname2 2022-12-10 00:00:00
Can you help me?
Thanks in advance
CodePudding user response:
Use how='cross'
as parameter of merge
function to get the cartesian product of both dataframes:
df1 = pd.DataFrame({'timestamp': pd.date_range(ini_time, final_time, freq="H")})
out = df.merge(df1, how='cross').sort_values('timestamp', ignore_index=True)
print(out)
# Output
site host timestamp
0 PO4102 hostname1 2022-11-09 00:00:00
1 PO4102 hostname2 2022-11-09 00:00:00
2 PO4102 hostname2 2022-11-09 01:00:00
3 PO4102 hostname1 2022-11-09 01:00:00
4 PO4102 hostname1 2022-11-09 02:00:00
... ... ... ...
2925 PO4102 hostname1 2023-01-08 22:00:00
2926 PO4102 hostname2 2023-01-08 23:00:00
2927 PO4102 hostname1 2023-01-08 23:00:00
2928 PO4102 hostname1 2023-01-09 00:00:00
2929 PO4102 hostname2 2023-01-09 00:00:00
[2930 rows x 3 columns]