I have date and time data now I want to reduce this dataframe to two columns with Timestamp (date time) in a column and value in another column
current df -
Date 8 am 10 am 1 pm
-----------------------------------------------
23/02/2022 5 10 11
24/02/2022 2 17 15
25/02/2022 7 90 175
desired df -
Timestamp value
----------------------------------------------
2022-02-23 00:08:00 5
2022-02-23 00:10:00 10
2022-02-23 00:13:00 11
2022-02-24 00:08:00 2
2022-02-24 00:10:00 17
2022-02-24 00:13:00 15
2022-02-25 00:08:00 7
2022-02-25 00:10:00 90
2022-02-25 00:13:00 175
CodePudding user response:
IIUC, use melt
and to_datetime
:
(df
.reset_index() # only if not already index
.melt(id_vars='Date', var_name='time')
.assign(Timestamp=lambda d: pd.to_datetime(d['Date'] ' ' d['time']))
[['Timestamp', 'value']]
# below optional
.sort_values(by='Timestamp').reset_index(drop=True)
)
Output:
Timestamp value
0 2022-02-23 08:00:00 5
1 2022-02-23 10:00:00 10
2 2022-02-23 13:00:00 11
3 2022-02-24 08:00:00 2
4 2022-02-24 10:00:00 17
5 2022-02-24 13:00:00 15
6 2022-02-25 08:00:00 7
7 2022-02-25 10:00:00 90
8 2022-02-25 13:00:00 175
CodePudding user response:
Use melt
:
combine_datetime = lambda x: pd.to_datetime(x['Date'] ' ' x['Time'],
format='%d/%m/%Y %I %p')
out = (
df.reset_index().melt('Date', var_name='Time').assign(Timestamp=combine_datetime) \
.sort_values('Timestamp', ignore_index=True)[['Timestamp', 'value']]
)
print(out)
# Output
Timestamp value
0 2022-02-23 08:00:00 5
1 2022-02-23 10:00:00 10
2 2022-02-23 13:00:00 11
3 2022-02-24 08:00:00 2
4 2022-02-24 10:00:00 17
5 2022-02-24 13:00:00 15
6 2022-02-25 08:00:00 7
7 2022-02-25 10:00:00 90
8 2022-02-25 13:00:00 175
Notes
I used
reset_index
because we don't know ifDate
is a column or the index. If it's a column, you can removereset_index
. In all cases, there is no side effect here to use it withmelt
.For
pd.to_datetime
, I used an explicit format to avoid Pandas infer the datetime if there are any ambiguities with the day first.
CodePudding user response:
Set date as index, stack and coerce concatenated time component into datetime;
s =df.set_index('Date').stack().to_frame('value').reset_index()
s=s.assign(Timestamp=pd.to_datetime(s['Date'].str.cat(s['level_1'], sep =' ')))[['Timestamp', 'value']]
Timestamp value
0 2022-02-23 08:00:00 5
1 2022-02-23 10:00:00 10
2 2022-02-23 13:00:00 11
3 2022-02-24 08:00:00 2
4 2022-02-24 10:00:00 17
5 2022-02-24 13:00:00 15
6 2022-02-25 08:00:00 7
7 2022-02-25 10:00:00 90
8 2022-02-25 13:00:00 175