I want to combine 3 diferent timestamp into one column of timestamp
I have 3 different column that log the time the job is created, rejected and completed. i want to combine it all into one Timestamp column.
This is an example data that i have
ID | j_date_created | j_date_completed | j_date_rejected |
---|---|---|---|
1 | 23.03.2021 07:34:34 | 23.03.2021 07:34:36 | null |
2 | 23.03.2021 07:34:34 | 23.03.2021 07:34:37 | null |
3 | 23.03.2021 07:34:34 | null | 04.04.2021 17:48:06 |
the end goal that i want is
ID | timestamp | Activity |
---|---|---|
1 | 23.03.2021 07:34:34 | created |
1 | 23.03.2021 07:34:36 | completed |
2 | 23.03.2021 07:34:34 | created |
2 | 23.03.2021 07:34:37 | completed |
3 | 23.03.2021 07:34:34 | created |
3 | 04.04.2021 17:48:06 | rejected |
I am writing the project with pandas
Thanks!!
CodePudding user response:
After renaming the columns, you can get the output you need using melt
.
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3],
'j_date_created': ['23.03.2021 07:34:34', '23.03.2021 07:34:34', '23.03.2021 07:34:34'],
'j_date_completed': ['23.03.2021 07:34:36', '23.03.2021 07:34:37', np.NaN],
'j_date_rejected': [np.NaN, np.NaN, '04.04.2021 17:48:06']})
df.columns = ['ID', 'created', 'completed', 'rejected']
df2 = df.melt(id_vars=['ID'], value_vars=['created', 'completed', 'rejected']).dropna().sort_values(by='ID')
df2.columns = ['ID', 'Activity', 'Timestamp']
print(df2)
This stacks the timestamp values in the created
, completed
, and rejected
columns and makes a new column with those old column names as values. It also drops rows with missing values in that column, then sorts by the ID
column to get the correct grouping. Last is renaming the columns to replace the default column names provided by melt
.
Output:
ID Activity Timestamp
0 1 created 23.03.2021 07:34:34
3 1 completed 23.03.2021 07:34:36
1 2 created 23.03.2021 07:34:34
4 2 completed 23.03.2021 07:34:37
2 3 created 23.03.2021 07:34:34
8 3 rejected 04.04.2021 17:48:06