I have gone through Convert columns into rows with Pandas and Merge timestamp column in pandas. The goal is to first group the data by ID and then convert the start_time column into an entry in the process column.
Given
start_time time process ID
14:05 14:16 A 1
14:05 14:34 B 1
14:05 15:00 C 1
14:05 15:10 D 1
14:12 14:19 A 2
14:12 14:54 B 2
Goal
time process ID
14:05 start_time 1 (previously held in a separate column)
14:16 A 1
14:34 B 1
15:00 C 1
15:10 D 1
14:12 start_time 2
14:19 A 2
14:54 B 2
My attempt below fails, since a groupby object has no melt method:
df.groupby('ID').melt(df.columns.difference(['start_time']), value_name='time')
Note: the start_time value within each ID group remains the same.
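For reference, the sample data can be built as follows (times are kept as plain strings here for simplicity; real data may well use datetime values):

```python
import pandas as pd

# Sample data from the question; times kept as strings
df = pd.DataFrame({
    'start_time': ['14:05', '14:05', '14:05', '14:05', '14:12', '14:12'],
    'time':       ['14:16', '14:34', '15:00', '15:10', '14:19', '14:54'],
    'process':    ['A', 'B', 'C', 'D', 'A', 'B'],
    'ID':         [1, 1, 1, 1, 2, 2],
})
print(df)
```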
CodePudding user response:
You can treat your data as 2 separate DataFrames
and recombine them like so:
import pandas as pd

# Extract start_times and clean up to match column names
start_times = (
    df[['start_time', 'ID']]
    .drop_duplicates()
    .rename(columns={'start_time': 'time'})
    .assign(process='start_time')
)

# combine data vertically
out = (
    pd.concat([start_times, df.drop(columns='start_time')])
    .sort_values(['ID', 'time'])
    .reset_index(drop=True)
)
print(out)
time ID process
0 14:05 1 start_time
1 14:16 1 A
2 14:34 1 B
3 15:00 1 C
4 15:10 1 D
5 14:12 2 start_time
6 14:19 2 A
7 14:54 2 B
CodePudding user response:
You could use:
cols = df.columns.difference(['start_time', 'process', 'time']).to_list()
# identify first row per group
mask = ~df.duplicated('ID')
# melt first row per group: start_time becomes a process label
df2 = (df[mask]
       .drop(columns=['process', 'time'])
       .melt(cols, var_name='process', value_name='time')
       )
# concatenate with original dataframe and reorder
out = (pd.concat([df2, df])
       .sort_values(by='ID', kind='stable')
       [['time', 'process'] + cols]
       #.reset_index(drop=True)  # optional
       )
output:
time process ID
0 14:05 start_time 1
0 14:16 A 1
1 14:34 B 1
2 15:00 C 1
3 15:10 D 1
1 14:12 start_time 2
4 14:19 A 2
5 14:54 B 2
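As a self-contained sketch with the question's sample data (again assuming string times):

```python
import pandas as pd

df = pd.DataFrame({
    'start_time': ['14:05', '14:05', '14:05', '14:05', '14:12', '14:12'],
    'time':       ['14:16', '14:34', '15:00', '15:10', '14:19', '14:54'],
    'process':    ['A', 'B', 'C', 'D', 'A', 'B'],
    'ID':         [1, 1, 1, 1, 2, 2],
})

# id columns: everything except the melted/derived ones, here just ['ID']
cols = df.columns.difference(['start_time', 'process', 'time']).to_list()

# first row of each ID group
mask = ~df.duplicated('ID')

# melt that first row so 'start_time' becomes a process label
df2 = (df[mask]
       .drop(columns=['process', 'time'])
       .melt(cols, var_name='process', value_name='time')
       )

# concatenate with the original rows; the stable sort keeps each
# group's start_time row first, then reorder the columns
out = (pd.concat([df2, df])
       .sort_values(by='ID', kind='stable')
       [['time', 'process'] + cols]
       .reset_index(drop=True)
       )
print(out)
```

The stable sort works because df2's rows precede df's in the concat, so within each ID the start_time row stays on top without sorting on time.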