I have a pandas dataframe and would like to split each row containing multiple tasks into a new row.
The dataframe columns are:
start_t_i = start time of a task.
end_t_i = end time of a task.
weight_i = cost of a task.
For example, suppose I have the following dataframe df1:
task_name | start_t1 | end_t1 | weight_1 | start_t_2 | end_t2 | weight_2.. | start_t_k | end_t_k | weight_k |
---|---|---|---|---|---|---|---|---|---|
john | 5 | 7 | 1 | 9 | 10 | 9 | |||
sally | 3 | 4 | 1 | 8 | 11 | 7 | 19 | 21 | 1 |
tom | 1 | 2 | 3 |
I would like to transform it into the following df2:
task_name | start_t | end_t | weight |
---|---|---|---|
john | 5 | 7 | 1 |
john | 9 | 10 | 9 |
sally | 3 | 4 | 1 |
sally | 8 | 11 | 7 |
sally | 19 | 21 | 1 |
tom | 1 | 2 | 3 |
so far I managed to transform df1 manually into df2 by assuming each person has only a maximum of two tasks. my question is how can I get a df such as df2 from df1 when there are up to k tasks for each person.
CodePudding user response:
If we assume that every task has a start_t, end_t and weight_t
columns then we can use:
cols = list(df.columns[1:])
task_col = df.columns[0]
dfs = []
for i in range(0, len(cols), 3):
subset_cols = cols[i:i 3]
rename_cols = {subset_cols[0]: 'start_t', subset_cols[1]:'end_t', subset_cols[2]:'weight'}
dfs.append(df[[task_col] subset_cols].rename(columns=rename_cols))
transformed_df = pd.concat(dfs, ignore_index=True).sort_values(by='task_name').dropna(subset=['start_t', 'end_t', 'weight']).reset_index(drop=True)
OUTPUT:
task_name start_t end_t weight
0 john 5.0 7.0 1.0
1 john 9.0 10.0 9.0
2 sally 3.0 4.0 1.0
3 sally 8.0 11.0 7.0
4 sally 19.0 21.0 1.0
5 tom 1.0 2.0 3.0
CodePudding user response:
You can melt
the columns, extract the valid column names and pivot
:
df = df.replace("", np.NaN).melt("task_name").dropna()
df["variable"] = df["variable"].str.extract("(^[A-Za-z] )_")
print (df.assign(count=df.groupby(["task_name", "variable"]).cumcount())
.pivot(["count", "task_name"], "variable", "value")
.reset_index(0, drop=True).sort_index())
variable end start weight
task_name
john 7.0 5.0 1.0
john 10.0 9.0 9.0
sally 4.0 3.0 1.0
sally 11.0 8.0 7.0
sally 21.0 19.0 1.0
tom 2.0 1.0 3.0