I have a dataset like this:
project id | phase 1 | phase 2 | phase 3 |
---|---|---|---|
112 | 29 | 157 | 213 |
113 | 36 | 121 | 23 |
how can I get it transformed to a dataset like this:
project id | phase | days |
---|---|---|
112 | 1 | 29 |
112 | 2 | 157 |
112 | 3 | 213 |
112 | cumulative | 399 |
113 | 1 | 36 |
113 | 2 | 121 |
113 | 3 | 23 |
113 | cumulative | 180 |
thanks!
CodePudding user response:
here is another way to do it
calculate the cumulative by row, then melt it. finally replacing out the text phase
df['cumulative'] = df[['phase1','phase2','phase3']].sum(axis=1)
df=df.melt(id_vars='projectid', var_name='phase', value_name='days').sort_values('projectid')
df['phase'] = df['phase'].replace(r'phase','', regex=True)
df
projectid phase days
0 112 1 29
2 112 2 157
4 112 3 213
6 112 cumulative 399
1 113 1 36
3 113 2 121
5 113 3 23
7 113 cumulative 180
CodePudding user response:
Try:
df = df.set_index("project id")
df = (
df.assign(cumulative=df.sum(axis=1))
.stack()
.to_frame(name="days")
.reset_index()
.rename(columns={"level_1": "phase"})
)
df["phase"] = df["phase"].str.replace("phase ", "")
print(df)
Prints:
project id phase days
0 112 1 29
1 112 2 157
2 112 3 213
3 112 cumulative 399
4 113 1 36
5 113 2 121
6 113 3 23
7 113 cumulative 180