I have a df in the following form:
id sid steps
A 1 step1
A 1 step1-step2
A 1 step1-step2-step3
This contains data of how a user A
went through a certain series of pages (steps) in a given session (sid). I want to take these dash delimitated steps and create individual columns for every step.
Result:
id sid steps page_step1 page_step2 page_step3
A 1 step1 step1 NA NA
A 1 step1-step2 step1 step2 NA
A 1 step1-step2-step3 step1 step2 step3
I don't know exactly how many steps so I was hoping that they're created dynamically. Been stuck on this all week, thanks!
CodePudding user response:
Use str.split
:
>>> df.join(df["steps"].str.split("-",expand=True).add_prefix("page_step"))
id sid steps page_step0 page_step1 page_step2
0 A 1 step1 step1 None None
1 A 1 step1-step2 step1 step2 None
2 A 1 step1-step2-step3 step1 step2 step3
If the numbering must start from 1 instead of 0:
steps = df["steps"].str.split("-",expand=True)
output = df.join(steps.rename(columns={i: f"page_setup{i 1}" for i in steps.columns}))
>>> output
id sid steps page_setup1 page_setup2 page_setup3
0 A 1 step1 step1 None None
1 A 1 step1-step2 step1 step2 None
2 A 1 step1-step2-step3 step1 step2 step3