I want to separate the cases where there is more than one responsible person in a row. For each of them I would create a separate row with their respective ID and weight, keeping the other columns with the same values as in the original table.
Basically I want to convert this table:
goal_id | goal_name | owner | person_id | person_name | responsible_id | responsible_name | responsible_weight |
---|---|---|---|---|---|---|---|
65 | Goal 111 | Jade | 137, 248, 544, 910 | Robert, Bruce, James, Oliver | 1033 | Jade | 25 |
72 | Goal 222 | Drake | 377 | Frank | 15 | Drake | 10 |
39 | Goal 333 | Jimmy | 72 | Luke | 49, 421 | Brandon, Jimmy | 30; 45 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 191, 1033, 248 | Michael, Jade, Bruce | 10; 10; 50 |
Into this table:
goal_id | goal_name | owner | person_id | person_name | responsible_id | responsible_name | responsible_weight |
---|---|---|---|---|---|---|---|
65 | Goal 111 | Jade | 137, 248, 544, 910 | Robert, Bruce, James, Oliver | 1033 | Jade | 25 |
72 | Goal 222 | Drake | 377 | Frank | 15 | Drake | 10 |
39 | Goal 333 | Jimmy | 72 | Luke | 49 | Brandon | 30 |
39 | Goal 333 | Jimmy | 72 | Luke | 421 | Jimmy | 45 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 191 | Michael | 10 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 1033 | Jade | 10 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 248 | Bruce | 50 |
Answer:
We can combine Series.str.split and DataFrame.explode to split the strings in the "responsible_*" columns and create duplicate rows for those entries. Once we have them, we join them back to the rest of the original df using pd.concat on axis=1, which aligns everything on the (duplicated) index.
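For a fully runnable example, the input table from the question can first be rebuilt as a DataFrame (values copied verbatim from the first table above; df is the name the snippet below assumes):

import pandas as pd

# rebuild the question's input table as a DataFrame
df = pd.DataFrame({
    'goal_id': [65, 72, 39, 101],
    'goal_name': ['Goal 111', 'Goal 222', 'Goal 333', 'Goal 123'],
    'owner': ['Jade', 'Drake', 'Jimmy', 'Michael'],
    'person_id': ['137, 248, 544, 910', '377', '72', '13, 22'],
    'person_name': ['Robert, Bruce, James, Oliver', 'Frank', 'Luke', 'Washington, Andrew'],
    'responsible_id': ['1033', '15', '49, 421', '191, 1033, 248'],
    'responsible_name': ['Jade', 'Drake', 'Brandon, Jimmy', 'Michael, Jade, Bruce'],
    'responsible_weight': ['25', '10', '30; 45', '10; 10; 50'],
})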
# get cols that start with "responsible"
cols_splits = [col for col in df.columns if col.startswith('responsible')]
# list comprehension with split on ", " or "; " (necessary for "responsible_weight")
# to get 3x pd.Series()
splits = [df[col].astype(str).str.split(', |; ', regex=True).explode() for col in cols_splits]
# index will be [0, 1, 2, 2, 3, 3, 3] for each series
# now, we want to concatenate the "unexploded" cols from the original df
# with these 3 series.
# slice with all unexploded cols
unexploded_df = df.iloc[:,~df.columns.isin(cols_splits)]
# concat
new_df = pd.concat([unexploded_df,*splits], axis=1)
print(new_df)
goal_id goal_name owner person_id \
0 65 Goal 111 Jade 137, 248, 544, 910
1 72 Goal 222 Drake 377
2 39 Goal 333 Jimmy 72
2 39 Goal 333 Jimmy 72
3 101 Goal 123 Michael 13, 22
3 101 Goal 123 Michael 13, 22
3 101 Goal 123 Michael 13, 22
person_name responsible_id responsible_name \
0 Robert, Bruce, James, Oliver 1033 Jade
1 Frank 15 Drake
2 Luke 49 Brandon
2 Luke 421 Jimmy
3 Washington, Andrew 191 Michael
3 Washington, Andrew 1033 Jade
3 Washington, Andrew 248 Bruce
responsible_weight
0 25
1 10
2 30
2 45
3 10
3 10
3 50
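One caveat: str.split produces strings, so the exploded responsible_id and responsible_weight columns in new_df come out as object dtype, and the duplicated index is kept. If numeric values and a fresh index are needed, an optional cleanup step (not part of the original snippet, only a small sketch) could be:

# optional: convert the exploded string columns back to integers
new_df['responsible_id'] = new_df['responsible_id'].astype(int)
new_df['responsible_weight'] = new_df['responsible_weight'].astype(int)
# optional: replace the duplicated index with a fresh 0..n-1 range
new_df = new_df.reset_index(drop=True)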