I want to separate the cases where there is more than one responsible person in a row. For each of them I would create a separate row with their respective ID and weight, keeping the other columns with the same values as in the original table.
Basically I want to convert this table:
goal_id | goal_name | owner | person_id | person_name | responsible_id | responsible_name | responsible_weight |
---|---|---|---|---|---|---|---|
65 | Goal 111 | Jade | 137, 248, 544, 910 | Robert, Bruce, James, Oliver | 1033 | Jade | 25 |
72 | Goal 222 | Drake | 377 | Frank | 15 | Drake | 10 |
39 | Goal 333 | Jimmy | 72 | Luke | 49, 421 | Brandon, Jimmy | 30; 45 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 191, 1033, 248 | Michael, Jade, Bruce | 10; 10; 50 |
Into this table:
goal_id | goal_name | owner | person_id | person_name | responsible_id | responsible_name | responsible_weight |
---|---|---|---|---|---|---|---|
65 | Goal 111 | Jade | 137, 248, 544, 910 | Robert, Bruce, James, Oliver | 1033 | Jade | 25 |
72 | Goal 222 | Drake | 377 | Frank | 15 | Drake | 10 |
39 | Goal 333 | Jimmy | 72 | Luke | 49 | Brandon | 30 |
39 | Goal 333 | Jimmy | 72 | Luke | 421 | Jimmy | 45 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 191 | Michael | 10 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 1033 | Jade | 10 |
101 | Goal 123 | Michael | 13, 22 | Washington, Andrew | 248 | Bruce | 50 |
Answer:
We can combine Series.str.split and DataFrame.explode to split the strings in the "responsible_*" columns and create duplicate rows for those entries. Once we have them, we join them back to the rest of the original df using pd.concat on axis=1, which aligns everything on the (duplicated) index.
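For a fully runnable example, the input table from the question can first be rebuilt as a DataFrame (values copied verbatim from the first table above; df is the name the snippet below assumes):

import pandas as pd

# rebuild the question's input table as a DataFrame
df = pd.DataFrame({
    'goal_id': [65, 72, 39, 101],
    'goal_name': ['Goal 111', 'Goal 222', 'Goal 333', 'Goal 123'],
    'owner': ['Jade', 'Drake', 'Jimmy', 'Michael'],
    'person_id': ['137, 248, 544, 910', '377', '72', '13, 22'],
    'person_name': ['Robert, Bruce, James, Oliver', 'Frank', 'Luke', 'Washington, Andrew'],
    'responsible_id': ['1033', '15', '49, 421', '191, 1033, 248'],
    'responsible_name': ['Jade', 'Drake', 'Brandon, Jimmy', 'Michael, Jade, Bruce'],
    'responsible_weight': ['25', '10', '30; 45', '10; 10; 50'],
})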
# get cols that start with "responsible"
cols_splits = [col for col in df.columns if col.startswith('responsible')]
# list comprehension with split on ", " or "; " (necessary for "responsible_weight")
# to get 3x pd.Series()
splits = [df[col].astype(str).str.split(', |; ', regex=True).explode() for col in cols_splits]
# index will be [0, 1, 2, 2, 3, 3, 3] for each series
# now, we want to concatenate the "unexploded" cols from the original df
# with these 3 series.
# slice with all unexploded cols
unexploded_df = df.iloc[:,~df.columns.isin(cols_splits)]
# concat
new_df = pd.concat([unexploded_df,*splits], axis=1)
print(new_df)
goal_id goal_name owner person_id \
0 65 Goal 111 Jade 137, 248, 544, 910
1 72 Goal 222 Drake 377
2 39 Goal 333 Jimmy 72
2 39 Goal 333 Jimmy 72
3 101 Goal 123 Michael 13, 22
3 101 Goal 123 Michael 13, 22
3 101 Goal 123 Michael 13, 22
person_name responsible_id responsible_name \
0 Robert, Bruce, James, Oliver 1033 Jade
1 Frank 15 Drake
2 Luke 49 Brandon
2 Luke 421 Jimmy
3 Washington, Andrew 191 Michael
3 Washington, Andrew 1033 Jade
3 Washington, Andrew 248 Bruce
responsible_weight
0 25
1 10
2 30
2 45
3 10
3 10
3 50
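One caveat: str.split produces strings, so the exploded responsible_id and responsible_weight columns in new_df come out as object dtype, and the duplicated index is kept. If numeric values and a fresh index are needed, an optional cleanup step (not part of the original snippet, only a small sketch) could be:

# optional: convert the exploded string columns back to integers
new_df['responsible_id'] = new_df['responsible_id'].astype(int)
new_df['responsible_weight'] = new_df['responsible_weight'].astype(int)
# optional: replace the duplicated index with a fresh 0..n-1 range
new_df = new_df.reset_index(drop=True)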