Home > front end >  Ungrouping values in same column into new lines
Ungrouping values in same column into new lines

Time:07-15

I want to separate cases where there are more than 1 responsible in a line. Then I would create a line for each of them, getting their respectives id's and weights, and keeping the other columns with the same values of the original table.

Basically I want to convert this table:

goal_id goal_name owner person_id person_name responsible_id responsible_name responsible_weight
65 Goal 111 Jade 137, 248, 544, 910 Robert, Bruce, James, Oliver 1033 Jade 25
72 Goal 222 Drake 377 Frank 15 Drake 10
39 Goal 333 Jimmy 72 Luke 49, 421 Brandon, Jimmy 30; 45
101 Goal 123 Michael 13, 22 Washington, Andrew 191, 1033, 248 Michael, Jade, Bruce 10; 10; 50

Into this table:

goal_id goal_name owner person_id person_name responsible_id responsible_name responsible_weight
65 Goal 111 Jade 137, 248, 544, 910 Robert, Bruce, James, Oliver 1033 Jade 25
72 Goal 222 Drake 377 Frank 15 Drake 10
39 Goal 333 Jimmy 72 Luke 49 Brandon 30
39 Goal 333 Jimmy 72 Luke 421 Jimmy 45
101 Goal 123 Michael 13, 22 Washington, Andrew 191 Michael 10
101 Goal 123 Michael 13, 22 Washington, Andrew 1033 Jade 10
101 Goal 123 Michael 13, 22 Washington, Andrew 248 Bruce 50

CodePudding user response:

We can combine Series.str.split and DataFrame.explode to split the strings in the "responsible_*" columns and create duplicate rows for these entries. Once we have those, we connect them to the rest of the original df using pd.concat on axis=1. This will fit all the (duplicated) indices together.

# get cols that start with "resonsible"
cols_splits = [col for col in df.columns if col.startswith('responsible')]

# list comprehension with split on ", " or "; " (necessary for "responsible_weight") 
# to get 3x pd.Series()
splits = [df[col].astype(str).str.split(', |; ', regex=True).explode() for col in cols_splits]

# index will be [0, 1, 2, 2, 3, 3, 3] for each series

# now, we want to concatenate the "unexploded" cols from the original df 
# with these 3 series.

# slice with all unexploded cols
unexploded_df = df.iloc[:,~df.columns.isin(cols_splits)]

# concat
new_df = pd.concat([unexploded_df,*splits], axis=1)

print(new_df)

   goal_id goal_name    owner           person_id  \
0       65  Goal 111     Jade  137, 248, 544, 910   
1       72  Goal 222    Drake                 377   
2       39  Goal 333    Jimmy                  72   
2       39  Goal 333    Jimmy                  72   
3      101  Goal 123  Michael              13, 22   
3      101  Goal 123  Michael              13, 22   
3      101  Goal 123  Michael              13, 22   

                    person_name responsible_id responsible_name  \
0  Robert, Bruce, James, Oliver           1033             Jade   
1                         Frank             15            Drake   
2                          Luke             49          Brandon   
2                          Luke            421            Jimmy   
3            Washington, Andrew            191          Michael   
3            Washington, Andrew           1033             Jade   
3            Washington, Andrew            248            Bruce   

  responsible_weight  
0                 25  
1                 10  
2                 30  
2                 45  
3                 10  
3                 10  
3                 50  

  • Related