Home > Software engineering >  Error while trying Pandas series split using a Dictionary that has the Split info
Error while trying Pandas series split using a Dictionary that has the Split info

Time:10-14

Pandas series split using a Dictionary that has the Split info

I have the below data frame

columns = ['ID','DESIGN_APPROVER','PLAN APPROVER','FINAL_APPROVER']
data = [['ID_001','Sheldon,US','Leonard,Canada', 'Howard,Japan'],
        ['ID_002','Joel,UK','Tommy,Paris', 'Angelo,Ireland']]
df = pd.DataFrame(data=data , columns=columns)
df

Data frame looks like

        ID      DESIGN_APPROVER PLAN APPROVER   FINAL_APPROVER
0   ID_001  Sheldon,US      Leonard,Canada  Howard,Japan
1   ID_002  Joel,UK         Tommy,Paris     Angelo,Ireland   

    

Also I have a dict which specifies how each of the column has to be splitted

key_list = list(set(df.columns)-{'ID'})
value_list = [['{}_NAME'.format(i),'{}_LOCATION'.format(i)] for i in key_list]
dict(zip(key_list, value_list))
split_dict

split_dict

{'PLAN APPROVER': ['PLAN APPROVER_NAME', 'PLAN APPROVER_LOCATION'],
 'FINAL_APPROVER': ['FINAL_APPROVER_NAME', 'FINAL_APPROVER_LOCATION'],
 'DESIGN_APPROVER': ['DESIGN_APPROVER_NAME','DESIGN_APPROVER_LOCATION']}

My Requirement: How to iterate through the dictionary to split the key column into value columns based on ',' delimiter. I tried the below code

    for k,v in split_dict:
        df[v] = df[k].str.split(',', expand=True)

but getting

ValueError: too many values to unpack (expected 2)

CodePudding user response:

Use a simple loop and set a max number of split that matches the length of the list (-1):

out = []
for k,v in split_dict.items():
    # for each key, pop (=remove and returns) the column
    # use it as input to str.split to split in n chunks on comma
    # the the maximum number of split to x-1 where x is the number
    # of items in the list
    df[v] = df.pop(k).str.split(',', n=len(v)-1)

output:

       ID PLAN APPROVER_NAME PLAN APPROVER_LOCATION FINAL_APPROVER_NAME FINAL_APPROVER_LOCATION DESIGN_APPROVER_NAME DESIGN_APPROVER_LOCATION
0  ID_001            Leonard                  Tommy              Howard                  Angelo              Sheldon                     Joel
1  ID_002             Canada                  Paris               Japan                 Ireland                   US                       UK

NB. if you want to keep the original column, use slicing instead of pop.:

out = []
for k,v in split_dict.items():
    df[v] = df[k].str.split(',', n=len(v)-1)
  • Related