Pandas series split using a Dictionary that has the Split info
I have the below data frame
columns = ['ID','DESIGN_APPROVER','PLAN APPROVER','FINAL_APPROVER']
data = [['ID_001','Sheldon,US','Leonard,Canada', 'Howard,Japan'],
['ID_002','Joel,UK','Tommy,Paris', 'Angelo,Ireland']]
df = pd.DataFrame(data=data , columns=columns)
df
Data frame looks like
ID DESIGN_APPROVER PLAN APPROVER FINAL_APPROVER
0 ID_001 Sheldon,US Leonard,Canada Howard,Japan
1 ID_002 Joel,UK Tommy,Paris Angelo,Ireland
Also I have a dict which specifies how each of the column has to be splitted
key_list = list(set(df.columns)-{'ID'})
value_list = [['{}_NAME'.format(i),'{}_LOCATION'.format(i)] for i in key_list]
dict(zip(key_list, value_list))
split_dict
split_dict
{'PLAN APPROVER': ['PLAN APPROVER_NAME', 'PLAN APPROVER_LOCATION'],
'FINAL_APPROVER': ['FINAL_APPROVER_NAME', 'FINAL_APPROVER_LOCATION'],
'DESIGN_APPROVER': ['DESIGN_APPROVER_NAME','DESIGN_APPROVER_LOCATION']}
My Requirement: How to iterate through the dictionary to split the key column into value columns based on ',' delimiter. I tried the below code
for k,v in split_dict:
df[v] = df[k].str.split(',', expand=True)
but getting
ValueError: too many values to unpack (expected 2)
CodePudding user response:
Use a simple loop and set a max number of split that matches the length of the list (-1):
out = []
for k,v in split_dict.items():
# for each key, pop (=remove and returns) the column
# use it as input to str.split to split in n chunks on comma
# the the maximum number of split to x-1 where x is the number
# of items in the list
df[v] = df.pop(k).str.split(',', n=len(v)-1)
output:
ID PLAN APPROVER_NAME PLAN APPROVER_LOCATION FINAL_APPROVER_NAME FINAL_APPROVER_LOCATION DESIGN_APPROVER_NAME DESIGN_APPROVER_LOCATION
0 ID_001 Leonard Tommy Howard Angelo Sheldon Joel
1 ID_002 Canada Paris Japan Ireland US UK
NB. if you want to keep the original column, use slicing instead of pop
.:
out = []
for k,v in split_dict.items():
df[v] = df[k].str.split(',', n=len(v)-1)