I've got this DataFrame in Python using pandas:
Column 1 | Column 2 | Column 3 |
---|---|---|
hello | a,b,c | 1,2,3 |
hi | b,c,a | 4,5,6 |
The values in column 3 belong to the categories in column 2. Is there a way to combine columns 2 and 3 that I get this output?
Column 1 | a | b | c |
---|---|---|---|
hello | 1 | 2 | 3 |
hi | 6 | 4 | 5 |
Any advise will be very helpful! Thank you!
CodePudding user response:
df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)
output:
a b c
0 1 2 3
1 4 5 6
result make to df1
and concat
df1 = df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)
pd.concat([df['Column 1'], df1], axis=1)
output:
col1 a b c
0 hello 1 2 3
1 hi 4 5 6
CodePudding user response:
You can use pd.crosstab
after exploding
the commas:
new_df = ( df.assign(t=df['Column 2'].str.split(','), a=df['Column 3'].str.split(',')).
explode(['t', 'a']) )
output = ( pd.crosstab(index=new_df['Column 1'], columns=new_df['t'],
values=new_df['a'], aggfunc='sum').reset_index() )
Output:
t Column 1 a b c
0 hello 1 2 3
1 hi 4 5 6
CodePudding user response:
Efficiency wise, I'd say do all the wrangling in vanilla python and create a new dataframe:
from collections import defaultdict
outcome = defaultdict(list)
for column, row in zip(df['Column 2'], df['Column 3']):
column = column.split(',')
row = row.split(',')
for first, last in zip(column, row):
outcome[first].append(last)
pd.DataFrame(outcome).assign(Column = df['Column 1'])
a b c Column
0 1 2 3 hello
1 6 4 5 hi