Home > Back-end >  Merge columns with more than one value in pandas dataframe
Merge columns with more than one value in pandas dataframe

Time:11-17

I've got this DataFrame in Python using pandas:

Column 1 Column 2 Column 3
hello a,b,c 1,2,3
hi b,c,a 4,5,6

The values in column 3 belong to the categories in column 2. Is there a way to combine columns 2 and 3 that I get this output?

Column 1 a b c
hello 1 2 3
hi 6 4 5

Any advise will be very helpful! Thank you!

CodePudding user response:

df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)    

output:

    a   b   c
0   1   2   3
1   4   5   6

result make to df1 and concat

df1 = df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)

pd.concat([df['Column 1'], df1], axis=1)

output:

    col1    a   b   c
0   hello   1   2   3
1   hi      4   5   6

CodePudding user response:

You can use pd.crosstab after exploding the commas:

new_df = ( df.assign(t=df['Column 2'].str.split(','), a=df['Column 3'].str.split(',')).
                      explode(['t', 'a']) )

output = ( pd.crosstab(index=new_df['Column 1'], columns=new_df['t'], 
                     values=new_df['a'], aggfunc='sum').reset_index() ) 

Output:

t   Column 1    a   b   c
0   hello       1   2   3
1   hi          4   5   6

CodePudding user response:

Efficiency wise, I'd say do all the wrangling in vanilla python and create a new dataframe:

from collections import defaultdict
outcome = defaultdict(list)
for column, row in zip(df['Column 2'], df['Column 3']):
    column = column.split(',')
    row = row.split(',')
    for first, last in zip(column, row):
        outcome[first].append(last)
pd.DataFrame(outcome).assign(Column = df['Column 1'])
   a  b  c Column
0  1  2  3  hello
1  6  4  5     hi
  • Related