Home > Enterprise >  Pandas - convert every row to a single column
Pandas - convert every row to a single column

Time:10-10

I have the following data frame.

                                profile_id         companyName                                  title
0  ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg  Project A Ventures                                    CMO
1  ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg  Project A Ventures           Head of Customer Acquisition
2  ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg            Facebook  Client Solutions Manager (Disruptors)
3  ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg  Project A Ventures       Senior Display Marketing Manager
4  ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg  Project A Ventures              Display Marketing Manager

My goal is to convert every row to a single column like this. Basically, I have 5 companies and 5 titles for one profile id and I want to have a single row per profile id. enter image description here

Edit: The number of companies/titles can differ per profile.

I've tried pivoting but it doesn't seem to work for this case.

Any help is appreciated.

CodePudding user response:

One solution could be as follows:

Setup

import pandas as pd

data = {'profile_id': [1, 1, 1, 2, 2],
        'companyName': [*'ABCAB'],
        'title': [*'XYZYZ']}

df = pd.DataFrame(data)

print(df)

   profile_id companyName title
0           1           A     X
1           1           B     Y
2           1           C     Z
3           2           A     Y
4           2           B     Z

Pivot

  • Use df.pivot_table with aggfunc=list to get each profile_id in a single row with the values for companyName and title as lists.
  • Use the constructor df.DataFrame on both columns (with .tolist()). This will expand the single columns into multiple columns with the individual items from the lists. Wrap this inside pd.concat to get one df again.
  • At this stage, we'll have res.columns as Int64Index([0, 1, 2, 0, 1, 2], dtype='int64'). I.e. 0,1,2 cols for companyName, and same for title. We are looking for 0,0,1,1,2,2, so let's apply res = res.loc[:, res.columns.unique()].
  • Finally, use itertools.cycle to rename all the columns.
tmp = df.pivot_table(index='profile_id',values=['companyName','title'],aggfunc=list)
res = pd.concat([pd.DataFrame(tmp[col].tolist(), 
                              index=tmp.index)
                 for col in tmp.columns], axis=1)

res = res.loc[:, res.columns.unique()]

from itertools import cycle
res.columns = [f'{j}{int(i) 1}' for i, j in zip(res.columns, cycle(tmp.columns))]

print(res)

           companyName1 title1 companyName2 title2 companyName3 title3
profile_id                                                            
1                     A      X            B      Y            C      Z
2                     A      Y            B      Z         None   None
  • Related