I have the following data frame.
profile_id companyName title
0 ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg Project A Ventures CMO
1 ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg Project A Ventures Head of Customer Acquisition
2 ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg Facebook Client Solutions Manager (Disruptors)
3 ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg Project A Ventures Senior Display Marketing Manager
4 ACoAABTDOXgBMlsNG44iUdoDo0lCc1HcheRnMfg Project A Ventures Display Marketing Manager
My goal is to convert every row to a single column like this. Basically, I have 5 companies and 5 titles for one profile id and I want to have a single row per profile id.
Edit: The number of companies/titles can differ per profile.
I've tried pivoting but it doesn't seem to work for this case.
Any help is appreciated.
CodePudding user response:
One solution could be as follows:
Setup
import pandas as pd
data = {'profile_id': [1, 1, 1, 2, 2],
'companyName': [*'ABCAB'],
'title': [*'XYZYZ']}
df = pd.DataFrame(data)
print(df)
profile_id companyName title
0 1 A X
1 1 B Y
2 1 C Z
3 2 A Y
4 2 B Z
Pivot
- Use
df.pivot_table
withaggfunc=list
to get eachprofile_id
in a single row with the values forcompanyName
andtitle
as lists. - Use the constructor
df.DataFrame
on both columns (with.tolist()
). This will expand the single columns into multiple columns with the individual items from the lists. Wrap this insidepd.concat
to get onedf
again. - At this stage, we'll have
res.columns
asInt64Index([0, 1, 2, 0, 1, 2], dtype='int64')
. I.e.0,1,2
cols forcompanyName
, and same fortitle
. We are looking for0,0,1,1,2,2
, so let's applyres = res.loc[:, res.columns.unique()]
. - Finally, use
itertools.cycle
to rename all the columns.
tmp = df.pivot_table(index='profile_id',values=['companyName','title'],aggfunc=list)
res = pd.concat([pd.DataFrame(tmp[col].tolist(),
index=tmp.index)
for col in tmp.columns], axis=1)
res = res.loc[:, res.columns.unique()]
from itertools import cycle
res.columns = [f'{j}{int(i) 1}' for i, j in zip(res.columns, cycle(tmp.columns))]
print(res)
companyName1 title1 companyName2 title2 companyName3 title3
profile_id
1 A X B Y C Z
2 A Y B Z None None