I have the following dataframe :
CONTACT_ID COMPANY_ID FIRST_NAME SURNAME COMPANY_NAME 1 8 Martin GERARD [email protected] BIG COMPANY 1 87 Martin GERARD [email protected] LITTLE COMPANY 30 12 Jean DUPOND [email protected] MEGA COMPANY
I'd like to use pandas to convert it like so :
CONTACT_ID FIRST_NAME SURNAME COMPANY_ID_1 COMPANY_NAME_1 COMPANY_ID_2 COMPANY_NAME_2 1 Martin GERARD [email protected] 8 BIGCOMPANY 87 LITTLECOMPANY 30 Jean DUPOND [email protected] 12 MEGA COMPANY
One contact can even have more than two companies.
CodePudding user response:
Create list of columns used for create groups, then converte them to index by DataFrame.set_index
and counter by GroupBy.cumcount
, reshape by DataFrame.unstack
with DataFrame.sort_index
, flatten MultiIndex
and last convert levels from list to columns:
cols = ['CONTACT_ID','FIRST_NAME','SURNAME','EMAIL']
df = (df.set_index([*cols, df.groupby(cols).cumcount().add(1)])
.unstack(fill_value='')
.sort_index(axis=1, level=1,sort_remaining=False))
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
CONTACT_ID FIRST_NAME SURNAME EMAIL COMPANY_ID_1 \
0 1 Martin GERARD [email protected] 8
1 30 Jean DUPOND [email protected] 12
COMPANY_NAME_1 COMPANY_ID_2 COMPANY_NAME_2
0 BIG COMPANY 87 LITTLE COMPANY
1 MEGA COMPANY