I have a problem in python. My table in xlsx. looks like this:
COMPANY | ID NUMBER | FUNCTION | NAME |
---|---|---|---|
A | 123 | director | Smith |
A | 123 | partner | Jones |
A | 123 | secretary | Evans |
B | 456 | partner | Brown |
B | 456 | manager | Wilson |
And I need convert rows which have same ID NUMBER to columns. I need all rows with same id to appear on only one row. Like this:
COMPANY | ID NUMBER | FUNCTION | FUNCTION 2 | FUNCTION 3 | NAME | NAME 2 | NAME 3 |
---|---|---|---|---|---|---|---|
A | 123 | director | partner | secretary | Smith | Jones | Evans |
B | 456 | partner | manager | Brown | Wilson |
CodePudding user response:
First greate helper column g
by GroupBy.cumcount
, pivoting by DataFrame.pivot
and then in list comprehension flatten MultiIndex, last convert Index to columns:
df['g'] = df.groupby(['COMPANY','ID NUMBER']).cumcount()
df = df.pivot(['COMPANY','ID NUMBER'], 'g').fillna('')
df.columns = [f'{a} {b 1}' if b > 0 else a for a, b in df.columns]
df = df.reset_index().rename_axis(None, axis=1)
print (df)
COMPANY ID NUMBER FUNCTION FUNCTION 2 FUNCTION 3 NAME NAME 2 NAME 3
0 A 123 director partner secretary Smith Jones Evans
1 B 456 partner manager Brown Wilson