I am struggling with the following issue.
My DF is:
df = pd.DataFrame(
[
['7890-1', '12345N', 'John', 'Intermediate'],
['7890-4', '30909N', 'Greg', 'Intermediate'],
['3300-1', '88117N', 'Mark', 'Advanced'],
['2502-2', '90288N', 'Olivia', 'Elementary'],
['7890-2', '22345N', 'Joe', 'Intermediate'],
['7890-3', '72245N', 'Ana', 'Elementary']
],
columns=['Id', 'Code', 'Person', 'Level'])
print(df)
I would like to get such a result:
Id | Code 1 | Person 1 | Level 1 | Code 2 | Person 2 | Level 2 | Code 3 | Person 3 | Level 3 | Code 4 | Person 4 | Level 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7890 | 12345N | John | Intermediate | 22345N | Joe | Intermediate | 72245N | Ana | Elementary | 30909N | Greg | Intermediate |
1 | 3300 | 88117N | Mark | Advanced | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 2502 | NaN | NaN | NaN | 90288N | Olivia | Elementary | NaN | NaN | NaN | NaN | NaN | NaN |
CodePudding user response:
Try:
df[["Id", "Id2"]] = df["Id"].str.split("-", expand=True)
x = df.set_index(["Id", "Id2"]).unstack(level=1)
x.columns = [f"{a} {b}" for a, b in x.columns]
print(
x[sorted(x.columns, key=lambda k: int(k.split()[-1]))]
.reset_index()
.to_markdown()
)
Prints:
Id | Code 1 | Person 1 | Level 1 | Code 2 | Person 2 | Level 2 | Code 3 | Person 3 | Level 3 | Code 4 | Person 4 | Level 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2502 | nan | nan | nan | 90288N | Olivia | Elementary | nan | nan | nan | nan | nan | nan |
1 | 3300 | 88117N | Mark | Advanced | nan | nan | nan | nan | nan | nan | nan | nan | nan |
2 | 7890 | 12345N | John | Intermediate | 22345N | Joe | Intermediate | 72245N | Ana | Elementary | 30909N | Greg | Intermediate |
CodePudding user response:
I'd start with the same approach as @Andrej Kesely but then sort by index after unstacking and map
over the column names with ' '.join
.
df[["Id", "No"]] = df["Id"].str.split("-", expand=True)
df_wide = df.set_index(["Id", "No"]).unstack(level=1).sort_index(axis=1,level=1)
df_wide.columns = df_wide.columns.map(' '.join)
Output
Code 1 Level 1 Person 1 Code 2 Level 2 Person 2 Code 3 \
Id
2502 NaN NaN NaN 90288N Elementary Olivia NaN
3300 88117N Advanced Mark NaN NaN NaN NaN
7890 12345N Intermediate John 22345N Intermediate Joe 72245N
Level 3 Person 3 Code 4 Level 4 Person 4
Id
2502 NaN NaN NaN NaN NaN
3300 NaN NaN NaN NaN NaN
7890 Elementary Ana 30909N Intermediate Greg