How can I find and sort in a new dataframe the following
Input:
IdA IdB Value
0 123 ABC_25_01 1
1 456 DEF_25_01 2
2 123 ABC_60_01 3
3 456 DEF_60_01 4
Output
IdA IdB Value25 Value60
0 123 ABC_01 1 3
1 456 DEF_01 2 4
Thank you!
CodePudding user response:
Assuming the values in IdB
are all in the form X_Y_Z
where X_Z
is the final IdB
and Y
is the Column header:
- Use
str.extract
to get the "Category" i.e. the number between the two underscores - User
str.replace
to remove the "Category" from IdB. pivot
to get the required output
df["Category"] = df["IdB"].str.extract("_(.*)_")
df["IdB"] = df["IdB"].str.replace("_(.*)_", "_", regex=True)
output = (df.pivot(["IdA", "IdB"], "Category", "Value")
.add_prefix("Value")
.reset_index())
>>> output
Category IdA IdB Value25 Value60
0 123 ABC_01 1 3
1 456 DEF_01 2 4