I have a dataframe like follows:
A B C
a1 b1 c1
a1 b2 c2
a2 b1 c3
a2 b2 c4
I want to convert it to something like:
A B1 B2
a1 c1 c2
a1 c1 c2
a2 c3 c4
a2 c3 c4
The values in B which are repeated are column names whose values are in C column. I want them to be columns of the dataframe
In fact the dataset is created by flattening a tree, there are more columns each of which is an internal node. the first column is root, C are leaves
These are some tries by me:
cmd = "b1,b2"
cond = ""
for c in cmd.split(","):
cond = "|" f" (df['B'] == '{c}') "
cols = [c] cols
new_col = main_df[main_df['B'] == c]['C']
df[c] = new_col
df = df[eval(cond)]
However, it doesn't work.
CodePudding user response:
Try:
df.pivot('A', columns='B')
You will get only two rows, but your four rows are 2 rows duplicated anyhow.
A C
B b1 b2
0 a1 c1 c2
1 a2 c3 c4
CodePudding user response:
df.pivot can be a solution for such cases. However, I wanted to add my desired columns and so I have this solution:
df = df[df["B"] == "b1"]
df = df.reset_index(drop=True)
for c in desired_cols.split(","):
new_col = main_df[main_df['B'] == c]['C']
new_col = new_col.reset_index(drop=True)
df[c] = new_col