Home > other >  Pandas function to create multiple columns based on a single column
Pandas function to create multiple columns based on a single column

Time:06-28

I have a dataset which looks like this

A B C
A1 1 1.3
A1 2 1.4
A1 3 1.3
A2 1 1.7
A2 2 1.6
A2 3 1.5
A1 1 1
A1 2 2.7
A1 3 2.4
A2 1 1.6
A2 2 1.7
A2 3 1.2

Desired output :

A 1 2 3
A1 1.3 1.4 1.3
A1 1 2.7 2.4
A2 1.7 1.6 1.5
A2 1.6 1.7 1.2

I have tried a pivot table but it's giving aggregation and without aggregation pivot it is showing a duplicate index error because of the duplicate index. What is the best way to handle this?

CodePudding user response:

You can iterate over using number of elements (assuming all combos have same number of elements) as follows:

dfx = df.pivot_table(index='A', columns='B', values='C', aggfunc=list)
pd.concat([dfx.applymap(lambda x:x[idx]) for idx in range(dfx.apply(lambda x: len(x)).max())])
A 1 2 3
A1 1.3 1.4 1.3
A2 1.7 1.6 1.5
A1 1 2.7 2.4
A2 1.6 1.7 1.2

Old answer:

Seeing your sample dataframe, a simple pivot should do:

df.pivot_table(index='A', columns='B', values='C', aggfunc='mean')

CodePudding user response:

Try:

df = df.pivot_table(index="A", columns="B", values="C", aggfunc=list)
df = df.explode(df.columns.to_list()).reset_index()
df.columns.name = None
print(df)

Prints:

    A    1    2    3
0  A1  1.3  1.4  1.3
1  A1  1.0  2.7  2.4
2  A2  1.7  1.6  1.5
3  A2  1.6  1.7  1.2

CodePudding user response:

One option is to create an extra column that introduces uniqueness to your data, before pivoting:

a = df.groupby('A').ngroup().diff().fillna(0, downcast = 'infer')
a = a.mask(a < 0, 1).cumsum()
(df
.assign(temp = a)
.pivot(['temp', 'A'], 'B', 'C')
.droplevel(0)
.rename_axis(columns = None)
)

      1    2    3
A
A1  1.3  1.4  1.3
A2  1.7  1.6  1.5
A1  1.0  2.7  2.4
A2  1.6  1.7  1.2
  • Related