Home > Back-end >  Pandas special pivot dataframe
Pandas special pivot dataframe

Time:12-21

Let's take a sample dataframe :

df = pd.DataFrame({"Name": ["Alan","Alan","Kate","Kate","Brian"],
                   "Shop" :["A","B","C","A","B"],
                   "Amount":[4,2,1,3,5]})
    Name Shop  Amount
0   Alan    A       4
1   Alan    B       2
2   Kate    C       1
3   Kate    A       3
4  Brian    B       5

First expected output :

I would like to create a new dataframe from df having :

  • as columns all the possible values in the column Shop and the column Name
  • as index all the possible values in the column Shop, repeated for each value in column `Name'
  • as values the value in the column Àmount matching with the columns Name and Shop

Expected output :

   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

Second expected output :

It's almost the same as the first expected output. The only difference is that the value is the one that match with the index (and not column Name) and the column Shop.

Expected output :

   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Thanks to this post, I tried several scripts using pivot_table and pivot but I didn't reach my expected outputs. Would you know please how to do ?

CodePudding user response:

First use DataFrame.pivot with repalce missing values to 0 and for same order like original values use DataFrame.reindex:

v = df['Shop'].unique()
df1 = (df.pivot('Name','Shop','Amount')
         .fillna(0)
         .astype('int')
         .reindex(df['Name'].unique()))

For repeat rows use Index.repeat with DataFrame.loc and set repeated indices by numpy.tile:

df = (df1.assign(Name=df1.index)
         .loc[df1.index.repeat(len(v))]
         .set_index(np.tile(v, len(v)))
         .rename_axis(None, axis=1))
    
print (df)
   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

For second output reshape by DataFrame.stack with numpy.broadcast_to:

s = df1.stack()
df2 = (pd.DataFrame(np.broadcast_to(s.to_numpy()[:, None], (len(s), len(v))),
                   columns=v, index= np.tile(v, len(v)))
         .assign(Name=s.index.get_level_values(0)))

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Alternative solution:

arr = np.ravel(df1)
df2 = (pd.DataFrame(np.broadcast_to(arr[:, None], (len(arr), len(v))),
                   columns=v, index= np.tile(v, len(v)))
          .assign(Name=np.repeat(df1.index, len(v)))
         )

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian
  • Related