Let's take a sample dataframe :
df = pd.DataFrame({"Name": ["Alan","Alan","Kate","Kate","Brian"],
"Shop" :["A","B","C","A","B"],
"Amount":[4,2,1,3,5]})
Name Shop Amount
0 Alan A 4
1 Alan B 2
2 Kate C 1
3 Kate A 3
4 Brian B 5
First expected output :
I would like to create a new dataframe from df having :
- as columns all the possible values in the column
Shop
and the columnName
- as index all the possible values in the column
Shop
, repeated for each value in column `Name' - as values the value in the column
Àmount
matching with the columnsName
andShop
Expected output :
A B C Name
A 4 2 0 Alan
B 4 2 0 Alan
C 4 2 0 Alan
A 3 0 1 Kate
B 3 0 1 Kate
C 3 0 1 Kate
A 0 5 0 Brian
B 0 5 0 Brian
C 0 5 0 Brian
Second expected output :
It's almost the same as the first expected output. The only difference is that the value is the one that match with the index (and not column Name
) and the column Shop
.
Expected output :
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian
Thanks to this post, I tried several scripts using pivot_table and pivot but I didn't reach my expected outputs. Would you know please how to do ?
CodePudding user response:
First use DataFrame.pivot
with repalce missing values to 0
and for same order like original values use DataFrame.reindex
:
v = df['Shop'].unique()
df1 = (df.pivot('Name','Shop','Amount')
.fillna(0)
.astype('int')
.reindex(df['Name'].unique()))
For repeat rows use Index.repeat
with DataFrame.loc
and set repeated indices by numpy.tile
:
df = (df1.assign(Name=df1.index)
.loc[df1.index.repeat(len(v))]
.set_index(np.tile(v, len(v)))
.rename_axis(None, axis=1))
print (df)
A B C Name
A 4 2 0 Alan
B 4 2 0 Alan
C 4 2 0 Alan
A 3 0 1 Kate
B 3 0 1 Kate
C 3 0 1 Kate
A 0 5 0 Brian
B 0 5 0 Brian
C 0 5 0 Brian
For second output reshape by DataFrame.stack
with numpy.broadcast_to
:
s = df1.stack()
df2 = (pd.DataFrame(np.broadcast_to(s.to_numpy()[:, None], (len(s), len(v))),
columns=v, index= np.tile(v, len(v)))
.assign(Name=s.index.get_level_values(0)))
print (df2)
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian
Alternative solution:
arr = np.ravel(df1)
df2 = (pd.DataFrame(np.broadcast_to(arr[:, None], (len(arr), len(v))),
columns=v, index= np.tile(v, len(v)))
.assign(Name=np.repeat(df1.index, len(v)))
)
print (df2)
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian