customer| current_state | year | amount
ax111 | A | 3 | 300
ax112 | D | 4 | 4890
ax113 | G | 9 | 624
I have a dataframe, I need to extract the customers data into an array list, where the current state is the position where the amount needs to be placed.There are 7 states (A-G)
example output
([
[300,0,0,0,0,0,0],
[0,0,0,4890,0,0,0],
[0,0,0,0,0,0,624],
])
I also need to extract year in a 1d array
year=[3,4,9]
CodePudding user response:
Create MultiIndex
by DataFrame.set_index
with Series.unstack
for reshape and then add missing states by DataFrame.reindex
:
states = list('ABCDEFG')
df1 = (df.set_index('current_state', append=True)['amount']
.unstack(fill_value=0)
.reindex(states, axis=1, fill_value=0))
print (df1)
current_state A B C D E F G
0 300 0 0 0 0 0 0
1 0 0 0 4890 0 0 0
2 0 0 0 0 0 0 624
If logic is different and need pivoting per customers by DataFrame.pivot_table
with aggregate sum
if possible duplicated custome, current_state
rows:
df1 = (df.pivot_table(index='customer',
columns='current_state',
values='amount',
fill_value=0,
aggfunc='sum')
.reindex(states, axis=1, fill_value=0))
print (df1)
current_state A B C D E F G
customer
ax111 300 0 0 0 0 0 0
ax112 0 0 0 4890 0 0 0
ax113 0 0 0 0 0 0 624
Then convert to lists:
L = df1.to_numpy().tolist()
print (L)
[[300, 0, 0, 0, 0, 0, 0], [0, 0, 0, 4890, 0, 0, 0], [0, 0, 0, 0, 0, 0, 624]]
year = df['year'].tolist()
print (year)
[3, 4, 9]