Home > Software engineering >  Extract multiple arrays from dataframe python
Extract multiple arrays from dataframe python

Time:03-15

customer| current_state | year  | amount
ax111   |   A           |   3   |  300
ax112   |   D           |   4   |  4890
ax113   |   G           |   9   |  624

I have a dataframe, I need to extract the customers data into an array list, where the current state is the position where the amount needs to be placed.There are 7 states (A-G)

example output

([
    [300,0,0,0,0,0,0],
    [0,0,0,4890,0,0,0],
    [0,0,0,0,0,0,624],
])

I also need to extract year in a 1d array

year=[3,4,9]

CodePudding user response:

Create MultiIndex by DataFrame.set_index with Series.unstack for reshape and then add missing states by DataFrame.reindex:

states = list('ABCDEFG')
df1 = (df.set_index('current_state', append=True)['amount']
         .unstack(fill_value=0)
         .reindex(states, axis=1, fill_value=0))
print (df1)

current_state    A  B  C     D  E  F    G
0              300  0  0     0  0  0    0
1                0  0  0  4890  0  0    0
2                0  0  0     0  0  0  624

If logic is different and need pivoting per customers by DataFrame.pivot_table with aggregate sum if possible duplicated custome, current_state rows:

df1 = (df.pivot_table(index='customer',
                      columns='current_state', 
                      values='amount',
                      fill_value=0, 
                      aggfunc='sum')
         .reindex(states, axis=1, fill_value=0))
print (df1)
current_state    A  B  C     D  E  F    G
customer                                 
ax111          300  0  0     0  0  0    0
ax112            0  0  0  4890  0  0    0
ax113            0  0  0     0  0  0  624

Then convert to lists:

L = df1.to_numpy().tolist()
print (L)
[[300, 0, 0, 0, 0, 0, 0], [0, 0, 0, 4890, 0, 0, 0], [0, 0, 0, 0, 0, 0, 624]]

year = df['year'].tolist()
print (year)
[3, 4, 9]
  • Related