Preserving order with pandas.crosstab

I have the following csv data:

question,answer
m2020_s,3
m2020_s,3
m2020_s,3
m2020_s,3
m2020_s,3
m2020_s,3
a2020_k,1
a2020_k,2
a2020_k,1
a2020_k,4
a2020_k,1
a2020_k,1
d2015_a,5
d2015_a,4
d2015_a,4
d2015_a,4
d2015_a,4
d2015_a,4

I'm using pd.crosstab to count the number of times each answer was given, but the function changes the order of my rows. Here is my code:

import pandas as pd

df = pd.read_csv('example.csv')

output_array = pd.crosstab(df['question'], df['answer']).to_numpy()

print(output_array)

Expected result:

[[0 0 6 0 0]
 [4 1 0 1 0]
 [0 0 0 5 1]]

Actual result:

[[4 1 0 1 0]
 [0 0 0 5 1]
 [0 0 6 0 0]]

Why is this happening? And how can I preserve the data's order?

CodePudding user response：

Could you try this:

pd.crosstab(df['question'], df['answer']).reindex(df['question'].unique()).to_numpy()

Output:

array([[0, 0, 6, 0, 0],
       [4, 1, 0, 1, 0],
       [0, 0, 0, 5, 1]], dtype=int64)

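Explanation: pd.crosstab sorts the row labels, so the questions come out in alphabetical order (a2020_k, d2015_a, m2020_s) rather than in the order they appear in the file. df['question'].unique() returns the questions in order of first occurrence, and reindexing the crosstab result with it restores that order.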
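
For anyone who wants to check this without the CSV file, here is a minimal sketch that rebuilds the same rows in memory and compares the default row order with the reindexed one. The inline `rows` list and the variable names `counts`, `sorted_labels`, `first_seen` and `ordered` are only for illustration and are not part of the original code.

```python
import pandas as pd

# Same rows as the CSV in the question, built in memory so no example.csv is needed.
rows = (
    [("m2020_s", 3)] * 6
    + [("a2020_k", a) for a in (1, 2, 1, 4, 1, 1)]
    + [("d2015_a", a) for a in (5, 4, 4, 4, 4, 4)]
)
df = pd.DataFrame(rows, columns=["question", "answer"])

# Default behaviour: the row labels of the result are sorted.
counts = pd.crosstab(df["question"], df["answer"])
sorted_labels = list(counts.index)
print(sorted_labels)

# unique() keeps the order of first appearance; reindex applies that order to the rows.
first_seen = df["question"].unique()
ordered = counts.reindex(first_seen)
output_array = ordered.to_numpy()
print(output_array)
```

Only the rows are reordered here. The answer columns stay in sorted order (1 to 5), which is what the expected result in the question assumes.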