I am using pandas to create three arrays that I need for some statistics: the month, and the number of finishes and starts in that month.
My DataFrame is the following:
       month  finish  started
0  MONTH.Mar       1        0
1  MONTH.Mar       1        0
2  MONTH.Mar       1        0
3  MONTH.Mar       1        0
4  MONTH.Mar       1        0
5  MONTH.Mar       0        1
6  MONTH.Apr       1        0
7  MONTH.Mar       0        1
8  MONTH.Mar       0        1
9  MONTH.Feb       0        1
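To make the example reproducible, the DataFrame above can be rebuilt and grouped as follows. This is a minimal sketch that assumes the month labels are plain strings such as "MONTH.Mar"; the values are copied from the table.

```python
import pandas as pd

# Rebuild the sample data from the question (month labels assumed to be strings)
df = pd.DataFrame({
    "month": ["MONTH.Mar"] * 6 + ["MONTH.Apr", "MONTH.Mar", "MONTH.Mar", "MONTH.Feb"],
    "finish": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
    "started": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# Sum finishes and starts per month; "month" becomes the index of the result
frame = df.groupby("month").sum()
print(frame)
```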
I do a groupby:
df.groupby('month').sum()
and the output is the following:
           finish  started
month
MONTH.Apr       1        0
MONTH.Feb       0        1
MONTH.Mar       5        3
How can I convert the data into three different lists like this:
['MONTH.Apr','MONTH.Feb','MONTH.Mar']
[1,0,5]
[0,1,3]
I tried frame.values.tolist(), but the output was the following:
[[1, 0], [0, 1], [5, 3]]
and the months were missing, because after the groupby they are in the index rather than in a column.
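The months disappear because groupby('month') turns the month column into the index of the result, and .values only covers the remaining columns. A sketch that reads the index and each column separately (sample data rebuilt, month labels assumed to be plain strings):

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["MONTH.Mar"] * 6 + ["MONTH.Apr", "MONTH.Mar", "MONTH.Mar", "MONTH.Feb"],
    "finish": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
    "started": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})
frame = df.groupby("month").sum()

# The group keys live in the index, so read them from there
months = frame.index.tolist()
# The aggregated columns can be read one at a time
finish = frame["finish"].tolist()
started = frame["started"].tolist()
print(months, finish, started)
```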
Thank you very much
Answer:
IIUC, try reset_index() and transposing with .T:
>>> df.groupby('month').sum().reset_index().T.to_numpy()
array([['MONTH.Apr', 'MONTH.Feb', 'MONTH.Mar'],
[1, 0, 5],
[0, 1, 3]], dtype=object)
Or:
>>> df.groupby('month').sum().reset_index().T.values.tolist()
[['MONTH.Apr', 'MONTH.Feb', 'MONTH.Mar'], [1, 0, 5], [0, 1, 3]]
Answer:
You can use:
month, finish, started = df.groupby('month', as_index=False) \
.sum().to_dict('list').values()
Output:
>>> month
['MONTH.Apr', 'MONTH.Feb', 'MONTH.Mar']
>>> finish
[1, 0, 5]
>>> started
[0, 1, 3]
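The unpacking above relies on to_dict('list') returning the columns in the frame's order (month, finish, started). A variant that selects the columns by name instead, so the result does not depend on column order, might look like this (a sketch; sample data rebuilt, month labels assumed to be plain strings):

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["MONTH.Mar"] * 6 + ["MONTH.Apr", "MONTH.Mar", "MONTH.Mar", "MONTH.Feb"],
    "finish": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
    "started": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# as_index=False keeps "month" as a regular column in the result
out = df.groupby("month", as_index=False).sum()

# Pull each column out by name rather than relying on dict ordering
month, finish, started = (out[col].tolist() for col in ["month", "finish", "started"])
print(month, finish, started)
```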