Transform list of ndarrays into dataframe

Time:05-19

I have a list of ndarrays that I want to transform into a pd.DataFrame. The list looks like this:

from numpy import array
l = [array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
            0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]),
     array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]),
     array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
     ]

The length of each ndarray is a multiple of 12 (12 months); in this case it is 36. I want the final output to look like this:

Year Jan Feb March April May
1 0 0 0 0 0
2 0 1 1 0 0
3 0 0 1 0 0
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
1 0 0 1 0 0
2 1 1 0 0 1
3 0 0 0 0 0
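Each array is simply its years laid end to end, so reshaping one array into rows of 12 already gives one row per year. Below is a minimal sketch of that step for the third array of the question. It includes a guard for lengths that are not a multiple of 12. The helper name split_years is hypothetical.

```python
import numpy as np

def split_years(arr, months=12):
    """Hypothetical helper: cut one flat monthly array into rows of `months`.

    Row i of the result holds the months of year i + 1.
    """
    arr = np.asarray(arr)
    n_years, remainder = divmod(arr.size, months)
    if remainder:
        raise ValueError(f"length {arr.size} is not a multiple of {months}")
    return arr.reshape(n_years, months)

# Third array from the question
a = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
grid = split_years(a)
print(grid.shape)    # (3, 12)
print(grid[1, :5])   # year 2, Jan..May: [1 1 0 0 1]
```

Stacking all arrays first and reshaping with -1, as in the answer below, applies the same cut to every array at once.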

Answer:

reshaping

Assuming l is the input list, you can use:

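import numpy as np
import pandas as pd
from calendar import month_abbr

# stack the arrays, then cut each one into rows of 12 months (one row per year)
df = pd.DataFrame(np.vstack(l).reshape(-1, 12),
                  columns=month_abbr[1:])
# year number 1..n_years, repeated once for every array in l
df.insert(0, 'year', np.tile(range(1, len(l[0])//12 + 1), len(l)))
print(df)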

or:

# prepend the year column to the reshaped data and build the frame in one step
df = pd.DataFrame(np.hstack([np.tile(range(1, len(l[0])//12 + 1), len(l))[:, None],
                             np.vstack(l).reshape(-1, 12)]),
                  columns=['year'] + month_abbr[1:])

output:

   year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
0     1    0    0    0    0    0    0    0    0    0    0    0    1
1     2    0    1    1    0    0    1    0    0    0    0    0    0
2     3    0    0    1    0    0    0    0    0    0    1    0    0
3     1    0    0    0    0    0    0    0    0    0    0    0    0
4     2    0    0    0    0    0    0    0    0    0    0    0    0
5     3    0    0    0    0    0    0    0    0    0    1    1    0
6     1    0    0    1    0    0    0    0    0    0    0    0    0
7     2    1    1    0    0    1    0    0    0    1    1    0    0
8     3    0    0    0    0    0    0    0    0    0    0    0    1
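The year column comes from np.tile, which repeats the whole sequence 1..n_years once per array. In the output above, rows from different arrays can only be told apart by their position. The sketch below goes beyond the answer above by also adding a column named series, built with np.repeat (0, 0, 0, 1, 1, 1, ...). It then moves both labels into a MultiIndex so that one array's years can be selected directly. The series column and the from_ones helper, which only rebuilds the question's l, are hypothetical additions.

```python
import numpy as np
import pandas as pd
from calendar import month_abbr

def from_ones(idx, n=36):
    """Hypothetical helper: zero array of length n with 1s at positions idx."""
    a = np.zeros(n, dtype=int)
    a[idx] = 1
    return a

# The question's list l, rebuilt from the positions of its 1s
l = [from_ones([11, 13, 14, 17, 26, 33]),
     from_ones([33, 34]),
     from_ones([2, 12, 13, 16, 20, 21, 35])]

n_years = len(l[0]) // 12
df = pd.DataFrame(np.vstack(l).reshape(-1, 12), columns=month_abbr[1:])
# np.tile repeats the whole sequence: 1, 2, 3, 1, 2, 3, 1, 2, 3
df.insert(0, 'year', np.tile(np.arange(1, n_years + 1), len(l)))
# np.repeat repeats each element: 0, 0, 0, 1, 1, 1, 2, 2, 2 (hypothetical column)
df.insert(0, 'series', np.repeat(np.arange(len(l)), n_years))

# Optional: put both labels in a MultiIndex to select one array's years
indexed = df.set_index(['series', 'year'])
print(indexed.loc[2])  # the three years of the third array
```

The order of the labels matches the reshape. np.vstack keeps the arrays in list order, and reshape(-1, 12) emits every year of one array before moving on to the next.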

previous answer: aggregation

Assuming l is the input list and each array holds successive months spanning 3 years, you can vstack the arrays, reshape the result to (number of arrays, number of years, 12), aggregate along one axis (here using max), and convert the result to a DataFrame:

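import numpy as np
import pandas as pd
from calendar import month_abbr
# shape (n_arrays, n_years, 12); max over axis 0 collapses the arrays, one row per year
df = pd.DataFrame(np.vstack(l).reshape(len(l), -1, 12).max(axis=0),
                  columns=month_abbr[1:])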

output:

   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
0    0    0    1    0    0    0    0    0    0    0    0    1
1    1    1    1    0    1    1    0    0    1    1    0    0
2    0    0    1    0    0    0    0    0    0    1    1    1
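It can help to look at the intermediate array of shape (arrays, years, months) to see what max(axis=0) does. The sketch below prints that shape and then uses sum(axis=0) instead of max. This is a swapped-in alternative that counts how many arrays have a 1 in each month of each year. For 0/1 data, max equals those counts clipped at 1. As before, from_ones is a hypothetical helper that only rebuilds the question's l.

```python
import numpy as np
import pandas as pd
from calendar import month_abbr

def from_ones(idx, n=36):
    """Hypothetical helper: zero array of length n with 1s at positions idx."""
    a = np.zeros(n, dtype=int)
    a[idx] = 1
    return a

# The question's list l, rebuilt from the positions of its 1s
l = [from_ones([11, 13, 14, 17, 26, 33]),
     from_ones([33, 34]),
     from_ones([2, 12, 13, 16, 20, 21, 35])]

stacked = np.vstack(l).reshape(len(l), -1, 12)
print(stacked.shape)  # (3, 3, 12): (arrays, years, months)

# sum over axis 0: number of arrays with a 1 in each (year, month) cell
counts = pd.DataFrame(stacked.sum(axis=0),
                      index=pd.Index(range(1, stacked.shape[1] + 1), name='year'),
                      columns=month_abbr[1:])
print(counts)
```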

Since it is ambiguous how you want to aggregate, you can also take the max along the other axis. This aggregates over the years of each array and gives one row per array:

# max over axis 1 collapses the years, leaving one row per array
pd.DataFrame(np.vstack(l).reshape(len(l), -1, 12).max(axis=1),
             columns=month_abbr[1:])

output:

   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
0    0    1    1    0    0    1    0    0    0    1    0    1
1    0    0    0    0    0    0    0    0    0    1    1    0
2    1    1    1    0    1    0    0    0    1    1    0    1
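The row index in both aggregated outputs is a plain 0, 1, 2. With axis=0 it stands for the year, and with axis=1 it stands for the array. Below is a sketch of a hypothetical helper, aggregate, that covers both axes and names the index accordingly. The how parameter is an assumption of this sketch. It simply picks a NumPy reduction method by name, such as "max" or "sum". from_ones again only rebuilds the question's l.

```python
import numpy as np
import pandas as pd
from calendar import month_abbr

def from_ones(idx, n=36):
    """Hypothetical helper: zero array of length n with 1s at positions idx."""
    a = np.zeros(n, dtype=int)
    a[idx] = 1
    return a

# The question's list l, rebuilt from the positions of its 1s
l = [from_ones([11, 13, 14, 17, 26, 33]),
     from_ones([33, 34]),
     from_ones([2, 12, 13, 16, 20, 21, 35])]

def aggregate(l, axis, how='max'):
    """Hypothetical helper: axis=0 -> one row per year, axis=1 -> one row per array."""
    stacked = np.vstack(l).reshape(len(l), -1, 12)
    values = getattr(stacked, how)(axis=axis)  # e.g. stacked.max(axis=axis)
    if axis == 0:
        index = pd.Index(range(1, values.shape[0] + 1), name='year')  # years from 1
    else:
        index = pd.Index(range(values.shape[0]), name='series')  # position in l
    return pd.DataFrame(values, index=index, columns=month_abbr[1:])

print(aggregate(l, axis=0))
print(aggregate(l, axis=1))
```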