I have a list of ndarrays that I want to transform into a pandas DataFrame. The list looks like this:
from numpy import array
l = [array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]),
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]),
array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
]
The length of each ndarray is a multiple of 12 (12 months); in this case it is 36. I want the final output to look like this (only the first five months are shown):
Year | Jan | Feb | March | April | May |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 1 | 0 | 0 |
3 | 0 | 0 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 1 | 0 | 0 |
2 | 1 | 1 | 0 | 0 | 1 |
3 | 0 | 0 | 0 | 0 | 0 |
Answer:
reshaping
Assuming l is the input list, you can use:
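from calendar import month_abbr
import numpy as np
import pandas as pd

# stack the arrays, then split each one into rows of 12 months
df = pd.DataFrame(np.vstack(l).reshape(-1, 12),
                  columns=month_abbr[1:])
# year counter 1..n_years, repeated once per input array
df.insert(0, 'year', np.tile(range(1, len(l[0])//12 + 1), len(l)))
print(df)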
or:
df = pd.DataFrame(np.hstack([np.tile(range(1, len(l[0])//12 + 1), len(l))[:, None],
                             np.vstack(l).reshape(-1, 12)]),
                  columns=['year'] + month_abbr[1:])
output:
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 1 0 0 0 0 0 0 0 0 0 0 0 1
1 2 0 1 1 0 0 1 0 0 0 0 0 0
2 3 0 0 1 0 0 0 0 0 0 1 0 0
3 1 0 0 0 0 0 0 0 0 0 0 0 0
4 2 0 0 0 0 0 0 0 0 0 0 0 0
5 3 0 0 0 0 0 0 0 0 0 1 1 0
6 1 0 0 1 0 0 0 0 0 0 0 0 0
7 2 1 1 0 0 1 0 0 0 1 1 0 0
8 3 0 0 0 0 0 0 0 0 0 0 0 1
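If you need this in more than one place, the reshaping approach can be wrapped in a small function. This is only a sketch: the name to_year_month_frame, the length check, and the demo data are assumptions for illustration, not part of the question.

```python
from calendar import month_abbr

import numpy as np
import pandas as pd


def to_year_month_frame(arrays):
    """Hypothetical helper: one row per (array, year), one column per month."""
    stacked = np.vstack(arrays)              # shape: (n_arrays, n_months)
    n_arrays, n_months = stacked.shape
    if n_months % 12:
        raise ValueError("array length must be a multiple of 12")
    n_years = n_months // 12
    df = pd.DataFrame(stacked.reshape(-1, 12), columns=month_abbr[1:])
    # year counter 1..n_years, restarted for each input array
    df.insert(0, "year", np.tile(np.arange(1, n_years + 1), n_arrays))
    return df


# made-up input: 2 arrays covering 2 years each
demo = [np.arange(24), np.arange(24) * 10]
result = to_year_month_frame(demo)
print(result)
```

Calling to_year_month_frame(l) on the list from the question should give the same frame as the snippets above.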
previous answer: aggregation
Assuming l is the input list and that each array holds successive months covering 3 years, you can vstack, reshape, and aggregate (here using max) before converting to a DataFrame:
from calendar import month_abbr
df = pd.DataFrame(np.vstack(l).reshape(len(l), -1, 12).max(axis=0),
columns=month_abbr[1:])
output:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 0 0 1 0 0 0 0 0 0 0 0 1
1 1 1 1 0 1 1 0 0 1 1 0 0
2 0 0 1 0 0 0 0 0 0 1 1 1
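Since it is ambiguous how you want to aggregate, you can also use a different axis. With axis=0 above, the maximum is taken across the arrays, so each row is a year; with axis=1 below, it is taken across the years, so each row is one input array: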
pd.DataFrame(np.vstack(l).reshape(len(l), -1, 12).max(axis=1),
columns=month_abbr[1:])
output:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 0 1 1 0 0 1 0 0 0 1 0 1
1 0 0 0 0 0 0 0 0 0 1 1 0
2 1 1 1 0 1 0 0 0 1 1 0 1
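To see what each axis collapses, it may help to try the same reshape on input where the number of arrays and the number of years differ. The data below is made up for illustration (3 arrays, 2 years each); the variable names are assumptions.

```python
import numpy as np

# made-up data: 3 arrays, each holding 2 years of 12 months
arrays = [np.zeros(24, dtype=int) for _ in range(3)]
arrays[0][0] = 1     # array 0, year 1, Jan
arrays[2][13] = 1    # array 2, year 2, Feb

# axes of the reshaped block: (array, year, month)
cube = np.vstack(arrays).reshape(len(arrays), -1, 12)

per_year = cube.max(axis=0)    # collapses the arrays: one row per year
per_array = cube.max(axis=1)   # collapses the years: one row per array

print(per_year.shape)    # (2, 12)
print(per_array.shape)   # (3, 12)
```

In the question's data both results happen to be 3 x 12, because there are 3 arrays and 3 years, which is why the two outputs above look interchangeable at first glance.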