I've been trying to convert a list of numpy rec.arrays into a single dataframe. The current list looks like:
[rec.array([([0.2], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
([0.2], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
([0.2], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
([0.2], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
([0.2], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574])],
dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
rec.array([([0.1], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
([0.1], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
([0.1], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
([0.1], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
([0.1], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
([0.1], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502]),
([0.1], [ 2.26975462, -1.45436567, 0.04575852, -0.18718385]),
([0.1], [ 1.53277921, 1.46935877, 0.15494743, 0.37816252]),
([0.1], [-0.88778575, -1.98079647, -0.34791215, 0.15634897]),
([0.1], [ 1.23029068, 1.20237985, -0.38732682, -0.30230275])],
dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
rec.array([([0.16666667], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
([0.16666667], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
([0.16666667], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
([0.16666667], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
([0.16666667], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
([0.16666667], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502])],
dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
rec.array([([0.05882353], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
([0.05882353], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
([0.05882353], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
([0.05882353], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
([0.05882353], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
([0.05882353], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502]),
([0.05882353], [ 2.26975462, -1.45436567, 0.04575852, -0.18718385]),
([0.05882353], [ 1.53277921, 1.46935877, 0.15494743, 0.37816252]),
([0.05882353], [-0.88778575, -1.98079647, -0.34791215, 0.15634897]),
([0.05882353], [ 1.23029068, 1.20237985, -0.38732682, -0.30230275]),
([0.05882353], [-1.04855297, -1.42001794, -1.70627019, 1.9507754 ]),
([0.05882353], [-0.50965218, -0.4380743 , -1.25279536, 0.77749036]),
([0.05882353], [-1.61389785, -0.21274028, -0.89546656, 0.3869025 ]),
([0.05882353], [-0.51080514, -1.18063218, -0.02818223, 0.42833187]),
([0.05882353], [ 0.06651722, 0.3024719 , -0.63432209, -0.36274117]),
([0.05882353], [-0.67246045, -0.35955316, -0.81314628, -1.7262826 ]),
([0.05882353], [ 0.17742614, -0.40178094, -1.63019835, 0.46278226])],
dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))])]
The result should be a five-column dataframe like the following:
Weights | v_1 | v_2 | v_3 | v_4 |
---|---|---|---|---|
0.2 | 1.76405235 | 0.40015721 | 0.97873798 | 2.2408932 |
0.2 | 1.86755799 | -0.97727788 | 0.95008842 | -0.15135721 |
.... | .... | ... | ... | ... |
0.05882353 | 0.17742614 | -0.40178094 | -1.63019835 | 0.46278226 |
and so on.
However, when I call pd.DataFrame(my_list), the resulting dataframe has about 90 columns instead of the 5 shown above. Each column represents a sublist of the array of the form [a], [w, x, y, z]. The resulting dataframe should have 5 columns and 38 rows (5 + 10 + 6 + 17 for the above example).
CodePudding user response:
I assume your list of recarrays is stored in a variable called data. You can convert each recarray to a dataframe with pd.DataFrame and combine the results with pd.concat. Then pd.DataFrame(df[1].tolist()) splits the column of 4-element lists into four columns, pandas.Series.explode unwraps the single-element weights lists into scalars, and pandas.DataFrame.pop drops the original list columns.
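To try the snippets in this answer without the original data, something like the following can stand in for data. This is only a sketch: make_recarray is a hypothetical helper, the seed is arbitrary, and the integration values are random, so they will not match the numbers printed in the question. Only the dtype, the weights and the row counts are taken from it.

```python
import numpy as np

# Field layout copied from the dtype printed in the question
dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]

def make_recarray(weight, n_rows, rng):
    """Hypothetical helper: n_rows records sharing one weight, random integration values."""
    arr = np.zeros(n_rows, dtype=dtype)
    arr['weights'] = weight  # scalar broadcasts over the (n_rows, 1) field
    arr['integration'] = rng.standard_normal((n_rows, 4))
    return arr.view(np.recarray)

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
# Weights and row counts of the four arrays shown in the question
data = [make_recarray(w, n, rng)
        for w, n in [(0.2, 5), (0.1, 10), (1 / 6, 6), (1 / 17, 17)]]

print([len(rec) for rec in data])  # [5, 10, 6, 17]
```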
Reading Data
import pandas as pd

# Build one frame per recarray, then concatenate once with a fresh 0..n-1 index
frames = [pd.DataFrame(record.tolist()) for record in data]
df = pd.concat(frames, ignore_index=True)
Pre-processing and Unraveling data
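# Split the 4-element lists in column 1 into four named columns
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist(), index=df.index)
# Unwrap the single-element lists in column 0; explode returns object dtype, so cast to float
df['weights'] = df.pop(0).explode().astype(float)
# Drop the original column of lists
df.pop(1)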
Output:
This gives the expected output:
v_1 v_2 v_3 v_4 weights
0 1.764052 0.400157 0.978738 2.240893 0.2
1 1.867558 -0.977278 0.950088 -0.151357 0.2
2 -0.103219 0.410598 0.144044 1.454274 0.2
3 0.761038 0.121675 0.443863 0.333674 0.2
4 1.494079 -0.205158 0.313068 -0.854096 0.2
5 1.764052 0.400157 0.978738 2.240893 0.1
6 1.867558 -0.977278 0.950088 -0.151357 0.1
7 -0.103219 0.410598 0.144044 1.454274 0.1
8 0.761038 0.121675 0.443863 0.333674 0.1
9 1.494079 -0.205158 0.313068 -0.854096 0.1
10 -2.552990 0.653619 0.864436 -0.742165 0.1
11 2.269755 -1.454366 0.045759 -0.187184 0.1
12 1.532779 1.469359 0.154947 0.378163 0.1
13 -0.887786 -1.980796 -0.347912 0.156349 0.1
14 1.230291 1.202380 -0.387327 -0.302303 0.1
15 1.764052 0.400157 0.978738 2.240893 0.166667
16 1.867558 -0.977278 0.950088 -0.151357 0.166667
17 -0.103219 0.410598 0.144044 1.454274 0.166667
18 0.761038 0.121675 0.443863 0.333674 0.166667
19 1.494079 -0.205158 0.313068 -0.854096 0.166667
20 -2.552990 0.653619 0.864436 -0.742165 0.166667
21 1.764052 0.400157 0.978738 2.240893 0.058824
22 1.867558 -0.977278 0.950088 -0.151357 0.058824
23 -0.103219 0.410598 0.144044 1.454274 0.058824
24 0.761038 0.121675 0.443863 0.333674 0.058824
25 1.494079 -0.205158 0.313068 -0.854096 0.058824
26 -2.552990 0.653619 0.864436 -0.742165 0.058824
27 2.269755 -1.454366 0.045759 -0.187184 0.058824
28 1.532779 1.469359 0.154947 0.378163 0.058824
29 -0.887786 -1.980796 -0.347912 0.156349 0.058824
30 1.230291 1.202380 -0.387327 -0.302303 0.058824
31 -1.048553 -1.420018 -1.706270 1.950775 0.058824
32 -0.509652 -0.438074 -1.252795 0.777490 0.058824
33 -1.613898 -0.212740 -0.895467 0.386902 0.058824
34 -0.510805 -1.180632 -0.028182 0.428332 0.058824
35 0.066517 0.302472 -0.634322 -0.362741 0.058824
36 -0.672460 -0.359553 -0.813146 -1.726283 0.058824
37 0.177426 -0.401781 -1.630198 0.462782 0.058824
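To see what the two pre-processing steps do in isolation, here is a toy sketch. The names raw_weights and raw_values and their numbers are made up for illustration; they only mimic the shape of the two raw columns.

```python
import pandas as pd

# Toy stand-ins for the two raw columns: one-element lists and four-element lists
raw_weights = pd.Series([[0.2], [0.1]])
raw_values = pd.Series([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])

# A list of equal-length lists becomes one column per list position
values = pd.DataFrame(raw_values.tolist(), index=raw_values.index)
print(values.shape)  # (2, 4)

# explode emits one row per list element; with one element per list
# the row count is unchanged and each list is replaced by its scalar
weights = raw_weights.explode().astype(float)
print(weights.tolist())  # [0.2, 0.1]
```

If a weights list ever held more than one element, explode would add rows and the result would no longer line up with the other columns, so this trick relies on the (1,) shape of the weights field.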
Alternatively
The same thing can be done using np.hstack as well, where data is the list of your recarrays.
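import numpy as np
import pandas as pd

# hstack joins the 1-D recarrays end to end into a single array of records
df = pd.DataFrame(np.hstack(data).tolist())
df['weights'] = df[0].explode().astype(float)
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist())
df = df.drop(columns=[0, 1])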
Output:
This gives the same values, this time with weights as the first column:
weights v_1 v_2 v_3 v_4
0 0.2 1.764052 0.400157 0.978738 2.240893
1 0.2 1.867558 -0.977278 0.950088 -0.151357
2 0.2 -0.103219 0.410598 0.144044 1.454274
3 0.2 0.761038 0.121675 0.443863 0.333674
4 0.2 1.494079 -0.205158 0.313068 -0.854096
5 0.1 1.764052 0.400157 0.978738 2.240893
6 0.1 1.867558 -0.977278 0.950088 -0.151357
7 0.1 -0.103219 0.410598 0.144044 1.454274
8 0.1 0.761038 0.121675 0.443863 0.333674
9 0.1 1.494079 -0.205158 0.313068 -0.854096
10 0.1 -2.552990 0.653619 0.864436 -0.742165
11 0.1 2.269755 -1.454366 0.045759 -0.187184
12 0.1 1.532779 1.469359 0.154947 0.378163
13 0.1 -0.887786 -1.980796 -0.347912 0.156349
14 0.1 1.230291 1.202380 -0.387327 -0.302303
15 0.166667 1.764052 0.400157 0.978738 2.240893
16 0.166667 1.867558 -0.977278 0.950088 -0.151357
17 0.166667 -0.103219 0.410598 0.144044 1.454274
18 0.166667 0.761038 0.121675 0.443863 0.333674
19 0.166667 1.494079 -0.205158 0.313068 -0.854096
20 0.166667 -2.552990 0.653619 0.864436 -0.742165
21 0.058824 1.764052 0.400157 0.978738 2.240893
22 0.058824 1.867558 -0.977278 0.950088 -0.151357
23 0.058824 -0.103219 0.410598 0.144044 1.454274
24 0.058824 0.761038 0.121675 0.443863 0.333674
25 0.058824 1.494079 -0.205158 0.313068 -0.854096
26 0.058824 -2.552990 0.653619 0.864436 -0.742165
27 0.058824 2.269755 -1.454366 0.045759 -0.187184
28 0.058824 1.532779 1.469359 0.154947 0.378163
29 0.058824 -0.887786 -1.980796 -0.347912 0.156349
30 0.058824 1.230291 1.202380 -0.387327 -0.302303
31 0.058824 -1.048553 -1.420018 -1.706270 1.950775
32 0.058824 -0.509652 -0.438074 -1.252795 0.777490
33 0.058824 -1.613898 -0.212740 -0.895467 0.386902
34 0.058824 -0.510805 -1.180632 -0.028182 0.428332
35 0.058824 0.066517 0.302472 -0.634322 -0.362741
36 0.058824 -0.672460 -0.359553 -0.813146 -1.726283
37 0.058824 0.177426 -0.401781 -1.630198 0.462782
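Another option, not part of the approaches above, is to skip the list columns entirely and read the two fields straight from each recarray. The sketch below assumes every array in data has the dtype shown in the question. The two small arrays here are made-up stand-ins so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]
# Small stand-in for the question's list (made-up values)
data = [
    np.array([([0.2], [1.0, 2.0, 3.0, 4.0]),
              ([0.2], [5.0, 6.0, 7.0, 8.0])], dtype=dtype).view(np.recarray),
    np.array([([0.1], [9.0, 10.0, 11.0, 12.0])], dtype=dtype).view(np.recarray),
]

# rec['weights'] has shape (n, 1) and rec['integration'] has shape (n, 4);
# column_stack places them side by side as an (n, 5) float array
blocks = [np.column_stack([rec['weights'], rec['integration']]) for rec in data]

# vstack appends the per-array blocks row-wise
df = pd.DataFrame(np.vstack(blocks), columns=['weights', 'v_1', 'v_2', 'v_3', 'v_4'])
print(df.shape)  # (3, 5)
```

Because the values never pass through Python lists, all five columns come out as float64 without any explode or cast.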