Convert rec.array to dataframe

I've been trying to convert a list of NumPy rec.arrays into a single pandas dataframe. The current list looks like:

[rec.array([([0.2], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.2], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.2], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.2], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.2], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.1], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.1], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.1], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.1], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.1], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.1], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.1], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.1], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.1], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.1], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.16666667], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.16666667], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.16666667], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.16666667], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.16666667], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.16666667], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.05882353], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.05882353], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.05882353], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.05882353], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.05882353], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.05882353], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.05882353], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.05882353], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.05882353], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.05882353], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275]),
            ([0.05882353], [-1.04855297, -1.42001794, -1.70627019,  1.9507754 ]),
            ([0.05882353], [-0.50965218, -0.4380743 , -1.25279536,  0.77749036]),
            ([0.05882353], [-1.61389785, -0.21274028, -0.89546656,  0.3869025 ]),
            ([0.05882353], [-0.51080514, -1.18063218, -0.02818223,  0.42833187]),
            ([0.05882353], [ 0.06651722,  0.3024719 , -0.63432209, -0.36274117]),
            ([0.05882353], [-0.67246045, -0.35955316, -0.81314628, -1.7262826 ]),
            ([0.05882353], [ 0.17742614, -0.40178094, -1.63019835,  0.46278226])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))])]

The result should be a five-column dataframe like the following:

Weights	v_1	v_2	v_3	v_4
0.2	1.76405235	0.40015721	0.97873798	2.2408932
0.2	1.86755799	-0.97727788	0.95008842	-0.15135721
....	....	...	...	...
0.05882353	0.17742614	-0.40178094	-1.63019835	0.46278226

and so on. However, when I call pd.DataFrame(my_list), the resulting dataframe has roughly 90 columns instead of the 5 shown above. Each column holds a record of the form ([a], [w, x, y, z]). The resulting dataframe should have 5 columns and 38 rows for the example above (5 + 10 + 6 + 17 records).

CodePudding user response:

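I assume your list of rec.arrays is stored in a variable called data. Convert each rec.array to a dataframe with pd.DataFrame and join the pieces with pd.concat. Each row then holds two list-valued columns: build a new dataframe from the four-element lists to spread them across four columns, use pandas.Series.explode to unwrap the one-element weight lists into scalars, and use pandas.DataFrame.pop to remove the original list columns.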
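To see the two unwrapping steps in isolation, here is a minimal sketch on a toy frame. The name toy and its values are made up for illustration; only the column layout (one-element lists in column 0, four-element lists in column 1) mirrors the question.

```python
import pandas as pd

# Toy frame shaped like one converted rec.array (values are invented):
# column 0 holds one-element lists, column 1 holds four-element lists.
toy = pd.DataFrame({
    0: [[0.2], [0.2]],
    1: [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
})

# explode() gives every list element its own row; with one-element lists the
# row count is unchanged and each cell simply becomes a scalar.
weights = toy[0].explode().astype(float)

# Building a DataFrame from the list of lists spreads each list across columns.
values = pd.DataFrame(toy[1].tolist(), index=toy.index,
                      columns=['v_1', 'v_2', 'v_3', 'v_4'])

print(weights.tolist())  # [0.2, 0.2]
print(values.shape)      # (2, 4)
```

The astype(float) call is an optional extra: explode() returns an object-dtype Series, so casting keeps the weights column numeric.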

Reading Data

import pandas as pd

# Build one two-column frame per rec.array, then stack them with a fresh 0..n-1 index
frames = [pd.DataFrame(record.tolist()) for record in data]
df = pd.concat(frames, ignore_index=True)

Pre-processing and Unraveling data

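# Spread each four-element 'integration' list across its own columns
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist(), index=df.index)
# Unwrap the one-element 'weights' lists into plain floats
df['weights'] = df.pop(0).explode().astype(float)
# Remove the original list column
df.pop(1)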

Output

This gives the expected data (the weights column ends up last because it is added after v_1 to v_4):

         v_1       v_2       v_3       v_4   weights
0   1.764052  0.400157  0.978738  2.240893       0.2
1   1.867558 -0.977278  0.950088 -0.151357       0.2
2  -0.103219  0.410598  0.144044  1.454274       0.2
3   0.761038  0.121675  0.443863  0.333674       0.2
4   1.494079 -0.205158  0.313068 -0.854096       0.2
5   1.764052  0.400157  0.978738  2.240893       0.1
6   1.867558 -0.977278  0.950088 -0.151357       0.1
7  -0.103219  0.410598  0.144044  1.454274       0.1
8   0.761038  0.121675  0.443863  0.333674       0.1
9   1.494079 -0.205158  0.313068 -0.854096       0.1
10 -2.552990  0.653619  0.864436 -0.742165       0.1
11  2.269755 -1.454366  0.045759 -0.187184       0.1
12  1.532779  1.469359  0.154947  0.378163       0.1
13 -0.887786 -1.980796 -0.347912  0.156349       0.1
14  1.230291  1.202380 -0.387327 -0.302303       0.1
15  1.764052  0.400157  0.978738  2.240893  0.166667
16  1.867558 -0.977278  0.950088 -0.151357  0.166667
17 -0.103219  0.410598  0.144044  1.454274  0.166667
18  0.761038  0.121675  0.443863  0.333674  0.166667
19  1.494079 -0.205158  0.313068 -0.854096  0.166667
20 -2.552990  0.653619  0.864436 -0.742165  0.166667
21  1.764052  0.400157  0.978738  2.240893  0.058824
22  1.867558 -0.977278  0.950088 -0.151357  0.058824
23 -0.103219  0.410598  0.144044  1.454274  0.058824
24  0.761038  0.121675  0.443863  0.333674  0.058824
25  1.494079 -0.205158  0.313068 -0.854096  0.058824
26 -2.552990  0.653619  0.864436 -0.742165  0.058824
27  2.269755 -1.454366  0.045759 -0.187184  0.058824
28  1.532779  1.469359  0.154947  0.378163  0.058824
29 -0.887786 -1.980796 -0.347912  0.156349  0.058824
30  1.230291  1.202380 -0.387327 -0.302303  0.058824
31 -1.048553 -1.420018 -1.706270  1.950775  0.058824
32 -0.509652 -0.438074 -1.252795  0.777490  0.058824
33 -1.613898 -0.212740 -0.895467  0.386902  0.058824
34 -0.510805 -1.180632 -0.028182  0.428332  0.058824
35  0.066517  0.302472 -0.634322 -0.362741  0.058824
36 -0.672460 -0.359553 -0.813146 -1.726283  0.058824
37  0.177426 -0.401781 -1.630198  0.462782  0.058824

Alternatively

The same thing can be done with np.hstack, where data is your list of rec.arrays.

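import numpy as np
import pandas as pd

# np.hstack joins the 1-D rec.arrays end to end, so all records are converted at once
df = pd.DataFrame(np.hstack(data).tolist())
df['weights'] = df[0].explode().astype(float)  # unwrap the one-element lists
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist())
df = df.drop(columns=[0, 1])  # remove the original list columns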

Output

This gives the same data, this time with weights as the first column:

     weights       v_1       v_2       v_3       v_4
0        0.2  1.764052  0.400157  0.978738  2.240893
1        0.2  1.867558 -0.977278  0.950088 -0.151357
2        0.2 -0.103219  0.410598  0.144044  1.454274
3        0.2  0.761038  0.121675  0.443863  0.333674
4        0.2  1.494079 -0.205158  0.313068 -0.854096
5        0.1  1.764052  0.400157  0.978738  2.240893
6        0.1  1.867558 -0.977278  0.950088 -0.151357
7        0.1 -0.103219  0.410598  0.144044  1.454274
8        0.1  0.761038  0.121675  0.443863  0.333674
9        0.1  1.494079 -0.205158  0.313068 -0.854096
10       0.1 -2.552990  0.653619  0.864436 -0.742165
11       0.1  2.269755 -1.454366  0.045759 -0.187184
12       0.1  1.532779  1.469359  0.154947  0.378163
13       0.1 -0.887786 -1.980796 -0.347912  0.156349
14       0.1  1.230291  1.202380 -0.387327 -0.302303
15  0.166667  1.764052  0.400157  0.978738  2.240893
16  0.166667  1.867558 -0.977278  0.950088 -0.151357
17  0.166667 -0.103219  0.410598  0.144044  1.454274
18  0.166667  0.761038  0.121675  0.443863  0.333674
19  0.166667  1.494079 -0.205158  0.313068 -0.854096
20  0.166667 -2.552990  0.653619  0.864436 -0.742165
21  0.058824  1.764052  0.400157  0.978738  2.240893
22  0.058824  1.867558 -0.977278  0.950088 -0.151357
23  0.058824 -0.103219  0.410598  0.144044  1.454274
24  0.058824  0.761038  0.121675  0.443863  0.333674
25  0.058824  1.494079 -0.205158  0.313068 -0.854096
26  0.058824 -2.552990  0.653619  0.864436 -0.742165
27  0.058824  2.269755 -1.454366  0.045759 -0.187184
28  0.058824  1.532779  1.469359  0.154947  0.378163
29  0.058824 -0.887786 -1.980796 -0.347912  0.156349
30  0.058824  1.230291  1.202380 -0.387327 -0.302303
31  0.058824 -1.048553 -1.420018 -1.706270  1.950775
32  0.058824 -0.509652 -0.438074 -1.252795  0.777490
33  0.058824 -1.613898 -0.212740 -0.895467  0.386902
34  0.058824 -0.510805 -1.180632 -0.028182  0.428332
35  0.058824  0.066517  0.302472 -0.634322 -0.362741
36  0.058824 -0.672460 -0.359553 -0.813146 -1.726283
37  0.058824  0.177426 -0.401781 -1.630198  0.462782
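As a further variation (not part of the answer above), the list columns can be avoided entirely by reading the structured fields by name. The sketch below assumes every rec.array has the dtype shown in the question; the helper name recarrays_to_frame and the small stand-in data are hypothetical.

```python
import numpy as np
import pandas as pd

def recarrays_to_frame(records):
    """Flatten rec.arrays with 'weights' (1,) and 'integration' (4,) fields."""
    stacked = np.concatenate(records)
    # Field access yields plain float arrays of shape (n, 1) and (n, 4),
    # which can be joined side by side into an (n, 5) array.
    flat = np.hstack([stacked['weights'], stacked['integration']])
    return pd.DataFrame(flat, columns=['weights', 'v_1', 'v_2', 'v_3', 'v_4'])

# Small stand-in for the question's list (values are made up).
dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]
data = [
    np.array([([0.5], [1.0, 2.0, 3.0, 4.0]),
              ([0.5], [5.0, 6.0, 7.0, 8.0])], dtype=dtype).view(np.recarray),
    np.array([([1.0], [9.0, 10.0, 11.0, 12.0])], dtype=dtype).view(np.recarray),
]

df = recarrays_to_frame(data)
print(df.shape)  # (3, 5)
```

Because the values never pass through Python lists, every column comes out as float64 and no explode or pop step is needed.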