How to make each row of a DataFrame an array?

If this is my DataFrame, how do I convert each row to an array?

            3        4        5        6       97       98       99      100
0         1.0      2.0      3.0      4.0     95.0     96.0     97.0     98.0
1     50699.0  16302.0  50700.0  16294.0  50735.0  16334.0  50737.0  16335.0
2     57530.0  33436.0  57531.0  33438.0      NaN      NaN      NaN      NaN
3     24014.0  24015.0  34630.0  24016.0      NaN      NaN      NaN      NaN
4     44933.0   2611.0  44936.0   2612.0  44982.0   2631.0  44972.0   2633.0
1792  46712.0  35340.0  46713.0  35341.0  46759.0  35387.0  46760.0  35388.0
1793  61283.0  40276.0  61284.0  40277.0  61330.0  40323.0  61331.0  40324.0
1794      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
1795      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
1796  27156.0  48331.0  27157.0  48332.0      NaN      NaN      NaN      NaN

For example, I want it to be [1.0, 2.0, 3.0, 4.0, 95.0, 96.0, 97.0, 98.0] for the first one.

CodePudding user response：

You can loop over the DataFrame's rows and assign the NumPy arrays dynamically to the global symbol table dict. To loop over rows, you can loop over the columns of the transposed DataFrame.

import numpy as np
import pandas as pd

# sample frame
df = pd.DataFrame({'col1' : [np.nan, 1.0, 4.5, 1.3, np.nan, 6.7],
                   'col2' : [-0.4, 0.5, -2.3, np.nan, 1.2, 2.4]})

# transpose so that each original row becomes a column
df = df.transpose()

# dynamic assignment -> global symbol table (creates v1, v2, ...)
for i in df.columns:
    globals()['v{}'.format(i + 1)] = np.array(df[i])

v1
>array([ nan, -0.4])

v2
>array([1. , 0.5])

EDIT: Added `transpose()` so that the arrays hold rows rather than columns, as they did in the initial answer. Thanks, BeRT2me.
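
If writing names into globals() feels fragile, the same loop can fill a dictionary instead. This is only a minimal sketch, assuming the sample frame from this answer (not transposed here); the name `row_arrays` and the `v1`-style keys are illustrative choices, not anything pandas requires.

```python
import numpy as np
import pandas as pd

# same sample frame as in the answer above
df = pd.DataFrame({'col1': [np.nan, 1.0, 4.5, 1.3, np.nan, 6.7],
                   'col2': [-0.4, 0.5, -2.3, np.nan, 1.2, 2.4]})

# one NumPy array per row, keyed v1, v2, ... (hypothetical naming that mirrors the answer)
row_arrays = {'v{}'.format(pos + 1): row.to_numpy()
              for pos, (_, row) in enumerate(df.iterrows())}

print(row_arrays['v2'])
```

A dictionary keeps the arrays together, so they can be iterated or passed around without looking up variable names by string.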

CodePudding user response：

>>> import numpy as np
>>> out = df.apply(np.array, axis=1) # df.apply(list, axis=1)
>>> print(out.to_frame('arrays'))
                                                 arrays
0          [1.0, 2.0, 3.0, 4.0, 95.0, 96.0, 97.0, 98.0]
1     [50699.0, 16302.0, 50700.0, 16294.0, 50735.0, ...
2     [57530.0, 33436.0, 57531.0, 33438.0, nan, nan,...
3     [24014.0, 24015.0, 34630.0, 24016.0, nan, nan,...
4     [44933.0, 2611.0, 44936.0, 2612.0, 44982.0, 26...
1792  [46712.0, 35340.0, 46713.0, 35341.0, 46759.0, ...
1793  [61283.0, 40276.0, 61284.0, 40277.0, 61330.0, ...
1794           [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
1795           [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
1796  [27156.0, 48331.0, 27157.0, 48332.0, nan, nan,...

>>> print(df.to_numpy().tolist())
[[1.0, 2.0, 3.0, 4.0, 95.0, 96.0, 97.0, 98.0],
 [50699.0, 16302.0, 50700.0, 16294.0, 50735.0, 16334.0, 50737.0, 16335.0],
 [57530.0, 33436.0, 57531.0, 33438.0, nan, nan, nan, nan],
 [24014.0, 24015.0, 34630.0, 24016.0, nan, nan, nan, nan],
 [44933.0, 2611.0, 44936.0, 2612.0, 44982.0, 2631.0, 44972.0, 2633.0],
 [46712.0, 35340.0, 46713.0, 35341.0, 46759.0, 35387.0, 46760.0, 35388.0],
 [61283.0, 40276.0, 61284.0, 40277.0, 61330.0, 40323.0, 61331.0, 40324.0],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
 [27156.0, 48331.0, 27157.0, 48332.0, nan, nan, nan, nan]]
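
The two results above differ mainly in their container: `apply` should give back a Series that keeps the row index, while `to_numpy().tolist()` gives a plain nested list. A small sketch to compare them, assuming a recent pandas version and using just two rows copied from the question (the names `as_series` and `as_nested` are made up for the illustration):

```python
import numpy as np
import pandas as pd

# two rows copied from the question, including a NaN-padded one
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 95.0, 96.0, 97.0, 98.0],
     [57530.0, 33436.0, 57531.0, 33438.0, np.nan, np.nan, np.nan, np.nan]],
    columns=[3, 4, 5, 6, 97, 98, 99, 100])

as_series = df.apply(list, axis=1)   # Series holding one list per row, index kept
as_nested = df.to_numpy().tolist()   # nested Python list, index discarded

print(as_series[0])
print(as_nested[0])
```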

CodePudding user response：

What about

>>> rows = [*df.to_numpy()]  # list of arrays
>>> rows[0]
array([ 1.,  2.,  3.,  4., 95., 96., 97., 98.])

or since you seem to be using the words list and array interchangeably,

>>> [*rows] = map(list, df.to_numpy())  # list of lists
>>> rows[0]
[1.0, 2.0, 3.0, 4.0, 95.0, 96.0, 97.0, 98.0]
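
One detail about the list-of-lists variant: `list()` over a NumPy row keeps the elements as NumPy scalars, whereas `.tolist()` converts them to built-in Python floats. Depending on the NumPy version, the former may print differently from the output shown above. A short sketch with a made-up array:

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])

via_list = list(row)        # elements remain NumPy scalars (np.float64)
via_tolist = row.tolist()   # elements become built-in Python floats

print(type(via_list[0]).__name__, type(via_tolist[0]).__name__)
```

If plain Python floats are wanted, `[row.tolist() for row in df.to_numpy()]` or simply `df.to_numpy().tolist()` would be one way to get them.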