Combine Pandas columns into a nested list

I am trying to combine the elements of each DataFrame row into a nested list. Say I have the following:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('abcd'))
df.head(4)

          a         b         c         d
0  0.455258  1.135895  0.573383 -0.637943
1  0.262079 -0.397168 -0.980062 -1.600837
2  0.921582  0.767232 -0.298590 -0.159964
3 -0.645110 -0.709058  1.223899  0.382212

Then I would like to efficiently create a fifth column e in which each row holds a nested list of that row's values:

          a         b         c         d         e
0  0.455258  1.135895  0.573383 -0.637943 [[0.455258  1.135895  0.573383 -0.637943]]
1  0.262079 -0.397168 -0.980062 -1.600837 [[0.262079 -0.397168 -0.980062 -1.600837]]
2  0.921582  0.767232 -0.298590 -0.159964 [[0.921582  0.767232 -0.298590 -0.159964]]
3 -0.645110 -0.709058  1.223899  0.382212 [[-0.645110 -0.709058  1.223899  0.382212]]

My fastest attempt so far gives the wrong result:

df['e'] = df.values.tolist()

It produces a flat list per row rather than a nested one:

          a         b         c         d         e
0  0.455258  1.135895  0.573383 -0.637943 [0.455258  1.135895  0.573383 -0.637943]
1  0.262079 -0.397168 -0.980062 -1.600837 [0.262079 -0.397168 -0.980062 -1.600837]
2  0.921582  0.767232 -0.298590 -0.159964 [0.921582  0.767232 -0.298590 -0.159964]
3 -0.645110 -0.709058  1.223899  0.382212 [-0.645110 -0.709058  1.223899  0.382212]

My slowest attempt gives the correct result:

a = []
for index, row in df.iterrows():
    # Wrap each row's values in an outer list to get the nested form.
    a.append([[row['a'], row['b'], row['c'], row['d']]])
df['e'] = a

Is there a better way?

CodePudding user response：

Try this (the outer brackets give the extra level of nesting):

df["e"] = df.apply(lambda x: [[x[column] for column in df.columns]], axis=1)
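To check the nesting that a row-wise apply produces, here is a minimal sketch on a tiny hand-made frame. The names small and cols are illustrative and not from the question; the source columns are captured up front so that an existing e column is never folded back into the result.

```python
import pandas as pd

# Hypothetical two-row frame so the result is easy to inspect by eye.
small = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
cols = list(small.columns)  # source columns, captured before 'e' exists

# Each row becomes a one-element list holding the list of that row's values.
small["e"] = small[cols].apply(lambda row: [row.tolist()], axis=1)

first = small.loc[0, "e"]
print(first)  # [[1.0, 3.0]]
```

Note that apply with axis=1 calls the function once per row in Python, so it is unlikely to be the fastest choice on large frames.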

CodePudding user response：

Another possible solution:

df['e'] = df.values.tolist()
df['e'] = df['e'].map(lambda x: [x])

Output:

          a         b         c         d  \
0 -1.594129  1.692562  0.602186 -1.620295   
1 -0.561567 -0.033658 -1.259215  1.054229   
2  0.450852 -0.483194  0.126173  0.354781   
3  2.060968 -0.428400 -0.973516 -0.201786   
4 -0.977307 -0.123215 -1.494138 -0.175432   

                                                   e  
0  [[-1.5941291794267378, 1.6925620764107292, 0.6...  
1  [[-0.5615669341251519, -0.03365818317800309, -...  
2  [[0.45085184068754164, -0.48319360005444034, 0...  
3  [[2.0609676606685086, -0.42839969840552594, -0...  
4  [[-0.9773067339895964, -0.12321466907036417, -...
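The same wrapping can also be sketched with a plain list comprehension instead of map. This is only a variant under assumptions: df_demo, rng and rows are illustrative names, and the frame here has 5 random rows rather than the 100 in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((5, 4)), columns=list("abcd"))

# Convert the numeric block to a list of row lists in one call ...
rows = df_demo[list("abcd")].to_numpy().tolist()

# ... then wrap each row list in an outer list to get the nested form.
df_demo["e"] = [[row] for row in rows]
```

Selecting the four source columns explicitly keeps the snippet safe to re-run after e has already been added.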

CodePudding user response：

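Let's use np.array_split to split the underlying array into one (1, 4) sub-array per row (each cell then holds a NumPy array rather than a Python list):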

df['e'] = np.array_split(df.to_numpy(), df.shape[0], axis=0)

Output:

           a         b         c         d                                                  e
0  -0.164745 -0.498313 -0.247778 -1.531003  [[-0.16474534230721335, -0.49831346259483156, ...
1   0.079485  0.125790  0.002755 -0.182361  [[0.0794845071834397, 0.12579014367640728, 0.0...
2   0.790263  0.488152 -0.752555  0.432949  [[0.790263001866772, 0.48815219760288764, -0.7...
3  -0.139499 -1.493593 -1.708668 -2.495497  [[-0.13949904491921675, -1.493593498340277, -1...
4   2.662431  0.247559 -0.949407  2.746299  [[2.662430989009563, 0.2475588133223812, -0.94...
..       ...       ...       ...       ...                                                ...
95  0.252663  1.018614 -0.491736 -0.290786  [[0.252663350866794, 1.018613617727022, -0.491...
96  1.023089 -0.367463  0.437327 -0.017441  [[1.0230888404185123, -0.3674628009130751, 0.4...
97  0.571278  0.450803  0.441102  1.176884  [[0.5712775025212533, 0.4508029251387083, 0.44...
98  1.336477  0.166516  0.408941  0.972896  [[1.3364769455886123, 0.16651649771088423, 0.4...
99 -1.298205  1.868477 -0.174665  0.065565  [[-1.2982050517578514, 1.8684774453090633, -0....
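If Python lists are needed rather than NumPy arrays, the pieces returned by np.array_split can be converted before they are stored. A minimal sketch, assuming a 5-row demo frame; df_demo, chunks and nested are illustrative names, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((5, 4)), columns=list("abcd"))

# Splitting into as many sections as there are rows yields one
# two-dimensional (1, 4) array per row.
chunks = np.array_split(df_demo.to_numpy(), df_demo.shape[0], axis=0)

# ndarray.tolist() turns each (1, 4) array into a nested list [[...]].
nested = [chunk.tolist() for chunk in chunks]
df_demo["e"] = nested
```

The conversion costs an extra pass over the data, so it is only worth doing when downstream code really expects lists.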