I am attempting to combine elements of a dataframe into a nested list. Say I have the following:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4), columns=list('abcd'))
df.head(4)
a b c d
0 0.455258 1.135895 0.573383 -0.637943
1 0.262079 -0.397168 -0.980062 -1.600837
2 0.921582 0.767232 -0.298590 -0.159964
3 -0.645110 -0.709058 1.223899 0.382212
I would then like to efficiently create a fifth column e that looks like:
a b c d e
0 0.455258 1.135895 0.573383 -0.637943 [[0.455258 1.135895 0.573383 -0.637943]]
1 0.262079 -0.397168 -0.980062 -1.600837 [[0.262079 -0.397168 -0.980062 -1.600837]]
2 0.921582 0.767232 -0.298590 -0.159964 [[0.921582 0.767232 -0.298590 -0.159964]]
3 -0.645110 -0.709058 1.223899 0.382212 [[-0.645110 -0.709058 1.223899 0.382212]]
My most efficient, but wrong, attempt so far has been:
df['e'] = df.values.tolist()
But that just results in:
a b c d e
0 0.455258 1.135895 0.573383 -0.637943 [0.455258 1.135895 0.573383 -0.637943]
1 0.262079 -0.397168 -0.980062 -1.600837 [0.262079 -0.397168 -0.980062 -1.600837]
2 0.921582 0.767232 -0.298590 -0.159964 [0.921582 0.767232 -0.298590 -0.159964]
3 -0.645110 -0.709058 1.223899 0.382212 [-0.645110 -0.709058 1.223899 0.382212]
My least efficient but correct attempt has been:
a = []
for index, row in df.iterrows():
    # wrap each row's values in an extra list to get [[a, b, c, d]]
    a.append([[row['a'], row['b'], row['c'], row['d']]])
df['e'] = a
Is there a better way?
CodePudding user response:
Try the following (the extra pair of brackets gives the nested [[...]] shape asked for):
df["e"] = df.apply(lambda x: [[x[column] for column in df.columns]], axis=1)
CodePudding user response:
Another possible solution:
df['e'] = df.values.tolist()
df['e'] = df['e'].map(lambda x: [x])
Output:
a b c d \
0 -1.594129 1.692562 0.602186 -1.620295
1 -0.561567 -0.033658 -1.259215 1.054229
2 0.450852 -0.483194 0.126173 0.354781
3 2.060968 -0.428400 -0.973516 -0.201786
4 -0.977307 -0.123215 -1.494138 -0.175432
e
0 [[-1.5941291794267378, 1.6925620764107292, 0.6...
1 [[-0.5615669341251519, -0.03365818317800309, -...
2 [[0.45085184068754164, -0.48319360005444034, 0...
3 [[2.0609676606685086, -0.42839969840552594, -0...
4 [[-0.9773067339895964, -0.12321466907036417, -...
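The two steps above can also be folded into a single list comprehension. This is only a minimal sketch, assuming the same four-column frame as in the question; the names rng, rows and first are illustrative, and the seeded generator is there just to make the example reproducible:

```python
import numpy as np
import pandas as pd

# Assumed setup: a 100 x 4 frame like the one in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

# Convert all rows to plain Python lists in one call
rows = df.to_numpy().tolist()

# Wrap every row in an extra list so each cell holds [[a, b, c, d]]
df["e"] = pd.Series([[row] for row in rows], index=df.index)

first = df["e"].iloc[0]
print(type(first), len(first), len(first[0]))
```

Whether this is faster than the map version probably depends on the size of the frame, so it is worth timing both on your own data.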
CodePudding user response:
Let's use np.array_split:
df['e'] = np.array_split(df.to_numpy(), df.shape[0], axis=0)
Output:
a b c d e
0 -0.164745 -0.498313 -0.247778 -1.531003 [[-0.16474534230721335, -0.49831346259483156, ...
1 0.079485 0.125790 0.002755 -0.182361 [[0.0794845071834397, 0.12579014367640728, 0.0...
2 0.790263 0.488152 -0.752555 0.432949 [[0.790263001866772, 0.48815219760288764, -0.7...
3 -0.139499 -1.493593 -1.708668 -2.495497 [[-0.13949904491921675, -1.493593498340277, -1...
4 2.662431 0.247559 -0.949407 2.746299 [[2.662430989009563, 0.2475588133223812, -0.94...
.. ... ... ... ... ...
95 0.252663 1.018614 -0.491736 -0.290786 [[0.252663350866794, 1.018613617727022, -0.491...
96 1.023089 -0.367463 0.437327 -0.017441 [[1.0230888404185123, -0.3674628009130751, 0.4...
97 0.571278 0.450803 0.441102 1.176884 [[0.5712775025212533, 0.4508029251387083, 0.44...
98 1.336477 0.166516 0.408941 0.972896 [[1.3364769455886123, 0.16651649771088423, 0.4...
99 -1.298205 1.868477 -0.174665 0.065565 [[-1.2982050517578514, 1.8684774453090633, -0....
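Note that np.array_split yields one 2-D NumPy array of shape (1, 4) per row rather than a nested Python list, even though the cells are displayed in a similar way. If actual lists are needed, each piece can be converted with .tolist(). A small sketch, assuming the same kind of frame as in the question; rng, values, parts and nested are illustrative names:

```python
import numpy as np
import pandas as pd

# Assumed setup: a 100 x 4 frame like the one in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

values = df.to_numpy()

# One chunk per row; each chunk is a 2-D array with a single row
parts = np.array_split(values, df.shape[0], axis=0)
print(type(parts[0]), parts[0].shape)

# Convert every (1, 4) array into a nested Python list [[a, b, c, d]]
nested = [part.tolist() for part in parts]
df["e"] = pd.Series(nested, index=df.index)
```

Keeping the arrays is fine if the column feeds further NumPy code; the conversion only matters when downstream code expects real lists.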