Stacking columns in Pandas-CodePudding

I'm trying to create a new column on my DataFrame that combines two existing columns:

import pandas as pd
import numpy as np

DATA=pd.DataFrame(np.random.randn(5,2), columns=['A', 'B'])
DATA['index']=np.arange(5)
DATA.set_index('index', inplace=True)

The output is something like this:

       'A'          'B'
index
0      -0.003635    -0.644897
1      -0.617104    -0.343998
2       1.270503    -0.514588
3      -0.053097    -0.404073
4      -0.056717     1.870671

I would like to have an extra column 'C' that has an np.array with the elements of 'A' and 'B' for the corresponding row. In the real case, 'A' and 'B' are already 1D np.arrays, but of different lengths. I would like to make a longer array with all the elements stacked or concatenated.

Thanks

CodePudding user response：

If columns a and b contain NumPy arrays, you could apply np.hstack across rows:

import pandas as pd
import numpy as np

num_rows = 10
max_arr_size = 3
df = pd.DataFrame({
   "a": [np.random.rand(max_arr_size) for _ in range(num_rows)],
   "b": [np.random.rand(max_arr_size) for _ in range(num_rows)],
})
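# Concatenate the arrays in each row (axis=1) into one longer array.
df["c"] = df.apply(np.hstack, axis=1)

# Sanity check: each stacked array is as long as its two inputs combined.
assert all(row.a.size + row.b.size == row.c.size for _, row in df.iterrows())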
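The question mentions arrays of different lengths, which can be handled the same way. Below is a minimal sketch of that case, assuming each cell holds a 1D array; the values are made up for illustration, and the list comprehension with np.concatenate is an alternative to apply, not necessarily what the answer intended.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each cell holds a 1D array, and lengths differ per cell.
df = pd.DataFrame({
    "a": [np.array([1.0, 2.0]), np.array([3.0])],
    "b": [np.array([4.0]), np.array([5.0, 6.0, 7.0])],
})

# Join the two arrays of each row end to end into one longer array.
df["c"] = [np.concatenate([a, b]) for a, b in zip(df["a"], df["b"])]

# Each entry of "c" is as long as its two inputs combined.
print([arr.size for arr in df["c"]])
```

Iterating over the two columns with zip avoids building a Series for every row, which may help on larger frames.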

CodePudding user response：

DATA['C'] = DATA.apply(lambda x: np.array([x.A, x.B]), axis=1)

pandas requires all columns in a DataFrame to have the same number of rows, so uneven Series lengths shouldn't be a problem here.
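For scalar columns like A and B in the question's sample, a similar result can probably be had without a per-row lambda. The sketch below assumes both columns are numeric scalars. If the cells hold arrays instead, np.array([x.A, x.B]) would nest them rather than join them (and recent NumPy versions reject ragged input), so np.hstack or np.concatenate is likely the better fit there.

```python
import numpy as np
import pandas as pd

# Same shape of sample data as in the question.
DATA = pd.DataFrame(np.random.randn(5, 2), columns=["A", "B"])

# to_numpy() gives a (5, 2) array; list() splits it into one length-2 array per row.
pairs = list(DATA[["A", "B"]].to_numpy())

# Wrap in a Series aligned to DATA's index so each cell stores one array.
DATA["C"] = pd.Series(pairs, index=DATA.index)

print(DATA["C"].iloc[0].shape)
```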