Convert multiple columns in pandas dataframe to array of arrays

Time:02-02

I have the following dataframe:

   col1  col2  col3 
1     1     2     3    
2     4     5     6  
3     7     8     9    
4     10    11    12     

I want to create a new column whose value in each row is an array of arrays: an outer array containing a single inner array built from specific columns, cast to float. So given column names, say "col2" and "col3", the output dataframe would look like this:

   col1  col2  col3        new
1     1     2     3    [[2,3]]
2     4     5     6    [[5,6]]
3     7     8     9    [[8,9]]
4     10    11    12   [[11,12]]

What I have so far works, but seems clumsy and I believe there's a better way. I'm fairly new to pandas and numpy.

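import numpy as np

selected_columns = ["col2", "col3"]
df[selected_columns] = df[selected_columns].astype(float)
# First pass: pack the selected columns of each row into a 1-D array
df['new'] = df.apply(lambda r: tuple(r[selected_columns]), axis=1).apply(np.array)
# Second pass: wrap that array in an outer array, giving shape (1, 2) per row
df['new'] = df.apply(lambda r: tuple(r[["new"]]), axis=1).apply(np.array)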

Appreciate your help, Thanks!

Answer:

Using agg:

cols = ['col2', 'col3']
df['new'] = df[cols].agg(list, axis=1)

Using numpy:

df['new'] = df[cols].to_numpy().tolist()

Output:

   col1  col2  col3       new
1     1     2     3    [2, 3]
2     4     5     6    [5, 6]
3     7     8     9    [8, 9]
4    10    11    12  [11, 12]
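
For anyone who wants to reproduce this, here is a minimal self-contained sketch comparing the two approaches. The DataFrame construction is my assumption, reconstructed from the table printed in the question (index 1 to 4), and the names via_agg and via_numpy are only for illustration.

```python
import pandas as pd

# Rebuild the frame from the question (assumed from the printed table)
df = pd.DataFrame(
    {"col1": [1, 4, 7, 10], "col2": [2, 5, 8, 11], "col3": [3, 6, 9, 12]},
    index=[1, 2, 3, 4],
)
cols = ["col2", "col3"]

# Row-wise aggregation: one Python list per row
via_agg = df[cols].agg(list, axis=1)

# NumPy route: convert the selected block once, then split it into per-row lists
via_numpy = pd.Series(df[cols].to_numpy().tolist(), index=df.index)

print(via_agg.tolist())
print(via_numpy.tolist())
```

Both should hold the same per-row lists. The NumPy route avoids calling a Python function once per row, so I would expect it to scale better on large frames, though I have not timed it here.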

2D lists

cols = ['col2', 'col3']
df['new'] = df[cols].agg(lambda x: [list(x)], axis=1)

# or
df['new'] = df[cols].to_numpy()[:,None].tolist()

Output:

   col1  col2  col3         new
1     1     2     3    [[2, 3]]
2     4     5     6    [[5, 6]]
3     7     8     9    [[8, 9]]
4    10    11    12  [[11, 12]]
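
The question also asked for float values and NumPy arrays rather than lists, which the snippets above do not do. A possible variation (my own sketch, not a definitive solution) is to cast the selected block once and store one array of shape (1, n) per row. The frame construction and the name block are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed from the printed table)
df = pd.DataFrame(
    {"col1": [1, 4, 7, 10], "col2": [2, 5, 8, 11], "col3": [3, 6, 9, 12]},
    index=[1, 2, 3, 4],
)
cols = ["col2", "col3"]

# Cast only the selected columns to float; col2 and col3 in df stay as they were
block = df[cols].to_numpy(dtype=float)

# Insert a middle axis so each row becomes a (1, 2) array, then let
# list(...) split along the first axis into one array object per row
df["new"] = list(block[:, None, :])

print(df["new"].iloc[0].shape)  # (1, 2)
print(df["new"].iloc[0].dtype)  # float64
```

Each cell then holds a real ndarray, so things like df["new"].iloc[0].shape work directly. Keep in mind that a column of arrays has object dtype, so vectorised pandas operations on it will not apply.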