How do I add a list to a column in pandas?

I'm trying to merge the columns kw1, kw2, kw3 shown here:

and have it in one separate column called keywords. This is what I tried:

df['keywords'] = list((df['kw1'], df['kw2'], df['kw3']))
df

but I'm getting this error:

ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 df['keywords'] = list((df['kw1'], df['kw2'], df['kw3']))
      2 df

File /lib/python3.10/site-packages/pandas/core/frame.py:3655, in DataFrame.__setitem__(self, key, value)
   3652     self._setitem_array([key], value)
   3653 else:
   3654     # set column
-> 3655     self._set_item(key, value)

File /lib/python3.10/site-packages/pandas/core/frame.py:3832, in DataFrame._set_item(self, key, value)
   3822 def _set_item(self, key, value) -> None:
   3823     """
   3824     Add series to DataFrame in specified column.
   3825 
   (...)
   3830     ensure homogeneity.
   3831     """
-> 3832     value = self._sanitize_column(value)
   3834     if (
   3835         key in self.columns
   3836         and value.ndim == 1
   3837         and not is_extension_array_dtype(value)
   3838     ):
   3839         # broadcast across multiple columns if necessary
   3840         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File /lib/python3.10/site-packages/pandas/core/frame.py:4535, in DataFrame._sanitize_column(self, value)
   4532     return _reindex_for_setitem(value, self.index)
   4534 if is_list_like(value):
-> 4535     com.require_length_match(value, self.index)
   4536 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File /lib/python3.10/site-packages/pandas/core/common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (3) does not match length of index (141)

Is there a way to make each row of the new column hold a list like this: [{value of kw1}, {value of kw2}, {value of kw3}]?

CodePudding user response:

You can do it like this:

import numpy as np
df['keywords'] = np.stack([df['kw1'], df['kw2'], df['kw3']], axis=1).tolist()

Pandas treats each element of the outermost list as the value for one row. Your list has only three elements (the three Series), but a new column needs 141 values because your frame has 141 rows, hence the length mismatch in the error.
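
To make the mismatch concrete, here is a minimal sketch that reproduces the error on a small frame. The 4-row DataFrame and its values are made up for illustration and stand in for the 141-row frame in the question.

```python
import pandas as pd

# Hypothetical 4-row frame standing in for the 141-row one in the question.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

values = list((df["kw1"], df["kw2"], df["kw3"]))
n_values = len(values)   # one element per Series, so 3
n_rows = len(df.index)   # one value is needed per row, so 4

try:
    df["keywords"] = values
    error_message = None
except ValueError as exc:
    # Same kind of error as in the question, with 4 in place of 141.
    error_message = str(exc)

print(n_values, n_rows)
print(error_message)
```

The outer list is read row-wise, so what is needed is one element per row rather than one element per column.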

Stacking the underlying NumPy arrays of the three Series along axis=1 gives an array of shape (141, 3). Converting it with tolist() gives a list of length 141 in which each element is itself a list of length 3, which is exactly one value per row.
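
The shapes can be checked on a small example. This is only a sketch: the 4-row frame is made up for illustration, so the stacked shape is (4, 3) rather than (141, 3).

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame; the real one in the question has 141 rows.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

# Each Series becomes a 1-D array of length 4; axis=1 places them side by side.
stacked = np.stack([df["kw1"], df["kw2"], df["kw3"]], axis=1)
print(stacked.shape)  # (4, 3)

# tolist() turns the 2-D array into one inner list per row.
rows = stacked.tolist()
print(len(rows), len(rows[0]))  # 4 3

df["keywords"] = rows
print(df)
```

With axis=0 the result would have shape (3, 4) instead, which brings back the original length mismatch.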

A more concise way is to select the three columns as a sub-DataFrame and let pandas do the stacking for you:

df['keywords'] = df[['kw1', 'kw2', 'kw3']].values.tolist()
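
As a quick check of the concise form, the sketch below runs it on a made-up 4-row frame. It uses to_numpy() in place of .values, which the pandas documentation recommends for getting the underlying array; for this case both give the same result.

```python
import pandas as pd

# Hypothetical 4-row frame; column names follow the question.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

# Selecting with a list of names yields a (4, 3) sub-DataFrame;
# to_numpy().tolist() converts it to one inner list per row.
df["keywords"] = df[["kw1", "kw2", "kw3"]].to_numpy().tolist()

# The older .values spelling from the answer produces the same lists.
same_as_values = df[["kw1", "kw2", "kw3"]].values.tolist() == df["keywords"].tolist()

print(df["keywords"].tolist())
print(same_as_values)
```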