Pandas, adding multiple columns of lists

I have a dataframe like this one:

df = pd.DataFrame({'A' : [['a', 'b', 'c'], ['e', 'f', 'g','g']], 'B' : [['1', '4', 'a'], ['5', 'a']]})

I would like to create another column, C, that is a column of lists like the others but holds the "union" of the other columns. Something like this:

df = pd.DataFrame({'A' : [['a', 'b', 'c'], ['e', 'f', 'g', 'g']], 'B' : [['1', '4', 'a'], ['5', 'a']], 'C' : [['a', 'b', 'c', '1', '4', 'a'], ['e', 'f', 'g', 'g', '5', 'a']]})

But I have hundreds of columns, and C should be the "union" of all of them, so I don't want to name each column explicitly like this:

df['C'] = df['A'] + df['B']

I also don't want to write a for loop, because the dataframes I am manipulating are very big and I want something fast.

Thank you for helping

CodePudding user response：

As you have lists, you cannot vectorize the operation.

A list comprehension might be the fastest:

from itertools import chain
df['out'] = [list(chain.from_iterable(x[1:])) for x in df.itertuples()]

Example:

              A          B       C                       out
0     [a, b, c]  [1, 4, a]  [x, y]  [a, b, c, 1, 4, a, x, y]
1  [e, f, g, g]     [5, a]     [z]     [e, f, g, g, 5, a, z]
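
For reference, here is a self-contained sketch that reproduces the example above. The column C with the x/y/z values comes from the example output; the variable name cols is only an illustration, and the use of index=False and name=None (instead of slicing off the index with x[1:]) is a variation, not part of the original answer.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
    'C': [['x', 'y'], ['z']],
})

# Hypothetical selection: every column here; pass a subset to restrict the result.
cols = list(df.columns)

# index=False, name=None yields plain tuples of the cell values (no index entry),
# so each row can be flattened directly.
df['out'] = [
    list(chain.from_iterable(row))
    for row in df[cols].itertuples(index=False, name=None)
]

print(df['out'].tolist())
```

Selecting the columns first also keeps an existing output column from being folded into the result if the line is run a second time.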

CodePudding user response：

As an alternative to @mozway's answer, you could try something like this:

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['e', 'f', 'g','g']], 'B' : [['1', '4', 'a'], ['5', 'a']]})
df['C'] = df.sum(axis=1).astype(str)

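Use 'astype' only if you need it: astype(str) converts each concatenated list to its string representation, so leave it off to keep real lists.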
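
To make the effect of 'astype' concrete, here is a small sketch, assuming a pandas version where a row-wise sum over object columns concatenates the lists. The column name C_str is made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
})

# Row-wise sum on object columns falls back to Python's +, i.e. list concatenation.
summed = df[['A', 'B']].sum(axis=1)

df['C'] = summed                  # each cell is still a Python list
df['C_str'] = summed.astype(str)  # each cell is now the text form of that list

print(type(df['C'].iloc[0]), type(df['C_str'].iloc[0]))
```

Each + on two lists builds a new list, so with hundreds of columns this may well be slower than the chain-based list comprehension; timing both on real data is the safest way to decide.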

CodePudding user response：

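You can use the apply method. Note that this joins each list into a space-separated string, so every row of C holds one string per column rather than a single flat list: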

df['C'] = df.apply(lambda x: [' '.join(i) for i in x], axis=1)
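
For comparison, a row-wise apply that returns one flat list per row (the shape shown in the question) might look like the sketch below. The cols list is a hypothetical selection, and apply with axis=1 still calls a Python function once per row, so it is unlikely to beat a plain list comprehension on large frames.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
})

# Hypothetical selection of the columns to combine.
cols = ['A', 'B']

# Each row is a Series whose values are lists; chain flattens them in column order.
df['C'] = df[cols].apply(lambda row: list(chain.from_iterable(row)), axis=1)

print(df['C'].tolist())
```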