I have a DataFrame like this one:
df = pd.DataFrame({'A' : [['a', 'b', 'c'], ['e', 'f', 'g','g']], 'B' : [['1', '4', 'a'], ['5', 'a']]})
I would like to create another column, C, that also holds lists and is the "union" of the other columns. Something like this:
df = pd.DataFrame({'A' : [['a', 'b', 'c'], ['e', 'f', 'g', 'g']], 'B' : [['1', '4', 'a'], ['5', 'a']], 'C' : [['a', 'b', 'c', '1', '4', 'a'], ['e', 'f', 'g', 'g', '5', 'a']]})
But I have hundreds of columns, and C should be the "union" of all of them, so I don't want to spell out every column like this:
df['C'] = df['A'] + df['B']
I also don't want to write a for loop, because the DataFrames I am working with are very large and I need something fast.
Thank you for helping
CodePudding user response:
As you have lists, you cannot vectorize the operation.
A list comprehension might be the fastest:
from itertools import chain
df['out'] = [list(chain.from_iterable(x[1:])) for x in df.itertuples()]
Example:
              A          B       C                       out
0     [a, b, c]  [1, 4, a]  [x, y]  [a, b, c, 1, 4, a, x, y]
1  [e, f, g, g]     [5, a]     [z]     [e, f, g, g, 5, a, z]
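For reference, here is a minimal self-contained sketch that reproduces the example above. The third input column C and its values are copied from the example output; everything else follows the answer's one-liner.

```python
from itertools import chain

import pandas as pd

# Input columns copied from the example output above.
df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
    'C': [['x', 'y'], ['z']],
})

# itertuples() yields (index, A, B, C) for each row; x[1:] skips the index,
# and chain.from_iterable flattens the remaining lists into a single list.
df['out'] = [list(chain.from_iterable(x[1:])) for x in df.itertuples()]

print(df)
```

Because the comprehension walks over every column in the row, it should work unchanged however many list columns the DataFrame has.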
CodePudding user response:
As an alternative to @mozway's answer, you could try something like this:
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['e', 'f', 'g','g']], 'B' : [['1', '4', 'a'], ['5', 'a']]})
df['C'] = df.sum(axis=1).astype(str)
Note that astype(str) turns each combined list into a string; leave it off if C should contain lists.
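To make the effect of astype(str) concrete, the sketch below compares the row-wise sum with and without it. It assumes, as this answer does, that df.sum(axis=1) on list-valued object columns concatenates the lists in each row; the names as_lists and as_text are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
})

# Row-wise sum: every cell is a list, so "adding" the cells concatenates them.
as_lists = df[['A', 'B']].sum(axis=1)

# astype(str) replaces each list with its string representation.
as_text = as_lists.astype(str)

print(as_lists.tolist())
print(as_text.tolist())
```

Selecting the columns explicitly (df[['A', 'B']] here) keeps the sum from picking up a result column that has already been added to the DataFrame.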
CodePudding user response:
You can use the apply method:
df['C'] = df.apply(lambda x: [' '.join(i) for i in list(x[df.columns.to_list()])], axis=1)
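As written, the apply call joins the items of each cell with spaces, so every row becomes a list with one string per column rather than one flat list. If a flat list like the one in the question is the goal, a variant along these lines might work. The helper name flatten_row is hypothetical, and apply with axis=1 still calls a Python function once per row, so it is unlikely to be faster than a list comprehension.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['e', 'f', 'g', 'g']],
    'B': [['1', '4', 'a'], ['5', 'a']],
})

def flatten_row(row):
    # row is a Series holding one list per column; concatenate them in order.
    return [item for cell in row for item in cell]

# axis=1 passes each row to flatten_row; the returned lists become column C.
df['C'] = df.apply(flatten_row, axis=1)

print(df)
```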