Python/Pandas: how to combine all values in a single column into one list

For example, suppose you have a column like this:

Column1
adfghb, gad
234rwfa
ballbalba
9adfad9, 5432
99a

Expected output:

list1 = ["adfghb", "gad", "234rwfa", "ballbalba", "9adfad9", "5432", "99a"]

The column contains only strings. I need efficient code because the actual column is very large. I used a for loop, but it takes far too long.

User response:

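You can use plain Python str methods instead of the Pandas string accessor: join every row into one string with the delimiter, then split that string once: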

>>> ', '.join(df['Column1']).split(', ')
['adfghb', 'gad', '234rwfa', 'ballbalba', '9adfad9', '5432', '99a']
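For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is an assumption (the question does not show how `df` is built); the values are copied from the sample column. It relies on every cell being a string, as the question states, since `str.join` raises a `TypeError` on a non-string value such as NaN.

```python
import pandas as pd

# Sample data copied from the question; how the real frame is created is an assumption.
df = pd.DataFrame(
    {"Column1": ["adfghb, gad", "234rwfa", "ballbalba", "9adfad9, 5432", "99a"]}
)

# Glue all rows together with the same delimiter that separates values inside a cell,
# then split the single resulting string once on that delimiter.
list1 = ", ".join(df["Column1"]).split(", ")
print(list1)
```

One edge case to keep in mind: on an empty column this returns `['']` rather than `[]`, because splitting an empty string yields one empty element.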

Performance

For 25,000 rows:

# @MayankPorwal
%timeit df['Column1'].str.split(', ').explode().tolist()
9.99 ms ± 85.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @jezrael
%timeit [y for x in df['Column1'] for y in x.split(', ')]
4.25 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @Corralien
%timeit ', '.join(df['Column1']).split(', ')
2.24 ms ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
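To reproduce the comparison outside IPython, a rough sketch with the standard `timeit` module could look like the following. The way the 25,000 rows are generated (repeating the five sample rows) and the function names are assumptions, not taken from the answer; absolute timings will differ by machine and Pandas version.

```python
import timeit

import pandas as pd

# Assumed test data: the five sample rows repeated 5,000 times gives 25,000 rows.
sample = ["adfghb, gad", "234rwfa", "ballbalba", "9adfad9, 5432", "99a"]
df = pd.DataFrame({"Column1": sample * 5000})
col = df["Column1"]


def explode_split(s):
    # Pandas-only approach: split each cell into a list, then flatten with explode.
    return s.str.split(", ").explode().tolist()


def comprehension(s):
    # Nested list comprehension over the raw Python strings.
    return [y for x in s for y in x.split(", ")]


def join_split(s):
    # Join the whole column into one string, then split it once.
    return ", ".join(s).split(", ")


candidates = (explode_split, comprehension, join_split)

# Check that all three approaches agree before timing them.
results = {f.__name__: f(col) for f in candidates}

for f in candidates:
    seconds = timeit.timeit(lambda: f(col), number=20) / 20
    print(f"{f.__name__}: {seconds * 1e3:.2f} ms per call")
```

Checking that the outputs are equal first guards against benchmarking approaches that silently return different lists.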

User response:

Use Series.str.split with Series.explode:

In [1044]: l = df['Column1'].str.split(', ').explode().tolist()

In [1045]: l
Out[1045]: ['adfghb', 'gad', '234rwfa', 'ballbalba', '9adfad9', '5432', '99a']
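To see what each step of that chain does, here is a small sketch that keeps the intermediate results. The DataFrame construction and the variable names `parts` and `exploded` are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame(
    {"Column1": ["adfghb, gad", "234rwfa", "ballbalba", "9adfad9, 5432", "99a"]}
)

# Step 1: each cell becomes a Python list of its comma-separated parts.
parts = df["Column1"].str.split(", ")

# Step 2: explode gives every list element its own row; the original
# row label is repeated for elements that came from the same cell.
exploded = parts.explode()

# Step 3: convert the flattened Series to a plain Python list.
l = exploded.tolist()

print(parts.tolist())
print(exploded.index.tolist())
print(l)
```

The repeated index labels are useful if you later need to trace a value back to its source row; if you only need the flat list, they are simply discarded by `tolist()`.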