How do I combine the lists in a dataframe column into a single list?

Some context: I have some data that I'm doing text analysis on. I have just tokenized it, and I want to combine all the lists in a dataframe column for further processing.

My df looks like this:

import pandas as pd
df = pd.DataFrame({'title': ['issue regarding app', 'graphics should be better'], 'text': [["'app'", "'load'", "'slowly'"], ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]]})

I want to merge all the lists in the 'text' column into one list, and also remove the opening and closing inverted commas (single quotes) from each token.

Something like this:

lst = ['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']

Thank you for all your help!

CodePudding user response：

You can accomplish that with apply and a lambda.

apply runs a function on each element of the 'text' column (here, removing the quotes from every token in the list), and sum with an empty list as its start value concatenates the resulting lists.

lst = sum(df["text"].apply(lambda x: [i.replace("'", "") for i in x]), [])

Output:

['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
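
A side note on the approach above: sum(..., []) builds a new list at every step, so it can get slow when the column has many rows. Here is a minimal sketch of an alternative, assuming the same df as in the question, that uses itertools.chain.from_iterable from the standard library to walk the column in a single pass:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'title': ['issue regarding app', 'graphics should be better'],
    'text': [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

# chain.from_iterable yields the tokens of each inner list in turn,
# so no intermediate concatenated lists are created
lst = [token.replace("'", "") for token in chain.from_iterable(df['text'])]
print(lst)
```

For a two-row frame the difference is negligible; it should only matter on larger data.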

If you want to remove several different characters, such as "'" and "a", translate is more efficient than chaining replace calls:

trans = str.maketrans("", "", "'a")
lst = sum(df["text"].apply(lambda x: [i.translate(trans) for i in x]), [])
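
To make the effect concrete, here is a small sketch on a few tokens from the question (the tokens list is just illustrative data copied from df). The third argument of str.maketrans lists characters to delete, so every "a" inside a word disappears as well:

```python
tokens = ["'app'", "'load'", "'slowly'", "'interface'"]

# Third argument: every character listed here is deleted from the string
trans = str.maketrans("", "", "'a")

cleaned = [t.translate(trans) for t in tokens]
print(cleaned)  # ['pp', 'lod', 'slowly', 'interfce']
```

So only use this form when you really want those characters removed everywhere, not just at the ends of a token.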

CodePudding user response：

Use a simple list comprehension:

out = [x.strip("'") for l in df['text'] for x in l]

Output:

['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
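
One thing to be aware of when choosing between this answer and the previous one: strip("'") only removes quotes at the start and end of a token, while replace("'", "") removes them everywhere. A quick sketch with a hypothetical token (not in the original data) that has an apostrophe inside the word:

```python
token = "'don't'"  # hypothetical token with an apostrophe inside the word

stripped = token.strip("'")        # only the outer quotes are removed
replaced = token.replace("'", "")  # the inner apostrophe is removed too

print(stripped)  # don't
print(replaced)  # dont
```

For the tokens in the question both give the same result.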

CodePudding user response：

We can also iterate over each list in the Series, collect them with append(), and then use concat() to join them before converting the result to a list. This yields the same output as above.
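
This answer gives no code, so what follows is only one possible reading of it. Wrapping each list in a pd.Series, the pieces name, and the final .str.strip / .tolist() step are assumptions rather than the answerer's exact method:

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['issue regarding app', 'graphics should be better'],
    'text': [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

pieces = []  # hypothetical accumulator name
for tokens in df['text']:
    # Wrap each inner list in a Series so pd.concat can join them
    pieces.append(pd.Series(tokens))

# ignore_index=True renumbers the combined Series from 0
combined = pd.concat(pieces, ignore_index=True)

# Strip the surrounding quotes, then convert the Series to a plain list
lst = combined.str.strip("'").tolist()
print(lst)
```

This does more work than the list comprehension above, so it is probably only worth it if you want the intermediate Series for something else.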