For some context: I have some data that I'm doing text analysis on. I have just tokenized it, and I want to combine all the lists in a DataFrame column for further processing.
My df is:
import pandas as pd
df = pd.DataFrame({'title': ['issue regarding app', 'graphics should be better'], 'text': [["'app'", "'load'", "'slowly'"], ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]]})
I want to merge all the lists in the 'text' column into one list, and also remove the open/close inverted commas.
Something like this:
lst = ['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
Thank you for all your help!
Answer:
You can accomplish that with apply() and a lambda. The apply() method applies the lambda, which strips the quotes from each token, to every element in the 'text' column, while sum() with an empty list [] as the start value concatenates all the resulting lists together:
lst = sum(df["text"].apply(lambda x: [i.replace("'", "") for i in x]), [])
Output:
['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
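Because sum(..., []) builds a new list at every step, it slows down quadratically as the column grows. A minimal sketch of the same flattening using itertools.chain.from_iterable instead (a swapped-in standard-library technique, not part of the answer above), assuming the df from the question:

```python
import itertools

import pandas as pd

# Same sample data as in the question: each cell holds quoted tokens
df = pd.DataFrame({
    "title": ["issue regarding app", "graphics should be better"],
    "text": [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

# Remove the quotes from every token in each row
cleaned = df["text"].apply(lambda x: [i.replace("'", "") for i in x])

# Chain the per-row lists together without re-copying an accumulator,
# which is what sum(..., []) does on every addition
lst = list(itertools.chain.from_iterable(cleaned))
print(lst)
```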
If you want to remove several characters at once, such as ' and a, str.translate() is more efficient than chaining replace() calls:
trans = str.maketrans("", "", "'a")
lst = sum(df["text"].apply(lambda x: [i.translate(trans) for i in x]), [])
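Note that translate() deletes every occurrence of the listed characters, not just the surrounding quotes. A small illustration on a few of the question's tokens (the tokens are taken from the question; the expected result follows from deleting both ' and a):

```python
# The third argument to str.maketrans lists characters to delete
trans = str.maketrans("", "", "'a")

tokens = ["'app'", "'load'", "'slowly'"]
result = [t.translate(trans) for t in tokens]

# The letter a is removed from inside the words as well
print(result)  # ['pp', 'lod', 'slowly']
```

So only include letters in the deletion set if you really want them gone everywhere.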
Answer:
Use a simple list comprehension:
out = [x.strip("'") for l in df['text'] for x in l]
Output:
['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
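One difference from the replace() approach: strip("'") only trims quotes at the start and end of each token, so an apostrophe inside a word survives. A quick comparison, using a hypothetical token "'don't'" that is not in the question's data:

```python
tokens = ["'app'", "'don't'"]  # "'don't'" is a hypothetical example token

# strip("'") removes quotes only from the ends of each token
stripped = [t.strip("'") for t in tokens]

# replace("'", "") removes every quote, including the inner apostrophe
replaced = [t.replace("'", "") for t in tokens]

print(stripped)  # ['app', "don't"]
print(replaced)  # ['app', 'dont']
```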
Answer:
We can also iterate through each list in the Series, collect the cleaned pieces with append(), and finally use concat() to join them before converting the result to a list. This yields the same output as above.
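The answer gives no code, so here is a minimal sketch of one way to read it, assuming "concat()" means pandas' pd.concat applied to one Series per row (that interpretation is an assumption):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "title": ["issue regarding app", "graphics should be better"],
    "text": [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

# Collect one Series per row, with the quotes stripped from each token
pieces = []
for tokens in df["text"]:
    pieces.append(pd.Series(tokens).str.strip("'"))

# pd.concat joins the Series end to end; tolist() returns a plain Python list
lst = pd.concat(pieces, ignore_index=True).tolist()
print(lst)
```

This is more roundabout than the list comprehension, but it keeps the data in pandas if you want to apply further .str methods before converting to a list.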