I have the following dataframe:
df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})
I tokenized this dataframe:
from nltk.tokenize import word_tokenize
df['column1'] = df['column1'].apply(lambda x: word_tokenize(x))
The output is enclosed inside brackets:
column1
0 [Severe, weather, Not, Severe, weather, kind, of, severe, weather]
I want to have the output without brackets:
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
What I have tried:
def delete_brackets(x):
    for i in x:
        if i == '[' or i == ']':
            x.remove(i)
    return x

df = delete_brackets(df)
and
def remove_brackets(x):
    return x.replace('[', '').replace(']', '')

df = remove_brackets(df)
I'm still getting the output inside brackets.
Any ideas? Thanks
CodePudding user response:
You can use
df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
Output:
>>> print(df.to_string())
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
The word_tokenize() function returns a list of tokens; map(str, word_tokenize(x)) casts each token to str, and then you can join the strings with a comma and a space.
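For reference, a minimal end-to-end sketch of the whole flow (assuming pandas, NLTK, and the punkt tokenizer data are installed):
import pandas as pd
from nltk.tokenize import word_tokenize

df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})

# word_tokenize() returns a list of str, which ", ".join() turns into one flat string
df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
print(df.to_string())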
CodePudding user response:
You missed .str. Also note that after tokenizing, the column holds lists, so cast it to string first, and pass regex=False because [ and ] are regex metacharacters:
df['column1'] = df['column1'].astype(str).str.replace("[", "", regex=False).str.replace("]", "", regex=False)
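If you prefer a single pass, a regex character class can strip the brackets and the quotes that show up once the lists are cast to string (a sketch, not part of the original answer):
df['column1'] = df['column1'].astype(str).str.replace(r"[\[\]']", "", regex=True)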
CodePudding user response:
Can also lstrip/rstrip (again after casting the lists to string):
df['column1'] = df['column1'].astype(str).str.lstrip('[').str.rstrip(']')
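As a small variation (my own sketch), .str.strip() accepts a set of characters and trims both ends in one call:
df['column1'] = df['column1'].astype(str).str.strip('[]')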