I have the following dataframe:
df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})
I tokenized this dataframe:
from nltk.tokenize import word_tokenize
df['column1'] = df['column1'].apply(lambda x: word_tokenize(x))
The output is enclosed inside brackets:
column1
0 [Severe, weather, Not, Severe, weather, kind, of, severe, weather]
I want to have the output without brackets:
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
What I have tried:
def delete_brackets(x):
    for i in x:
        if i == '[' or i == ']':
            x.remove(i)
    return x

df = delete_brackets(df)
and
def remove_brackets(x):
    return x.replace('[', '').replace(']', '')

df = remove_brackets(df)
I'm still getting the output inside brackets.
Any ideas? Thanks
CodePudding user response:
You can use
df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
Output:
>>> print(df.to_string())
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
The word_tokenize() function returns a list of tokens; map(str, word_tokenize(x)) casts each token to str, and then you can join the strings with a comma and a space.
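For reference, a minimal end-to-end sketch of the whole flow (assuming pandas, NLTK, and the punkt tokenizer data are installed):
import pandas as pd
from nltk.tokenize import word_tokenize

df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})

# word_tokenize() returns a list of str, which ", ".join() turns into one flat string
df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
print(df.to_string())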
CodePudding user response:
You missed .str. Also note that after tokenizing, the column holds lists, so cast it to string first, and pass regex=False because [ and ] are regex metacharacters:
df['column1'] = df['column1'].astype(str).str.replace("[", "", regex=False).str.replace("]", "", regex=False)
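If you prefer a single pass, a regex character class can strip the brackets and the quotes that show up once the lists are cast to string (a sketch, not part of the original answer):
df['column1'] = df['column1'].astype(str).str.replace(r"[\[\]']", "", regex=True)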
CodePudding user response:
Can also lstrip/rstrip (again after casting the lists to string):
df['column1'] = df['column1'].astype(str).str.lstrip('[').str.rstrip(']')
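As a small variation (my own sketch), .str.strip() accepts a set of characters and trims both ends in one call:
df['column1'] = df['column1'].astype(str).str.strip('[]')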